block: loop: support DIO & AIO
There are three main advantages to using direct I/O and AIO to
read/write the loop device's backing file:
1) double caching is avoided, so memory usage decreases
a lot
2) unlike user-space direct I/O, there is no cost of
pinning pages
3) context switches are avoided while still obtaining good throughput
- with buffered file reads, top random I/O throughput is often obtained
only if requests are submitted concurrently from many tasks; but
sequential I/O is mostly served from the page cache, so
concurrent submission often introduces unnecessary context switches
and can't improve throughput much. There was a discussion[1]
about using non-blocking I/O to mitigate this problem for applications.
- with direct I/O and AIO, concurrent submission can be
avoided without hurting random read throughput
My fio test results follow:
1. 16 jobs fio test inside ext4 file system over loop block
1) How to run
- linux kernel: 4.1.0-rc2-next-20150506 with the patchset
- the loop block is over one image on HDD.
- linux psync, 16 jobs, size 400M, ext4 over loop block
- test result: IOPS from fio output
2) Throughput result:
-------------------------------------------------------------
test cases     |randread |read   |randwrite |write  |
-------------------------------------------------------------
base           |240      |8705   |3763      |20914  |
-------------------------------------------------------------
base+loop aio  |242      |9258   |4577      |21451  |
-------------------------------------------------------------
3) context switch
- context switches decreased by ~16% with loop aio for randread,
and by ~33% for read
4) memory usage
- After these four tests with loop aio: ~10% of memory is in use
- After these four tests without loop aio: more than 55% of memory
is in use
2. single job fio test inside ext4 file system over loop block (for Maxim Patlasov)
1) How to run
- linux kernel: 4.1.0-rc2-next-20150506 with the patchset
- the loop block is over one image on HDD.
- linux psync, 1 job, size 4000M, ext4 over loop block
- test result: IOPS from fio output
2) Throughput result:
-------------------------------------------------------------
test cases     |randread |read   |randwrite |write  |
-------------------------------------------------------------
base           |109      |21180  |4192      |22782  |
-------------------------------------------------------------
base+loop aio  |114      |21018  |5404      |22670  |
-------------------------------------------------------------
3) context switch
- context switches decreased by ~10% with loop aio for randread,
and by ~50% for read
4) memory usage
- After these four tests with loop aio: ~10% of memory is in use
- After these four tests without loop aio: more than 55% of memory
is in use
Both the 'context switch' and 'memory usage' data were obtained from sar.
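For reference, the relative IOPS change implied by the two throughput
tables can be recomputed with a short script. This is only an
illustrative sketch: the IOPS values are copied from the tables in this
message, while the names RESULTS, relative_change and summarize are
hypothetical helpers and not part of the patchset.

```python
# IOPS copied from the two throughput tables in this commit message.
# RESULTS, relative_change and summarize are hypothetical helper names.
RESULTS = {
    "16 jobs": {
        "base": {"randread": 240, "read": 8705,
                 "randwrite": 3763, "write": 20914},
        "base+loop aio": {"randread": 242, "read": 9258,
                          "randwrite": 4577, "write": 21451},
    },
    "1 job": {
        "base": {"randread": 109, "read": 21180,
                 "randwrite": 4192, "write": 22782},
        "base+loop aio": {"randread": 114, "read": 21018,
                          "randwrite": 5404, "write": 22670},
    },
}


def relative_change(base, new):
    """Return the percentage change going from base to new."""
    return (new - base) / base * 100.0


def summarize(results):
    """Map each run to {test case: percent IOPS change with loop aio}."""
    return {
        run: {
            case: round(
                relative_change(rows["base"][case],
                                rows["base+loop aio"][case]), 1)
            for case in rows["base"]
        }
        for run, rows in results.items()
    }


if __name__ == "__main__":
    for run, changes in summarize(RESULTS).items():
        print(run, changes)
```

By this calculation the largest gain is in randwrite (roughly 22% with
16 jobs and 29% with 1 job), while the single-job read and write
results stay within about 1% of the base numbers.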
[1] https://lwn.net/Articles/612483/
[2] sar graph when running fio over loop without the patchset
http://kernel.ubuntu.com/~ming/block/loop-aio/v3/lo-nonaio.pdf
[3] sar graph when running fio over loop with the patchset
http://kernel.ubuntu.com/~ming/block/loop-aio/v3/lo-aio.pdf
[4] sar graph when running fio over loop without the patchset
http://kernel.ubuntu.com/~ming/block/loop-aio/v3/lo-nonaio-1job.pdf
[5] sar graph when running fio over loop with the patchset
http://kernel.ubuntu.com/~ming/block/loop-aio/v3/lo-aio-1job.pdf
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>