Commit 0764470
btrfs: search for larger extent maps inside btrfs_do_readpage()
[CORNER CASE]
If we have the following file extent layout, btrfs_get_extent() can
return a hole smaller than necessary during read, causing unnecessary
extra tree searches:
  item 6 key (257 EXTENT_DATA 0) itemoff 15810 itemsize 53
      generation 9 type 1 (regular)
      extent data disk byte 13631488 nr 4096
      extent data offset 0 nr 4096 ram 4096
      extent compression 0 (none)
  item 7 key (257 EXTENT_DATA 32768) itemoff 15757 itemsize 53
      generation 9 type 1 (regular)
      extent data disk byte 13635584 nr 4096
      extent data offset 0 nr 4096 ram 4096
      extent compression 0 (none)
In the above case, ranges [0, 4K) and [32K, 36K) are regular extents and
there is a hole in range [4K, 32K). The fs has the "no-holes" feature,
meaning the hole does not have a file extent item.
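With no-holes, the hole's bounds are implied by the gap between two neighbouring file extent items. The following is a minimal userspace sketch, not btrfs code: the `struct file_extent` type and the `find_hole()` helper are hypothetical, keep only the file offset and length of each regular extent, and ignore compression, inline extents and i_size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a regular file extent item. */
struct file_extent {
	uint64_t file_off;	/* key offset, e.g. 0 or 32768 */
	uint64_t len;		/* extent data "nr", e.g. 4096 */
};

/*
 * Return 1 if @off falls into a hole and store the hole as
 * [*hole_start, *hole_end); return 0 if an extent covers @off.
 * @items must be sorted by file_off and must not overlap.
 * A hole after the last extent is reported as ending at UINT64_MAX.
 */
static int find_hole(const struct file_extent *items, size_t nr, uint64_t off,
		     uint64_t *hole_start, uint64_t *hole_end)
{
	uint64_t prev_end = 0;	/* end of the previous extent, or 0 */

	for (size_t i = 0; i < nr; i++) {
		/* Precondition: sorted, non-overlapping items. */
		assert(items[i].file_off >= prev_end);

		if (off < items[i].file_off) {
			/* Gap between the previous extent and this one. */
			*hole_start = prev_end;
			*hole_end = items[i].file_off;
			return 1;
		}
		if (off < items[i].file_off + items[i].len)
			return 0;	/* covered by a regular extent */
		prev_end = items[i].file_off + items[i].len;
	}
	*hole_start = prev_end;
	*hole_end = UINT64_MAX;
	return 1;
}
```

For the layout above (items at 0 and 32768, each 4096 bytes long), an offset of 4096 yields the hole [4096, 32768), i.e. [4K, 32K), while an offset of 0 is covered by item 6.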
[INEFFICIENCY]
Assume the system has a 4K page size, we're doing readahead for range
[4K, 32K), and large folios are not yet enabled.
btrfs_readahead() for range [4K, 32K)
|- btrfs_do_readpage() for folio 4K
|  |- get_extent_map() for range [4K, 8K)
|     |- btrfs_get_extent() for range [4K, 8K)
|        We hit item 6, then the next item 7.
|        At this stage we know range [4K, 32K) is a hole.
|        But our search range is only [4K, 8K), not reaching 32K, thus
|        we go to the not_found: label, returning a hole em for [4K, 8K).
|
|- btrfs_do_readpage() for folio 8K
|  |- get_extent_map() for range [8K, 12K)
|     |- btrfs_get_extent() for range [8K, 12K)
|        We hit the same item 6, and then item 7.
|        But we still go to the not_found: label, inserting a new hole
|        em, which will be merged with the previous one.
|
|  [ Repeat the same btrfs_get_extent() calls until the end ]
So we call btrfs_get_extent() again and again, each time for a
different part of the same hole range [4K, 32K).
[ENHANCEMENT]
Make btrfs_do_readpage() search for a larger extent map when readahead
is involved.
For btrfs_readahead(), bio_ctrl::ractl is set and the extent range
covering the whole readahead range is locked.
If bio_ctrl::ractl is set, we can use the end of the readahead range as
the extent map search end. This allows btrfs_get_extent() to return a
much larger hole, reducing the need to call btrfs_get_extent() again and
again.
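A minimal sketch of the search-end choice, written as standalone C rather than kernel code. `struct ra_model`, `struct bio_ctrl_model` and `em_search_end()` are hypothetical names used only for illustration; in the kernel the readahead range lives in `struct readahead_control` (which has accessors such as readahead_pos() and readahead_length()), and this sketch does not claim to mirror how the patch actually computes the end.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real structures carry much more state. */
struct ra_model {
	uint64_t pos;	/* start of the readahead range, in bytes */
	uint64_t len;	/* length of the readahead range, in bytes */
};

struct bio_ctrl_model {
	const struct ra_model *ractl;	/* NULL when not called from readahead */
};

/*
 * Pick the exclusive end offset for an extent map search starting at @cur.
 * Without readahead only the current block is known to be locked; with
 * readahead the whole range is locked, so searching to its end is allowed.
 */
static uint64_t em_search_end(const struct bio_ctrl_model *ctrl,
			      uint64_t cur, uint64_t blocksize)
{
	uint64_t block_end;

	assert(blocksize > 0);
	block_end = cur + blocksize;

	if (ctrl->ractl != NULL) {
		uint64_t ra_end = ctrl->ractl->pos + ctrl->ractl->len;

		/* Never search less than the current block. */
		if (ra_end > block_end)
			return ra_end;
	}
	return block_end;
}
```

For readahead of [4K, 32K), a search starting at 4K ends at 32K instead of 8K; without a readahead context the search still stops at the block end.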
btrfs_readahead() for range [4K, 32K)
|- btrfs_do_readpage() for folio 4K
|  |- get_extent_map() for range [4K, 32K)
|     |- btrfs_get_extent() for range [4K, 32K)
|        We hit item 6, then the next item 7.
|        At this stage we know range [4K, 32K) is a hole.
|        So the hole em for range [4K, 32K) is returned.
|
|- btrfs_do_readpage() for folio 8K
|  |- get_extent_map() for range [8K, 32K)
|     The cached hole em for range [4K, 32K) covers this range,
|     so that em is reused.
|
|  [ Repeat the same get_extent_map() calls until the end, each one
|    reusing the cached hole em ]
Now we only call btrfs_get_extent() once for the whole range [4K, 32K),
instead of the old 7 times (once per 4K folio).
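The lookup counts can be checked with a small model. This is a hypothetical simulation, not kernel code: `count_lookups()` treats each btrfs_get_extent() call as returning a hole map clipped to the search end, merges adjacent maps into one cached range (as the old flow above describes), and reuses the cached range when it covers the current block. It assumes the readahead range lies entirely inside the hole.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the read path; not kernel code. */
struct range {
	uint64_t start;	/* inclusive */
	uint64_t end;	/* exclusive */
};

/*
 * Count btrfs_get_extent()-style lookups while reading @ra one block at
 * a time. Each lookup returns a hole map [cur, min(search end, hole end)).
 * @use_ra_end == 0: old behaviour, the search ends at the block end.
 * @use_ra_end != 0: new behaviour, the search ends at the readahead end.
 */
static unsigned int count_lookups(struct range ra, struct range hole,
				  uint64_t blocksize, int use_ra_end)
{
	struct range cached = { 0, 0 };	/* empty: no cached map yet */
	unsigned int lookups = 0;

	assert(blocksize > 0);
	assert(hole.start <= ra.start && ra.end <= hole.end);

	for (uint64_t cur = ra.start; cur < ra.end; cur += blocksize) {
		uint64_t blk_end = cur + blocksize;
		uint64_t search_end, em_end;

		/* Cached map covers the whole block: reuse it, no lookup. */
		if (cached.start <= cur && blk_end <= cached.end)
			continue;

		search_end = use_ra_end ? ra.end : blk_end;
		em_end = search_end < hole.end ? search_end : hole.end;
		lookups++;

		/* Adjacent maps merge into one; otherwise start a new one. */
		if (cached.start < cached.end && cached.end == cur) {
			cached.end = em_end;
		} else {
			cached.start = cur;
			cached.end = em_end;
		}
	}
	return lookups;
}
```

With ra = hole = [4096, 32768) and a 4096-byte block size, the model gives 7 lookups with the per-block search end and 1 with the readahead end. Merging alone does not help in the old flow, because each merged map still ends exactly where the next folio begins.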
This change slightly reduces the overhead of reading large holes.
For the current experimental build (with large folios) on aarch64, there
is a tiny but consistent ~1% improvement when reading a large hole file:
Read a 1GiB sparse file (all hole) using xfs_io with a 64K block size;
the result is the time needed to read the whole file, as reported by
xfs_io.
32 runs, experimental build (with large folios).
64K page size, 4K fs block size.
- Avg before: 0.20823 s
- Avg after: 0.20635 s
- Diff: -0.9%
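The reported difference follows directly from the two averages: (0.20635 - 0.20823) / 0.20823 is about -0.90%. A trivial helper (hypothetical name `pct_change()`) makes the arithmetic explicit:

```c
#include <assert.h>

/* Relative change from @before to @after, in percent (hypothetical helper). */
static double pct_change(double before, double after)
{
	assert(before != 0.0);	/* avoid division by zero */
	return (after - before) / before * 100.0;
}
```

pct_change(0.20823, 0.20635) evaluates to roughly -0.903, matching the -0.9% above.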
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

Parent commit: 57e3220
1 file changed, 14 insertions(+), 1 deletion(-)