Commit 0764470

adam900710 authored and kdave committed
btrfs: search for larger extent maps inside btrfs_do_readpage()
[CORNER CASE]
If we have the following file extents layout, btrfs_get_extent() can
return a smaller hole during read, causing unnecessary extra tree
searches:

	item 6 key (257 EXTENT_DATA 0) itemoff 15810 itemsize 53
		generation 9 type 1 (regular)
		extent data disk byte 13631488 nr 4096
		extent data offset 0 nr 4096 ram 4096
		extent compression 0 (none)
	item 7 key (257 EXTENT_DATA 32768) itemoff 15757 itemsize 53
		generation 9 type 1 (regular)
		extent data disk byte 13635584 nr 4096
		extent data offset 0 nr 4096 ram 4096
		extent compression 0 (none)

In the above case, ranges [0, 4K) and [32K, 36K) are regular extents,
there is a hole in range [4K, 32K), and the fs has the "no-holes"
feature, meaning the hole will not have a file extent item.

[INEFFICIENCY]
Assume the system has 4K page size and we're doing readahead for range
[4K, 32K), with no large folios yet:

btrfs_readahead() for range [4K, 32K)
|- btrfs_do_readpage() for folio 4K
|  |- get_extent_map() for range [4K, 8K)
|     |- btrfs_get_extent() for range [4K, 8K)
|        We hit item 6, then the next item 7.
|        At this stage we know range [4K, 32K) is a hole.
|        But our search range is only [4K, 8K), not reaching 32K, thus
|        we go to the not_found: tag, returning a hole em for [4K, 8K).
|
|- btrfs_do_readpage() for folio 8K
|  |- get_extent_map() for range [8K, 12K)
|     |- btrfs_get_extent() for range [8K, 12K)
|        We hit the same item 6, and then item 7.
|        But still we go to the not_found: tag, inserting a new hole em,
|        which will be merged with the previous one.
|
|  [ Repeat the same btrfs_get_extent() calls until the end ]

So we're calling btrfs_get_extent() again and again, just for a
different part of the same hole range [4K, 32K).

[ENHANCEMENT]
Make btrfs_do_readpage() search for a larger extent map if readahead is
involved.

For btrfs_readahead() we have bio_ctrl::ractl set, and the extents are
locked for the whole readahead range. If we find bio_ctrl::ractl set,
we can use the end of that range as the extent map search end. This
allows btrfs_get_extent() to return a much larger hole, reducing the
need to call btrfs_get_extent() again and again:

btrfs_readahead() for range [4K, 32K)
|- btrfs_do_readpage() for folio 4K
|  |- get_extent_map() for range [4K, 32K)
|     |- btrfs_get_extent() for range [4K, 32K)
|        We hit item 6, then the next item 7.
|        At this stage we know range [4K, 32K) is a hole.
|        So a hole em for range [4K, 32K) is returned.
|
|- btrfs_do_readpage() for folio 8K
|  |- get_extent_map() for range [8K, 32K)
|     The cached hole em for range [4K, 32K) covers the range, so we
|     reuse that em.
|
|  [ Repeat the same get_extent_map() calls until the end ]

Now we only call btrfs_get_extent() once for the whole range [4K, 32K),
instead of the previous 8 times.

Such a change reduces the overhead of reading large holes a little. For
the current experimental build (with large folios) on aarch64, there is
a tiny but consistent ~1% improvement when reading a large hole file:

Reading a 1GiB sparse file (all hole) using xfs_io with 64K block size;
the result is the time needed to read the whole file, as reported by
xfs_io. 32 runs, experimental build (with large folios), 64K page size,
4K fs block size.

- Avg before: 0.20823 s
- Avg after:  0.20635 s
- Diff:       -0.9%

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
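
[NOTE]
Why does the larger hole em help the following folios? get_extent_map()
keeps the last returned em in *em_cached and reuses it as long as it
still covers the requested start, only falling back to the tree search
in btrfs_get_extent() on a miss. A simplified sketch of that reuse
logic (helper names approximate, not verbatim from fs/btrfs/extent_io.c):

	static struct extent_map *get_extent_map(struct btrfs_inode *inode,
						 struct folio *folio, u64 start,
						 u64 len, struct extent_map **em_cached)
	{
		struct extent_map *em;

		if (*em_cached) {
			em = *em_cached;
			/* A cached em still covering @start is reused directly. */
			if (extent_map_in_tree(em) && start >= em->start &&
			    start < extent_map_end(em)) {
				refcount_inc(&em->refs);
				return em;
			}
			/* Stale cache; drop it and fall back to a tree search. */
			free_extent_map(em);
			*em_cached = NULL;
		}

		em = btrfs_get_extent(inode, folio, start, len);
		if (!IS_ERR(em)) {
			refcount_inc(&em->refs);
			*em_cached = em;
		}
		return em;
	}

With a hole em spanning [4K, 32K) cached after the first folio, every
later folio in the readahead batch hits the cached-em path above and
never reaches btrfs_get_extent().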
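
[BENCHMARK SETUP]
The benchmark above should be reproducible with something along these
lines (the mount point is an assumption; only the 1GiB all-hole file
and the 64K read block size come from the commit message):

	# Create a 1GiB file that is entirely a hole (no-holes has been
	# the mkfs.btrfs default for a while).
	xfs_io -f -c "truncate 1g" /mnt/btrfs/sparse

	# Read the whole file back in 64K chunks; pread reports the
	# elapsed time and throughput.
	xfs_io -c "pread -b 64k 0 1g" /mnt/btrfs/sparse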


fs/btrfs/extent_io.c

Lines changed: 14 additions & 1 deletion
@@ -998,11 +998,17 @@ static int btrfs_do_readpage(struct folio *folio, struct extent_map **em_cached,
 	u64 start = folio_pos(folio);
 	const u64 end = start + folio_size(folio) - 1;
 	u64 extent_offset;
+	u64 locked_end;
 	u64 last_byte = i_size_read(inode);
 	struct extent_map *em;
 	int ret = 0;
 	const size_t blocksize = fs_info->sectorsize;
 
+	if (bio_ctrl->ractl)
+		locked_end = readahead_pos(bio_ctrl->ractl) + readahead_length(bio_ctrl->ractl) - 1;
+	else
+		locked_end = end;
+
 	ret = set_folio_extent_mapped(folio);
 	if (ret < 0) {
 		folio_unlock(folio);
@@ -1036,7 +1042,14 @@ static int btrfs_do_readpage(struct folio *folio, struct extent_map **em_cached,
 			end_folio_read(folio, true, cur, blocksize);
 			continue;
 		}
-		em = get_extent_map(BTRFS_I(inode), folio, cur, end - cur + 1, em_cached);
+		/*
+		 * Search extent map for the whole locked range.
+		 * This will allow btrfs_get_extent() to return a larger hole
+		 * when possible.
+		 * This can reduce duplicated btrfs_get_extent() calls for large
+		 * holes.
+		 */
+		em = get_extent_map(BTRFS_I(inode), folio, cur, locked_end - cur + 1, em_cached);
 		if (IS_ERR(em)) {
 			end_folio_read(folio, false, cur, end + 1 - cur);
 			return PTR_ERR(em);
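
For reference, the locked_end computation relies on bio_ctrl->ractl
describing the whole readahead batch, not just the current folio. The
two helpers used above are defined in include/linux/pagemap.h roughly
as follows (paraphrased, not verbatim):

	/* Byte offset of the first folio in the readahead window. */
	static inline loff_t readahead_pos(struct readahead_control *rac)
	{
		return (loff_t)rac->_index * PAGE_SIZE;
	}

	/* Byte length of the whole readahead window. */
	static inline size_t readahead_length(struct readahead_control *rac)
	{
		return rac->_nr_pages * PAGE_SIZE;
	}

So readahead_pos() + readahead_length() - 1 is the last byte of the
readahead range, which btrfs_readahead() has already locked in the
extent io tree.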
