Commit ebb7fb1557b1 limited the length of ioend chains to 4096 entries
to improve worst-case latency. Unfortunately, this had the effect of
limiting the performance of:
The problem ends up being lock contention on the i_pages spinlock as we
clear the writeback bit on each folio (and propagate that up through
the tree). By using larger folios, we decrease the number of folios
to be processed by a factor of 256 for this benchmark, eliminating the
lock contention.
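As a rough illustration of where a factor of 256 can come from, here is
a minimal sketch in C. The sizes are assumptions, not taken from this
message: 4 KiB base pages and 1 MiB large folios are simply one pair
whose ratio is 256. The helper name folios_needed() is hypothetical and
is not a kernel function.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: number of folios needed to cover `bytes` of
 * page cache when every folio spans `folio_size` bytes (rounded up).
 * Each folio costs one writeback-bit clear under the i_pages lock,
 * so this is also a count of lock acquisitions at I/O completion.
 */
static size_t folios_needed(size_t bytes, size_t folio_size)
{
	assert(folio_size != 0);
	return (bytes + folio_size - 1) / folio_size;
}

/* Assumed sizes, chosen only so that the ratio works out to 256. */
#define ASSUMED_BASE_PAGE_SIZE   ((size_t)4096)     /* 4 KiB */
#define ASSUMED_LARGE_FOLIO_SIZE ((size_t)1 << 20)  /* 1 MiB */
```

Under those assumptions, folios_needed(1 << 20, ASSUMED_BASE_PAGE_SIZE)
is 256 while folios_needed(1 << 20, ASSUMED_LARGE_FOLIO_SIZE) is 1, so
each 1 MiB of dirty data takes the lock once instead of 256 times.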
Creating large folios in the buffered write path is also the right
thing to do. It's a project that has been on the back burner for years;
it just hasn't been important enough to do before now.
-----BEGIN PGP SIGNATURE-----