Thread: Re: doc: explain pgstatindex fragmentation
On Mon, 2025-01-27 at 10:13 +0100, Benoit Lobréau wrote: > On 1/25/25 7:07 PM, Laurenz Albe wrote: > > Looks good to me. I have one question left: the explanation for the performance > > penalty of a high leaf fragmentation sounds like it would only be relevant for > > disks where sequential reads are faster. If that is correct, perhaps it would be > > worth mentioning. > > Frederic noticed a performance hit even for on his laptop with a SSD. > > On Fri, 2025-01-24 at 15:41 +0100, Frédéric Yhuel wrote: > > I've noticed that maximum leaf_fragmentation can have a huge impact on > > a range index-only scan, when reading all blocs from disks, even on my > > laptop machine with SSD, but I don't know if this is the right place > > to document this? > > He reported to our team, that he did a test with two indexes on the same > data. They had the same density but one had no fragmentation while the > other had 100%. He got an execution time of ~90ms (0 frag) vs ~340ms > 100% frag). > > I get similar result with my laptor (except my disk is significantly > worse: ~152ms vs ~833ms). Thanks for checking. I'll set the patch "ready for committer". I personally would still like to know how fragmentation slows down performance. Yours, Laurenz Albe
On Mon, 27 Jan 2025 at 11:21, Laurenz Albe <laurenz.albe@cybertec.at> wrote: > > He reported to our team, that he did a test with two indexes on the same > > data. They had the same density but one had no fragmentation while the > > other had 100%. He got an execution time of ~90ms (0 frag) vs ~340ms > > 100% frag). > > > > I get similar result with my laptor (except my disk is significantly > > worse: ~152ms vs ~833ms). > > Thanks for checking. > I'll set the patch "ready for committer". > I personally would still like to know how fragmentation slows down performance. Probable reason is that scanning an unfragmented index results in sequential I/O patterns that the kernel read-ahead mechanism detects and does prefetching for us. Fragmented index looks like random I/O and gets to wait for the full disk latency for every index page. This is one of the tricky parts to fix for AIO, as directIO will also bypass this mechanism. PostgreSQL would need to start issuing those prefetches itself to not have a regression there. In a theoretical world, where we would be able to drive prefetches from an inner B-tree page, the difference between fragmented and unfragmented indexes would be much less. -- Ants Aasma
On 1/27/25 20:56, Ants Aasma wrote: >> I'll set the patch "ready for committer". Thanks! >> I personally would still like to know how fragmentation slows down performance. > Probable reason is that scanning an unfragmented index results in > sequential I/O patterns that the kernel read-ahead mechanism detects > and does prefetching for us. Fragmented index looks like random I/O > and gets to wait for the full disk latency for every index page. > > This is one of the tricky parts to fix for AIO, as directIO will also > bypass this mechanism. PostgreSQL would need to start issuing those > prefetches itself to not have a regression there. > > In a theoretical world, where we would be able to drive prefetches > from an inner B-tree page, the difference between fragmented and > unfragmented indexes would be much less. Thank you Ants, I think you are right. I run my test again with kernel readahead disabled (with the blockdev --setra 0 /dev/sdX command), and I obtained the following numbers with the unfragmented index: Buffers: shared hit=1 read=4953 I/O Timings: shared read=252.167 and these with the fragmented one: Buffers: shared hit=1 read=4904 I/O Timings: shared read=569.341 With kernel readahead enabled, it was: Buffers: shared hit=1 read=4953 I/O Timings: shared read=18.087 and: Buffers: shared hit=1 read=4904 I/O Timings: shared read=336.984 I run the tests many times and there is very little variation. We see that random access benefits a little from kernel readahead, but I suspect that's because the blocks of the index aren't completely scattered across the disk. More interestingly, when kernel readahead is disabled, we see that scanning the fragmented index still takes twice as long as scanning of unfragmented one. AFAIK, this is normal for a SSD. Isn't it? (I always thought that random reads and sequential reads would be almost equally fast on an SSD, but this does not seem to be the case).