Visibility testing - some numbers on current performance. - Mailing list pgsql-hackers
From | Jesper Krogh
---|---
Subject | Visibility testing - some numbers on current performance.
Date |
Msg-id | 4D9B67B2.6020007@krogh.cc
Responses | Re: Visibility testing - some numbers on current performance.
List | pgsql-hackers
Hi. I initially set out to put some numbers on "why" the visibility map is important for "select count(*)", primarily to give some feedback to Simon Riggs, who stated: "Your tests and discussion remind me that I haven't yet seen any tests that show that index-only scans would be useful for performance."

I created a small test script that (on my lousy desktop) created a bunch of tables, all with 2,000,000 rows but of different widths. The tests are set up so that a tuple width of around 1500 bytes makes the table end up around the size of my system's memory, so we eventually get some numbers for when we go to "disk". My desktop has only a single SATA 7200 rpm drive, 3GB of memory and no battery-backed write cache (who would ever run a REAL database on such a system anyway). (Script, output data and graphs are linked at the end of the email.)

On this system, visibility testing takes from 0.2s to 0.5s in total for 2,000,000 tuples, depending on tuple size (header + 14 bytes to header + 504 bytes). This is in situations where we can assume that hint bits are set, everything has kicked in and the complete dataset is cached in memory. This made me conclude that this is just awesome; if I could get anywhere near these numbers in my production system, I would be hunting performance in totally different areas.

On the first select after load, still fully memory resident, the range is instead 0.6s to 35s in total for 2,000,000 tuples (first run as opposed to second run, for +10 bytes and +500 bytes respectively). This degradation can mostly be attributed to the concurrent writeout of hint bits on a single SATA drive (one spindle; who would sanely run a database on such a system anyway). This number is probably the least robust one in the test, but the huge range should be cause for some concern anyway.

The last range is the one where we're hitting disk, and that is fairly uninteresting, as it can more or less be seen as a measure of the speed of the underlying disk system, which in this case is in the "low range". But seen from a sequential-throughput perspective, which is what this tests, it still does about 80MB/s, and an "expensive" disk system will not buy an order of magnitude in this area. The range in this case is 1.5s to 42s in total for 2,000,000 tuples, the first for a +10 byte tuple, the last for +1500 bytes. A really non-interesting data point is that when I add +2500 bytes to the tuple, the time drops to 2.1s, which is due to TOAST kicking in and the fact that the data I load is highly compressible, so it ends up taking next to no space.

Conclusion: visibility testing of 2,000,000 tuples takes between 0.2s and 42s. Where your system falls in that range depends hugely on your tuple size, the amount of bloat in your tables, the amount of memory available for caching compared to the total database size, and whether there is sufficient activity on your system to keep your "active" dataset in memory in preference to background activities. If your active dataset approaches the cache size of the system, you're very likely to end up in the last part of that range. So Simon Riggs' question is a really good one, since the answer is "it depends". It seems plausible that it is really hard to beat the 0.2s-0.5s for 2M tuples we currently see for any kind of memory-resident dataset.
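To make the setup concrete, here is a rough sketch (in psql) of the kind of table and query the test uses; the names are purely illustrative, and the real script, linked at the end of the mail, builds several such tables with different payload widths:

    -- Illustrative sketch only; see the linked test script for the real thing.
    CREATE TABLE vistest_10 (id serial, payload text);
    INSERT INTO vistest_10 (payload)
        SELECT repeat('x', 10) FROM generate_series(1, 2000000);

    \timing on
    -- First run: hint bits are written back while scanning.
    SELECT count(*) FROM vistest_10;
    -- Second run: hint bits are already set and the data is cached.
    SELECT count(*) FROM vistest_10;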
The approach currently being pursued, splitting out the PD_ALL_VISIBLE bit and using that for visibility testing, would improve the situation enormously, making all the disk-bound cases on the order of (primary-key-index-size + vm-map-size) / disk-throughput instead of main-table-size / disk-throughput. For "slim" tables that wouldn't be much, but for fat tables it can/would be substantial.

But it would be crucial to have the bit set at all, and the likelihood of that falls with the amount of churn in the table. The worst-case situation is where the bit is not set anyway; there the speed would be "a primary index scan" worse than today, and it is questionable whether the best case would be any better at all. But I'm fairly sure that the "average case" would be quite a lot lower than it is today, simply because indexes and vm maps are likely to be in memory (at least for databases that only hit disk occasionally). Getting below 0.2s for 2,000,000 tuples would be somewhat nice, but getting the worst-case numbers an order of magnitude down would be an enormous benefit to "large" or infrequently used databases.

My conclusion is somewhere along the lines of: getting the visibility map crash-safe and kept up to date is not the primary goal in itself; getting "visibility testing" separated from the varying size of tuples seems to be the key point, and moving the PD_ALL_VISIBLE bit out is definitely one way of doing it. Another approach could be to push to TOAST "way more aggressively", which would effectively push the worst-case behaviour down (I'll try to do some sane benchmarking around that). A third approach could be a "slim table" with only the relevant bits from the page/tuple header needed to perform visibility testing, giving a "fixed" cost for visibility testing (a rough user-level sketch of the effect is in the P.S. below). Other approaches... I'm not that much into PG internals.

Spreadsheet with numbers and a couple of graphs:
http://shrek.krogh.cc/~jesper/visibillitytesting.pdf
Test script:
http://shrek.krogh.cc/~jesper/visibillity-testing.pl

Tests done on git head as of today. Comments and suggestions for other related tests are welcome.

--
Jesper
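P.S. To make the "slim table" idea a bit more concrete: the actual proposal is about engine internals (a narrow structure carrying just the visibility-relevant header bits), but the effect can be roughly approximated at the user level with a narrow companion table, kept in sync by triggers or application logic (not shown). All names here are made up for illustration:

    -- Illustrative only: counting via the narrow companion table means the
    -- visibility scan touches small fixed-size tuples, regardless of how
    -- wide the main table's rows are.
    CREATE TABLE fat_table (id bigserial PRIMARY KEY, payload text);
    CREATE TABLE fat_table_slim (id bigint PRIMARY KEY REFERENCES fat_table (id));

    SELECT count(*) FROM fat_table_slim;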