Re: Making all nbtree entries unique by having heap TIDs participate in comparisons - Mailing list pgsql-hackers
From: Peter Geoghegan
Subject: Re: Making all nbtree entries unique by having heap TIDs participate in comparisons
Date:
Msg-id: CAH2-Wzk9RBfY4xL+B-QJQrS2qZsWKzrQmRSm+4_NgxLrotkfBg@mail.gmail.com
In response to: Re: Making all nbtree entries unique by having heap TIDs participate in comparisons (Peter Geoghegan <pg@bowt.ie>)
Responses: Re: Making all nbtree entries unique by having heap TIDs participate in comparisons
           Re: Making all nbtree entries unique by having heap TIDs participate in comparisons
List: pgsql-hackers
On Wed, Oct 3, 2018 at 4:39 PM Peter Geoghegan <pg@bowt.ie> wrote:
> I did find a pretty clear regression, though only with writes to
> unique indexes. Attached is v6, which fixes the issue. More on that
> below.

I've been benchmarking my patch using oltpbench's TPC-C benchmark these past few weeks, which has been very frustrating -- the picture is very mixed. I'm testing a patch that has evolved from v6, but isn't too different.

In one way, the patch does exactly what it's supposed to do when these benchmarks are run: it leaves indexes *significantly* smaller than the master branch does on the same (rate-limited) workload, without affecting the size of tables in any noticeable way. The numbers that I got from my much earlier synthetic single-client benchmark mostly hold up. For example, the stock table's primary key is about 35% smaller, and the order line index is only about 20% smaller relative to master, which isn't quite as good as in the synthetic case, but I'll take it (this is all because of the v6-0003-Add-split-at-new-tuple-page-split-optimization.patch stuff).

However, despite significant effort, and despite the fact that the index shrinkage is reliable, I cannot yet consistently show an improvement in either transaction throughput or transaction latency. I can show a nice improvement in latency on a slightly rate-limited TPC-C workload when backend_flush_after=0 (something like a 40% reduction on average), but that doesn't hold up when oltpbench isn't rate-limited and/or has backend_flush_after set. Usually there is a 1% - 2% regression, despite the big improvements in index size, and despite the big reduction in the number of buffers that backends must write out themselves.

The obvious explanation is that throughput is decreased due to our doing extra work (truncation) while holding an exclusive buffer lock. However, I've worked hard on that, and, as I said, I can sometimes observe a nice improvement in latency. This makes me doubt the obvious explanation. My working theory is that this has something to do with shared_buffers eviction. Maybe we're making worse decisions about which buffer to evict, or maybe the scalability of eviction is hurt. Perhaps both.

You can download results from a recent benchmark to get some sense of this. It includes latency and throughput graphs, plus detailed statistics collector stats:

https://drive.google.com/file/d/1oIjJ3YpSPiyRV_KF6cAfAi4gSm7JdPK1/view?usp=sharing

I would welcome any theories as to what could be the problem here. I think that this is fixable, since the picture for the patch is very positive, provided you only focus on bgwriter/checkpoint activity and on-disk sizes. It seems likely that there is a very specific gap in my understanding of how the patch affects buffer cleaning.

--
Peter Geoghegan
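For anyone trying to reproduce the index-size and buffer-cleaning observations above, a minimal sketch of the relevant queries might look like the following. The index names assume oltpbench's standard TPC-C schema (stock_pkey, order_line_pkey) and may need adjusting to match the actual DDL; pg_stat_bgwriter can be reset between runs with pg_stat_reset_shared('bgwriter').

-- Compare on-disk index sizes between the patch and master after a run.
SELECT c.relname,
       pg_size_pretty(pg_relation_size(c.oid)) AS index_size
FROM pg_class c
JOIN pg_index i ON i.indexrelid = c.oid
WHERE c.relname IN ('stock_pkey', 'order_line_pkey');

-- See who is cleaning dirty buffers: checkpoints, the bgwriter, or
-- backends writing buffers out themselves during eviction.
SELECT buffers_checkpoint,
       buffers_clean,
       maxwritten_clean,
       buffers_backend,
       buffers_alloc
FROM pg_stat_bgwriter;

A large drop in buffers_backend relative to master, combined with smaller indexes, is roughly the pattern described in the message above.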