Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian abit) - Mailing list pgsql-hackers
From | Yura Sokolov |
---|---|
Subject | Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian abit) |
Date | |
Msg-id | 1b83e27b-5704-4666-f0ff-ccd1274f6d17@gmail.com Whole thread Raw |
In response to | Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian a bit) (Tomas Vondra <tomas.vondra@2ndquadrant.com>) |
Responses |
Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian abit)
Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian a bit) |
List | pgsql-hackers |
08.03.2018 03:42, Tomas Vondra пишет: > On 03/06/2018 06:23 AM, Yura Sokolov wrote: >> 05.03.2018 18:00, Tom Lane пишет: >>> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: >>>> Snapshots are static (we don't really add new XIDs into existing ones, >>>> right?), so why don't we simply sort the XIDs and then use bsearch to >>>> lookup values? That should fix the linear search, without need for any >>>> local hash table. >>> >>> +1 for looking into that, since it would avoid adding any complication >>> to snapshot copying. In principle, with enough XIDs in the snap, an >>> O(1) hash probe would be quicker than an O(log N) bsearch ... but I'm >>> dubious that we are often in the range where that would matter. >>> We do need to worry about the cost of snapshot copying, too. >> >> Sorting - is the first thing I've tried. Unfortunately, sorting >> itself eats too much cpu. Filling hash table is much faster. >> > > I've been interested in the sort-based approach, so I've spent a bit of > time hacking on it (patch attached). It's much less invasive compared to > the hash-table, but Yura is right the hashtable gives better results. > > I've tried to measure the benefits using the same script I shared on > Tuesday, but I kept getting really strange numbers. In fact, I've been > unable to even reproduce the results I shared back then. And after a bit > of head-scratching I think the script is useless - it can't possibly > generate snapshots with many XIDs because all the update threads sleep > for exactly the same time. And first one to sleep is first one to wake > up and commit, so most of the other XIDs are above xmax (and thus not > included in the snapshot). I have no idea why I got the numbers :-/ > > But with this insight, I quickly modified the script to also consume > XIDs by another thread (which simply calls txid_current). With that I'm > getting snapshots with as many XIDs as there are UPDATE clients (this > time I actually checked that using gdb). > > And for a 60-second run the tps results look like this (see the attached > chart, showing the same data): > > writers master hash sort > ----------------------------------------- > 16 1068 1057 1070 > 32 1005 995 1033 > 64 1064 1042 1117 > 128 1058 1021 1051 > 256 977 1004 928 > 512 773 935 808 > 768 576 815 670 > 1000 413 752 573 > > The sort certainly does improve performance compared to master, but it's > only about half of the hashtable improvement. > > I don't know how much this simple workload resembles the YCSB benchmark, > but I seem to recall it's touching individual rows. In which case it's > likely worse due to the pg_qsort being more expensive than building the > hash table. > > So I agree with Yura the sort is not a viable alternative to the hash > table, in this case. > >> Can I rely on snapshots being static? May be then there is no need >> in separate raw representation and hash table. I may try to fill hash >> table directly in GetSnapshotData instead of lazy filling. Though it >> will increase time under lock, so it is debatable and should be >> carefully measured. >> > > Yes, I think you can rely on snapshots not being modified later. That's > pretty much the definition of a snapshot. > > You may do that in GetSnapshotData, but you certainly can't do that in > the for loop there. Not only we don't want to add more work there, but > you don't know the number of XIDs in that loop. And we don't want to > build hash tables for small number of XIDs. > > One reason against building the hash table in GetSnapshotData is that > we'd build it even when the snapshot is never queried. Or when it is > queried, but we only need to check xmin/xmax. Thank you for analyze, Tomas. Stephen is right about bug in snapmgr.c Attached version fixes bug, and also simplifies XidInXip a bit. With regards, Sokolov Yura.
Attachment
pgsql-hackers by date: