Home > mailing lists

Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian abit) - Mailing list pgsql-hackers

From	Yura Sokolov
Subject	Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian abit)
Date	March 10, 2018 05:11:27
Msg-id	1b83e27b-5704-4666-f0ff-ccd1274f6d17@gmail.com Whole thread Raw
In response to	Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian a bit) (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses	Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian abit) Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian a bit)
List	pgsql-hackers

Tree view

08.03.2018 03:42, Tomas Vondra пишет:
> On 03/06/2018 06:23 AM, Yura Sokolov wrote:
>> 05.03.2018 18:00, Tom Lane пишет:
>>> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>>>> Snapshots are static (we don't really add new XIDs into existing ones,
>>>> right?), so why don't we simply sort the XIDs and then use bsearch to
>>>> lookup values? That should fix the linear search, without need for any
>>>> local hash table.
>>>
>>> +1 for looking into that, since it would avoid adding any complication
>>> to snapshot copying.  In principle, with enough XIDs in the snap, an
>>> O(1) hash probe would be quicker than an O(log N) bsearch ... but I'm
>>> dubious that we are often in the range where that would matter.
>>> We do need to worry about the cost of snapshot copying, too.
>>
>> Sorting - is the first thing I've tried. Unfortunately, sorting
>> itself eats too much cpu. Filling hash table is much faster.
>>
>
> I've been interested in the sort-based approach, so I've spent a bit of
> time hacking on it (patch attached). It's much less invasive compared to
> the hash-table, but Yura is right the hashtable gives better results.
>
> I've tried to measure the benefits using the same script I shared on
> Tuesday, but I kept getting really strange numbers. In fact, I've been
> unable to even reproduce the results I shared back then. And after a bit
> of head-scratching I think the script is useless - it can't possibly
> generate snapshots with many XIDs because all the update threads sleep
> for exactly the same time. And first one to sleep is first one to wake
> up and commit, so most of the other XIDs are above xmax (and thus not
> included in the snapshot). I have no idea why I got the numbers :-/
>
> But with this insight, I quickly modified the script to also consume
> XIDs by another thread (which simply calls txid_current). With that I'm
> getting snapshots with as many XIDs as there are UPDATE clients (this
> time I actually checked that using gdb).
>
> And for a 60-second run the tps results look like this (see the attached
> chart, showing the same data):
>
>     writers     master      hash       sort
>    -----------------------------------------
>     16           1068       1057       1070
>     32           1005        995       1033
>     64           1064       1042       1117
>     128          1058       1021       1051
>     256           977       1004        928
>     512           773        935        808
>     768           576        815        670
>     1000          413        752        573
>
> The sort certainly does improve performance compared to master, but it's
> only about half of the hashtable improvement.
>
> I don't know how much this simple workload resembles the YCSB benchmark,
> but I seem to recall it's touching individual rows. In which case it's
> likely worse due to the pg_qsort being more expensive than building the
> hash table.
>
> So I agree with Yura the sort is not a viable alternative to the hash
> table, in this case.
>
>> Can I rely on snapshots being static? May be then there is no need
>> in separate raw representation and hash table. I may try to fill hash
>> table directly in GetSnapshotData instead of lazy filling. Though it
>> will increase time under lock, so it is debatable and should be
>> carefully measured.
>>
>
> Yes, I think you can rely on snapshots not being modified later. That's
> pretty much the definition of a snapshot.
>
> You may do that in GetSnapshotData, but you certainly can't do that in
> the for loop there. Not only we don't want to add more work there, but
> you don't know the number of XIDs in that loop. And we don't want to
> build hash tables for small number of XIDs.
>
> One reason against building the hash table in GetSnapshotData is that
> we'd build it even when the snapshot is never queried. Or when it is
> queried, but we only need to check xmin/xmax.

Thank you for analyze, Tomas.

Stephen is right about bug in snapmgr.c
Attached version fixes bug, and also simplifies XidInXip a bit.

With regards,
Sokolov Yura.

Attachment

pgsql-hackers by date:

From: Tom Lane
Date: 10 March 2018, 01:37:49
Subject: Bogus use of canonicalize_qual

From: "David G. Johnston"
Date: 10 March 2018, 05:58:10
Subject: Re: csv format for psql

Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian abit) - Mailing list pgsql-hackers

Attachment

Previous

Next