Home > mailing lists

Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian a bit) - Mailing list pgsql-hackers

From	Alexander Korotkov
Subject	Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian a bit)
Date	August 10, 2017 18:12:34
Msg-id	CAPpHfdvO0fO_qB2wXWQJ4A7LHv66AdRMrcU63QOpu5=ezk+viw@mail.gmail.com Whole thread
In response to	Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian abit) (Sokolov Yura <funny.falcon@postgrespro.ru>)
Responses	Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfiana bit) Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian a bit)
List	pgsql-hackers

Tree view

On Thu, Aug 10, 2017 at 3:30 PM, Sokolov Yura <funny.falcon@postgrespro.ru> wrote:

On 2017-07-31 18:56, Sokolov Yura wrote:
Good day, every one.

In attempt to improve performance of YCSB on zipfian distribution,
it were found that significant time is spent in XidInMVCCSnapshot in
scanning snapshot->xip array. While overall CPU time is not too
noticable, it has measurable impact on scaleability.

First I tried to sort snapshot->xip in GetSnapshotData, and search in a
sorted array. But since snapshot->xip is not touched if no transaction
contention occurs, sorting xip always is not best option.

Then I sorted xip array on demand in XidInMVCCSnapshot only if
search in snapshot->xip occurs (ie lazy sorting). It performs much
better, but since it is O(NlogN), sort's execution time is noticable
for large number of clients.

Third approach (present in attached patch) is making hash table lazily
on first search in xip array.

Note: hash table is not built if number of "in-progress" xids is less
than 60. Tests shows, there is no big benefit from doing so (at least
on Intel Xeon).

With regards,

Here is new, more correct, patch version, and results for pgbench with
zipfian distribution (50% read 50% write) (same to Alik's tests at
https://www.postgresql.org/message-id/BF3B6F54-68C3-417A-BFAB-FB4D66F2B410@postgrespro.ru )

clients | master | hashsnap2 | hashsnap2_lwlock
--------+--------+-----------+------------------
10 | 203384 | 203813 | 204852
20 | 334344 | 334268 | 363510
40 | 228496 | 231777 | 383820
70 | 146892 | 148173 | 221326
110 | 99741 | 111580 | 157327
160 | 65257 | 81230 | 112028
230 | 38344 | 56790 | 77514
310 | 22355 | 39249 | 55907
400 | 13402 | 26899 | 39742
500 | 8382 | 17855 | 28362
650 | 5313 | 11450 | 17497
800 | 3352 | 7816 | 11030

(Note: I'm not quite sure, why my numbers for master are lower than
Alik's one. Probably, our test methodology differs a bit. Perhaps, it
is because I don't recreate tables between client number change, but
between branch switch).

These results look very cool!

I think CSN is eventually inevitable, but it's a long distance feature. Thus, this optimization could make us a good serve before we would have CSN.

Do you expect there are cases when this patch causes slowdown? What about case when we scan each xip array only once (not sure how to simulate that using pgbench)?

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com

The Russian Postgres Company

pgsql-hackers by date:

From: Robert Haas
Date: 10 August 2017, 18:11:06
Subject: Re: [HACKERS] pl/perl extension fails on Windows

From: Amit Kapila
Date: 10 August 2017, 18:18:23
Subject: Re: [HACKERS] parallelize queries containing initplans

Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian a bit) - Mailing list pgsql-hackers

Previous

Next