Re: B-Tree support function number 3 (strxfrm() optimization) - Mailing list pgsql-hackers
From | Peter Geoghegan
---|---
Subject | Re: B-Tree support function number 3 (strxfrm() optimization)
Date |
Msg-id | CAM3SWZTUdndFhBq95+9H+Oon6fjsRH0B0C4T6g423cwtPE0hNA@mail.gmail.com
In response to | Re: B-Tree support function number 3 (strxfrm() optimization) (Peter Geoghegan <pg@heroku.com>)
Responses | Re: B-Tree support function number 3 (strxfrm() optimization)
List | pgsql-hackers
On Thu, Jun 12, 2014 at 2:09 PM, Peter Geoghegan <pg@heroku.com> wrote:
> Right. It was a little bit incautious of me to say that we had the
> full benefit of 8 bytes of storage with "en_US.UTF-8", since that is
> only true of lower case characters (I think that FreeBSD can play
> tricks with this. Sometimes, it will give you the benefit of 8 bytes
> of entropy for an 8 byte string, with only non-differentiating
> trailing bytes, so that the first 8 bytes of "Aaaaaaaa" are distinct
> from the first eight bytes of "aaaaaaaa", while any trailing bytes are
> non-distinct for both). In any case it's pretty clear that a goal of
> the glibc implementation is to concentrate as much entropy as possible
> into the first part of the string, and that's the important point.

I thought about using an order-preserving compression technique like Hu-Tucker [1] in order to get additional benefit from those 8 bytes. Even my apparently sympathetic cities example isn't all that sympathetic, since the number of distinct normalized keys is only about 75% of the total number of cities (while a reliable strcmp()-only tie-breaker isn't expected to be useful for the remaining 25%). Here is the improvement I see when I set things up so that there is a 1:1 correspondence between city rows and distinct normalized keys:

postgres=# with cc as (
  select count(*), array_agg(ctid) ct,
         substring(strxfrm_test(city)::text from 0 for 19) blob
  from cities
  group by 3
  having count(*) > 1
  order by 1
), ff as (
  select unnest(ct[2:400]) u, blob from cc
)
delete from cities using ff where cities.ctid = ff.u;

postgres=# vacuum full cities;
VACUUM

Afterwards:

postgres=# select count(*) from (select distinct substring(strxfrm_test(city)::text from 0 for 19) from cities) i;
 count
--------
 243782
(1 row)

postgres=# select count(*) from cities;
 count
--------
 243782
(1 row)

$ cat sort-city.sql
select * from (select * from cities order by city offset 1000000) d;

Patch results
==========

pgbench -M prepared -f sort-city.sql -T 300 -n

transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 1
number of threads: 1
duration: 300 s
number of transactions actually processed: 1734
latency average: 173.010 ms
tps = 5.778545 (including connections establishing)
tps = 5.778572 (excluding connections establishing)

pgbench -M prepared -j 4 -c 4 -f sort-city.sql -T 300 -n

transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 4
number of threads: 4
duration: 300 s
number of transactions actually processed: 2916
latency average: 411.523 ms
tps = 9.706683 (including connections establishing)
tps = 9.706776 (excluding connections establishing)

Master results
==========

pgbench -M prepared -f sort-city.sql -T 300 -n

transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 1
number of threads: 1
duration: 300 s
number of transactions actually processed: 390
latency average: 769.231 ms
tps = 1.297545 (including connections establishing)
tps = 1.297551 (excluding connections establishing)

pgbench -M prepared -j 4 -c 4 -f sort-city.sql -T 300 -n

transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 4
number of threads: 4
duration: 300 s
number of transactions actually processed: 535
latency average: 2242.991 ms
tps = 1.777005 (including connections establishing)
tps = 1.777024 (excluding connections establishing)

So that seems like a considerable further improvement that would be nice to see more frequently.
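For concreteness, here is a minimal standalone C sketch of the mechanism being measured: keep only the leading 8 bytes of the strxfrm() blob, compare those with memcmp(), and fall back to an authoritative strcoll() comparison when the truncated blobs tie. The struct layout, buffer handling, and names are invented for illustration only; the actual patch does this inside the backend's sort support machinery rather than with a standalone struct like this.

```c
/*
 * Toy sketch (not the patch): abbreviated comparison using the leading
 * bytes of a strxfrm() blob, with an authoritative tie-breaker.
 */
#include <locale.h>
#include <stdio.h>
#include <string.h>

#define ABBREV_LEN 8					/* leading blob bytes retained */

typedef struct SortKey
{
	const char *original;				/* authoritative value */
	unsigned char abbrev[ABBREV_LEN];	/* truncated strxfrm() blob */
} SortKey;

static void
make_abbrev(SortKey *key, const char *s)
{
	char		buf[256];
	size_t		len;

	key->original = s;
	memset(key->abbrev, 0, ABBREV_LEN); /* zero-pad short blobs */

	len = strxfrm(buf, s, sizeof(buf));
	if (len >= sizeof(buf))
		return;							/* blob didn't fit; real code would
										 * retry with a larger buffer, and the
										 * zeroed key just forces the
										 * tie-breaker below */
	memcpy(key->abbrev, buf, len < ABBREV_LEN ? len : ABBREV_LEN);
}

static int
compare_keys(const SortKey *a, const SortKey *b)
{
	int			cmp = memcmp(a->abbrev, b->abbrev, ABBREV_LEN);

	if (cmp != 0)
		return cmp;						/* cheap, cache-friendly path */
	return strcoll(a->original, b->original);	/* authoritative tie-breaker */
}

int
main(void)
{
	SortKey		k1;
	SortKey		k2;

	/* assumes the locale is installed */
	setlocale(LC_COLLATE, "en_US.UTF-8");

	/*
	 * With glibc, case is a non-primary weight, so the leading bytes of
	 * these two blobs typically tie and the strcoll() fallback decides.
	 */
	make_abbrev(&k1, "Aaaaaaaa");
	make_abbrev(&k2, "aaaaaaaa");
	printf("compare_keys: %d\n", compare_keys(&k1, &k2));
	return 0;
}
```

The important property is the one noted in the quoted paragraph: because the collator concentrates entropy into the leading bytes of the blob, the 8-byte memcmp() resolves most comparisons, and the expensive locale-aware comparison only runs for genuine ties.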
I don't know that it's worth it to go to the trouble of compressing given the existing ameliorating measures (HyperLogLog, etc.), at least if using compression is motivated by worst-case performance. There is some research on making this work for this kind of thing specifically [2]. My concern is that it won't be worth it to do the extra work, particularly given that I already have 8 bytes to work with.

Supposing I only had 4 bytes to work with (as the researchers writing [2] may only have had in 1994), that would leave me with a relatively small number of distinct normalized keys in many representative cases. For example, I'd have a mere 40,665 distinct normalized keys in the case of my "cities" database, rather than 243,782 (out of a set of 317,102 rows) for 8 bytes of storage. But if I double that to 16 bytes (which might be taken as a proxy for what a good compression scheme could get me), I only get a modest improvement: 273,795 distinct keys. To be fair, that's in no small part because there are only 275,330 distinct city names overall (and so most dups get away with a cheap memcmp() on their tie-breaker), but this is a reasonably organic, representative dataset.

Now, it's really hard to judge something like that, and I don't imagine that this analysis is all that scientific. I am inclined to think that we're better off just aborting the optimization if it doesn't work out while copying, and forgetting about order-preserving compression. Let us not lose sight of the fact that strxfrm() calls are not that expensive relative to the cost of the sort in almost all cases.

[1] http://www.cs.ust.hk/mjg_lib/bibs/DPSu/DPSu.Files/jstor_514.pdf
[2] http://www.hpl.hp.com/techreports/Compaq-DEC/CRL-94-3.pdf

--
Peter Geoghegan