Re: Macro customizable hashtable / bitmapscan & aggregation perf - Mailing list pgsql-hackers
From: Tomas Vondra
Subject: Re: Macro customizable hashtable / bitmapscan & aggregation perf
Date:
Msg-id: 0b71d15b-89ef-bd4f-d653-0a118b550601@2ndquadrant.com
In response to: Re: Macro customizable hashtable / bitmapscan & aggregation perf (Andres Freund <andres@anarazel.de>)
Responses: Re: Macro customizable hashtable / bitmapscan & aggregation perf
List: pgsql-hackers

On 10/01/2016 09:59 PM, Andres Freund wrote:
> Hi,
>
> On 2016-10-01 20:19:21 +0200, Tomas Vondra wrote:
>> On 10/01/2016 02:44 AM, Andres Freund wrote:
>>> Hi,
>>>
>>> On 2016-07-26 17:43:33 -0700, Andres Freund wrote:
>>>> In the attached patch I've attached simplehash.h, which can be
>>>> customized by a bunch of macros, before being inlined. There's also
>>>> a patch using this for tidbitmap.c and nodeAgg/nodeSubplan/... via
>>>> execGrouping.c.
>>>
>>> Attached is a significantly improved version of this series. The
>>> main changes are:
>>>
>>> - The hash table now uses robin-hood style hash collision handling.
>>>   See the commit message of 0002 (or simplehash.h) for an
>>>   introduction. That significantly reduces the impact of
>>>   "clustering" due to linear addressing.
>>
>> Interesting. Have you looked at cuckoo hashing?
>
> Yes. I don't think it's a good fit for modern CPUs. Because it
> doesn't probe linearly you constantly have random uncached memory
> accesses.
>
> I've played with a few schemes, and they all seemed to be slower
> than linear probing variants, unless you intentionally used a wrong
> hash-distribution, or intentionally "unbalanced" the hash-space by
> iterating over either end of the keyspace and moved the entries
> around.

OK, understood.
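For the archives, my understanding of the robin-hood scheme: it keeps
the cache-friendly linear probing, but bounds the variance in probe
distances by letting an inserted key take over a bucket from any
resident entry that sits closer to its own optimal position. A rough
sketch of the insert loop (my reading of the idea, not the actual code
from 0002):

    #include <stdint.h>

    typedef struct Entry
    {
        uint32_t    hash;
        uint32_t    key;
        int         used;
    } Entry;

    /* distance from the key's optimal bucket; table size is a power of 2 */
    static inline uint32_t
    probe_distance(uint32_t hash, uint32_t bucket, uint32_t mask)
    {
        return (bucket - hash) & mask;
    }

    /*
     * Robin-hood insert: steal the bucket from resident entries that
     * are "richer" (closer to their optimal bucket) than the key being
     * inserted. Assumes the table is never completely full.
     */
    static void
    rh_insert(Entry *table, uint32_t mask, uint32_t hash, uint32_t key)
    {
        Entry       cur = {hash, key, 1};
        uint32_t    bucket = hash & mask;
        uint32_t    dist = 0;

        for (;;)
        {
            uint32_t    resident_dist;

            if (!table[bucket].used)
            {
                table[bucket] = cur;    /* empty bucket, done */
                return;
            }

            resident_dist = probe_distance(table[bucket].hash, bucket, mask);

            /*
             * The resident entry is richer than us - swap, and keep
             * probing with the evicted entry instead.
             */
            if (resident_dist < dist)
            {
                Entry       tmp = table[bucket];

                table[bucket] = cur;
                cur = tmp;
                dist = resident_dist;
            }

            bucket = (bucket + 1) & mask;
            dist++;
        }
    }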
>>> - Significant comment and correctness fixes, both in simplehash.h
>>>   itself, and 0003/0004.
>>> - A lot of other random performance improvements for the hashing
>>>   code.
>>>
>>> Unfortunately I'm running out of battery right now, so I don't want
>>> to re-run the benchmarks posted upthread (loading takes a while).
>>> But the last time I ran them all the results after the patches were
>>> better than before.
>>
>> Well, I have rather bad experience with running benchmarks on
>> laptops anyway - a lot of noise due to power management, etc.
>
> Well, with factor ~2x improvements that's not really a big detractor.
> Using a new CPU makes some forms of analysis easier (better PMUs).

True.

> For long-running tests I agree.
>
>> What about running a bigger benchmark - say, TPC-DS - and evaluating
>> the impact?
>
> Worthwhile, although obviously the impact will be a lot smaller,
> since they're not entirely bottlenecked on hash-aggs and bitmap
> scans.

Sure, the improvement won't be as significant as on the simple
queries, but it's interesting IMHO.

>> I think a crucial part of the benchmarking will be identifying and
>> measuring corner cases, e.g. a lot of rows with the same key, etc.
>> Although that probably is not a major issue for the two places
>> switched to the new implementation (e.g. execGrouping merges the
>> duplicates to a single group, by definition).
>
> Hm. I don't really see a case where that's going to cause issues -
> all the execGrouping.c users only store one key, and either ignore
> later ones, or add the data to the initial tuple in the group. I
> don't really see any case where repeated keys should cause an issue
> for a hash table?

Yep, that's pretty much what I suggested. I was wondering more about
the other places where this hash table might be used - I've been
thinking about hashjoins, but now that I think of it, that's fairly
specialized and tuned code. In any case, let's not complicate the
discussion for now.

>>> This patch series currently consists of four patches:
>>> - 0001 - add unlikely/likely() macros. The use here is not entirely
>>>   mandatory, but they do provide a quite consistent improvement.
>>>   There are other threads where they proved to be beneficial, so I
>>>   see little reason not to introduce them.
>>> - 0002 - the customizable hashtable itself. It now contains
>>>   documentation.
>>> - 0003 - convert tidbitmap.c to use the new hashmap. This includes
>>>   a fix for the issue mentioned in [1], to improve performance in
>>>   heavily lossy scenarios (otherwise we could regress in some
>>>   extreme cases).
>>> - 0004 - convert execGrouping.c, e.g. used by hash aggregates.
>>>
>>> While not quite perfect yet, I do think this is at a state where
>>> input is needed. I don't want to go a lot deeper into this rabbit
>>> hole, before we have some agreement that this is a sensible course
>>> of action.
>>
>> So, is it the right time to do some benchmarking?
>
> That's one thing that's required, yes. The other is whether we can
> live with the ugliness of implementing templates via macros. I do
> think we can, but...

Hmmm ... not sure. If one of the points is to get rid of function
calls determined at runtime (which make it impossible for the compiler
to optimize the code), then I can think of three options:

(a) copy-paste the code for each place
(b) use some templating
(c) use JIT

I think (b) is way better than (a), and I don't think we're using JIT
anywhere at this point. So while the macro-based templates look a bit
awkward, I'm not aware of a better way to do this in C.
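For readers who haven't looked at the patch yet, the technique boils
down to something like the following (a generic sketch of macro
templating with made-up names, not the actual simplehash.h interface):

    /*
     * ---- hash_template.h (sketch) ----
     *
     * The includer defines HT_NAME, HT_KEY_TYPE, HT_HASH and HT_EQUAL
     * before including this file. Everything is emitted as static
     * inline functions whose names are derived from HT_NAME, so each
     * instantiation is a separate, fully specialized copy - no
     * function pointers resolved at runtime.
     */
    #include <stdbool.h>
    #include <stdint.h>

    #define HT_CAT2(a, b)   a##_##b
    #define HT_CAT(a, b)    HT_CAT2(a, b)
    #define HT_FN(name)     HT_CAT(HT_NAME, name)

    typedef struct HT_FN(entry)
    {
        HT_KEY_TYPE key;
        bool        used;
    } HT_FN(entry);

    static inline bool
    HT_FN(contains)(HT_FN(entry) *tab, uint32_t nbuckets, HT_KEY_TYPE key)
    {
        /* nbuckets is assumed to be a power of 2 */
        uint32_t    i = HT_HASH(key) & (nbuckets - 1);

        while (tab[i].used)
        {
            if (HT_EQUAL(tab[i].key, key))
                return true;
            i = (i + 1) & (nbuckets - 1);
        }
        return false;
    }

    /* (a real template header would #undef its parameters here, so it
     * can be included again with different settings) */

and an instantiation then looks like:

    #define HT_NAME         inthash
    #define HT_KEY_TYPE     int
    #define HT_HASH(k)      ((uint32_t) (k) * 0x9e3779b1u)
    #define HT_EQUAL(a, b)  ((a) == (b))
    #include "hash_template.h"

    /* this emits inthash_entry and inthash_contains(), which the
     * compiler can inline and constant-fold at every call site */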
A few comments, after quickly skimming through the first two patches:

1) SH_CREATE does this:

    /* round up size to the next power of 2, eases lookups */
    if (size < 2)
        size = 2;
    else
        size = sh_pow2_int(size);

I very much doubt a hash table with 2 buckets is very useful. What
about using some reasonable lower boundary - IIRC hashjoins use 1024
buckets, which might be a bit too much, but perhaps 256 would work?

2) tb->resize_above

I'd say 'resize_threshold' is a better name. Also, 0.8 should be
defined as a constant somewhere (not sure if it makes sense to make it
part of SH_TYPE, so that hash tables may use different load factors).

3) 'size' is a rather poor name for a parameter (e.g. in SH_CREATE or
SH_RESIZE), as it's unclear whether it means 'element size in bytes',
'total size in bytes' or 'number of entries'.

4) SH_CONTAINS sounds more like a function checking whether a hash
table contains a key, not like a 'type of hash table entry'. What
about SH_ENTRY_TYPE instead? Also, SH_KEYTYPE should be SH_KEY_TYPE, I
guess.

5) SH_RESIZE

Do we expect to shrink hash tables? If not, SH_RESIZE should probably
check that newsize > oldsize. If yes, it should check that there's
enough space for all entries (with the load factor).

It's not entirely clear why it is guaranteed that there's always an
element with an optimal position (when load factor < 1)? Adding an
explanation or a link to a paper would be helpful, I guess.

6) SH_STAT

This bit is a bit suspicious:

    uint32      max_collisions = 0;

    ...

    /* single contained element is not a collision */
    curcoll--;
    total_collisions += curcoll;
    if (curcoll > max_collisions)
        max_collisions = curcoll - 1;

Shouldn't the last line be just "max_collisions = curcoll"?

7) Should the hash agg size estimation for the planner consider the
fillfactor? I think we should account for the fillfactor - we should
account for the allocated memory, even if there's free space in the
hash table. After all, that's what hashjoins do. See the sketch below.
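To be concrete about what I mean in 7), something along these lines (a
sketch only - the helper name is made up, and 0.8 stands in for the
patch's resize threshold):

    #include <stddef.h>

    #define HASH_FILLFACTOR 0.8     /* the load factor from 0002 */

    /*
     * Hypothetical planner helper: estimate the memory actually
     * allocated by the hash table, not just the memory occupied by
     * live entries - buckets kept free by the load factor still take
     * up space.
     */
    static size_t
    hash_table_bytes(double nentries, size_t entrysize)
    {
        size_t      nbuckets = 2;

        /*
         * Enough buckets to keep nentries below the resize threshold,
         * rounded up to a power of 2 like SH_CREATE does.
         */
        while ((double) nbuckets * HASH_FILLFACTOR < nentries)
            nbuckets *= 2;

        return nbuckets * entrysize;
    }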
regards

-- 
Tomas Vondra                    http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services