Re: PATCH: postpone building buckets to the end of Hash (in HashJoin) - Mailing list pgsql-hackers
From: Tomas Vondra
Subject: Re: PATCH: postpone building buckets to the end of Hash (in HashJoin)
Date:
Msg-id: 569EAF53.30202@2ndquadrant.com
In response to: Re: PATCH: postpone building buckets to the end of Hash (in HashJoin) (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: PATCH: postpone building buckets to the end of Hash (in HashJoin)
List: pgsql-hackers
On 01/19/2016 08:34 PM, Robert Haas wrote:
> On Mon, Jan 18, 2016 at 10:57 PM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>>>> If this doesn't regress performance in the case where the number of
>>>> buckets is estimated accurately to begin with, then I think this is
>>>> a great idea. Can you supply some performance test results for that
>>>> case, and maybe some of the other cases also?
>>>
>>> I don't see how it could regress performance, and the benchmarks I've
>>> done confirm that. I'll do more thorough benchmarking and post the
>>> results here, but not now as this patch is in the 2016-01 CF and I want
>>> to put all my time into reviewing patches from the open commitfest.
>>
>> I've finally got to do more thorough benchmarks, and surprisingly there
>> seems to be a minor regression.
> ...
>>
>> So even if we entirely skipped the bucket build, we would not recover the
>> difference. Not sure what's going on here.
>>
>> I've also done some profiling using perf, but I haven't really found
>> anything suspicious there.
>>
>> Any ideas what might be the cause of this?
>
> My guess is that memory locality is not as good with this patch. Consider this:
>
>     copyTuple->next = hashtable->buckets[bucketno];
>
> We've only just populated copyTuple via memcpy, so that cache line is
> presumably in whatever place cache lines go when they are really hot.
> We basically make one pass over the allocated space for the hash
> table, filling in the hash tuples and linking things into bucket
> chains. OTOH, with the patch, we don't do that on the first pass,
> just writing the tuples. Then when we come back through we still have
> to do that part:
>
>     hashTuple->next = hashtable->buckets[bucketno];
>
> By the time we're doing this, especially at larger work_mem settings,
> we've probably pushed those cache lines out of the faster levels of
> the CPU cache, and we have to pull them back in again, and that takes
> time. If the tuples are small, we'll dirty every cache line twice;
> if they're larger, we might not dirty every cache line on the second
> pass, but just some fraction of them.
>
> I could be totally off base, but this is why I was concerned about the patch.

I can totally see why this would slow down the BuildBuckets function, but I don't quite see why it should make the other code significantly slower. Yet BuildBuckets takes just ~25ms while the total duration increases by ~200ms (for the 1x10 dataset).

I do understand calling BuildBuckets may affect the code executed after it, as it keeps other data in the cache, but I would not expect ~175ms. Yet another thing is that BuildBuckets accesses the tuples in a mostly sequential way, so I'd expect prefetching to take care of that.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services