Re: pgsql: Add parallel-aware hash joins. - Mailing list pgsql-committers
From: Andres Freund
Subject: Re: pgsql: Add parallel-aware hash joins.
Date:
Msg-id: 20171230161634.3lpvgerqsysricaa@alap3.anarazel.de
In response to: Re: pgsql: Add parallel-aware hash joins. (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses: Re: pgsql: Add parallel-aware hash joins.
List: pgsql-committers
Hi,

On 2017-12-31 02:51:26 +1300, Thomas Munro wrote:
> You mentioned that prairiedog sees the problem about one time in
> thirty.  Would you mind checking if it goes away with this patch
> applied?
>
> --
> Thomas Munro
> http://www.enterprisedb.com

> From cbed027275039cc5debf8db89342a133a831c9ca Mon Sep 17 00:00:00 2001
> From: Thomas Munro <thomas.munro@enterprisedb.com>
> Date: Sun, 31 Dec 2017 02:03:07 +1300
> Subject: [PATCH] Fix EXPLAIN ANALYZE output for Parallel Hash.
>
> In a race case, EXPLAIN ANALYZE could fail to display correct nbatch and size
> information.  Refactor so that participants report only on batches they worked
> on rather than trying to report on all of them, and teach explain.c to
> consider the HashInstrumentation object from all participants instead of
> picking the first one it can find.  This should fix an occasional build farm
> failure in the "join" regression test.

This seems buggy independent of whether it fixes the issue on
prairiedog, right?  So I'm inclined to go ahead and just fix it...

> +	/*
> +	 * Merge results from workers.  In the parallel-oblivious case, the
> +	 * results from all participants should be identical, except where
> +	 * participants didn't run the join at all so have no data.  In the
> +	 * parallel-aware case, we need to aggregate the results.  Each worker may
> +	 * have seen a different subset of batches and we want to report the peak
> +	 * memory usage across all batches.
> +	 */

It's not necessarily the peak though, right?  The largest batches might
not be read in at the same time.  I'm fine with approximating it as
such, just want to make sure I understand.

> +	if (hashstate->shared_info)
> 	{
> 		SharedHashInfo *shared_info = hashstate->shared_info;
> 		int			i;
>
> -		/* Find the first worker that built a hash table. */
> 		for (i = 0; i < shared_info->num_workers; ++i)
> 		{
> -			if (shared_info->hinstrument[i].nbatch > 0)
> +			HashInstrumentation *worker_hi = &shared_info->hinstrument[i];
> +
> +			if (worker_hi->nbatch > 0)
> 			{
> -				hinstrument = &shared_info->hinstrument[i];
> -				break;
> +				/*
> +				 * Every participant should agree on the buckets, so to be
> +				 * sure we have a value we'll just overwrite each time.
> +				 */
> +				hinstrument.nbuckets = worker_hi->nbuckets;
> +				hinstrument.nbuckets_original = worker_hi->nbuckets_original;
> +				/*
> +				 * Normally every participant should agree on the number of
> +				 * batches too, but it's possible for a backend that started
> +				 * late and missed the whole join not to have the final nbatch
> +				 * number.  So we'll take the largest number.
> +				 */
> +				hinstrument.nbatch = Max(hinstrument.nbatch, worker_hi->nbatch);
> +				hinstrument.nbatch_original = worker_hi->nbatch_original;
> +				/*
> +				 * In a parallel-aware hash join, for now we report the
> +				 * maximum peak memory reported by any worker.
> +				 */
> +				hinstrument.space_peak =
> +					Max(hinstrument.space_peak, worker_hi->space_peak);

I bet pgindent will not like this layout.  Ho hum.

Is this really, as you say above, an "aggregate the results"?

Greetings,

Andres Freund