Re: pgsql: Add some regression tests that exercise hash join code. - Mailing list pgsql-committers
From | Andres Freund |
---|---|
Subject | Re: pgsql: Add some regression tests that exercise hash join code. |
Date | |
Msg-id | 20171204202142.spwkwc545vvxmed7@alap3.anarazel.de Whole thread Raw |
In response to | Re: pgsql: Add some regression tests that exercise hash join code. (Thomas Munro <thomas.munro@enterprisedb.com>) |
Responses |
Re: pgsql: Add some regression tests that exercise hash join code.
|
List | pgsql-committers |
On 2017-12-04 12:28:56 +1300, Thomas Munro wrote: > From 43eeea0bc35204440d262224b56efca958b33428 Mon Sep 17 00:00:00 2001 > From: Thomas Munro <thomas.munro@enterprisedb.com> > Date: Mon, 4 Dec 2017 11:52:11 +1300 > Subject: [PATCH] Fix EXPLAIN ANALYZE of hash join when the leader doesn't > execute. > > If a hash join appears in a parallel query, there may be no hash table > available for explain.c to inspect even though a hash table may have been > built in other processes. This could happen either because > parallel_leader_participation was set to off or because the leader happened to > hit the end of the outer relation immediately (even though the complete > relation is not empty) and decided not to build the hash table. > > Commit bf11e7ee introduced a way for workers to exchange instrumentation via > the DSM segment for Sort nodes even though they are not parallel-aware. This > commit does the same for Hash nodes, so that explain.c has a way to find > instrumentation data from an arbitrary participant that actually built the > hash table. Makes sense. Seems ok to only fix this inv11, even if present earlier. > diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c > index 447f69d044e..1ffe635d66c 100644 > --- a/src/backend/commands/explain.c > +++ b/src/backend/commands/explain.c > @@ -2379,34 +2379,62 @@ show_sort_info(SortState *sortstate, ExplainState *es) > static void > show_hash_info(HashState *hashstate, ExplainState *es) > { > - HashJoinTable hashtable; > + HashInstrumentation *hinstrument = NULL; > > - hashtable = hashstate->hashtable; > + /* > + * In a parallel query, the leader process may or may not have run the > + * hash join, and even if it did it may not have built a hash table due to > + * timing (if it started late it might have seen no tuples in the outer > + * relation and skipped building the hash table). Therefore we have to be > + * prepared to get instrumentation data from a worker if there is no hash > + * table. > + */ > + if (hashstate->hashtable) > + { > + hinstrument = (HashInstrumentation *) > + palloc(sizeof(HashInstrumentation)); > + ExecHashGetInstrumentation(hinstrument, hashstate->hashtable); > + } > + else if (hashstate->shared_info) > + { > + SharedHashInfo *shared_info = hashstate->shared_info; > + int i; > + > + /* Find the first worker that built a hash table. */ > + for (i = 0; i < shared_info->num_workers; ++i) > + { > + if (shared_info->hinstrument[i].nbatch > 0) > + { > + hinstrument = &shared_info->hinstrument[i]; > + break; > + } > + } > + } I can't think of a way that, before the parallel HJ patch, results in a difference between the size of the hashtable between workers. Short of some participant not having a hashtable at all, that is. Wwe also don't currently display lookup information etc. So just more or less randomly choosing one worker's seems ok. > + case T_HashState: > + /* even when not parallel-aware */ "..., for stats" or such maybe? > + ExecHashEstimate((HashState *) planstate, e->pcxt); > + break; > case T_SortState: > /* even when not parallel-aware */ I'd just update that comment as well. > #endif /* NODEHASH_H */ > diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h > index e05bc04f525..6c322e57c00 100644 > --- a/src/include/nodes/execnodes.h > +++ b/src/include/nodes/execnodes.h > @@ -1980,6 +1980,29 @@ typedef struct GatherMergeState > struct binaryheap *gm_heap; /* binary heap of slot indices */ > } GatherMergeState; > > +/* ---------------- > + * Values displayed by EXPLAIN ANALYZE > + * ---------------- > + */ > +typedef struct HashInstrumentation > +{ > + int nbuckets; > + int nbuckets_original; > + int nbatch; > + int nbatch_original; > + size_t space_peak; > +} HashInstrumentation; Maybe I'm being pedantic here, but I think it'd not hurt to explain what these mean. It's obviously pretty clear what nbuckets/batch mean, but the difference between original and not is far less clea.r Looks reasonable these nitpicks aside. Greetings, Andres Freund
pgsql-committers by date: