Re: Hybrid Hash/Nested Loop joins and caching results from subplans - Mailing list pgsql-hackers
From | Andy Fan |
---|---|
Subject | Re: Hybrid Hash/Nested Loop joins and caching results from subplans |
Date | |
Msg-id | CAKU4AWpoQ_c9Rj2sTVPMGxAb+CAAe4z5ejcrL-89QirCxk=k+A@mail.gmail.com Whole thread Raw |
In response to | Re: Hybrid Hash/Nested Loop joins and caching results from subplans (David Rowley <dgrowleyml@gmail.com>) |
Responses |
Re: Hybrid Hash/Nested Loop joins and caching results from subplans
|
List | pgsql-hackers |
On Mon, Nov 2, 2020 at 3:44 PM David Rowley <dgrowleyml@gmail.com> wrote:
On Tue, 20 Oct 2020 at 22:30, David Rowley <dgrowleyml@gmail.com> wrote:
>
> So far benchmarking shows there's still a regression from the v8
> version of the patch. This is using count(*). An earlier test [1] did
> show speedups when we needed to deform tuples returned by the nested
> loop node. I've not yet repeated that test again. I was disappointed
> to see v9 slower than v8 after having spent about 3 days rewriting the
> patch
I did some further tests this time with some tuple deforming. Again,
it does seem that v9 is slower than v8.
I run your test case on v8 and v9, I can produce a stable difference between them.
v8:
statement latencies in milliseconds:
1603.611 select count(*) from hundredk hk inner join lookup l on hk.thousand = l.a;
statement latencies in milliseconds:
1603.611 select count(*) from hundredk hk inner join lookup l on hk.thousand = l.a;
v9:
statement latencies in milliseconds:
1772.287 select count(*) from hundredk hk inner join lookup l on hk.thousand = l.a;
1772.287 select count(*) from hundredk hk inner join lookup l on hk.thousand = l.a;
then I did a perf on the 2 version, Is it possible that you called tts_minimal_clear twice in
the v9 version? Both ExecClearTuple and ExecStoreMinimalTuple called tts_minimal_clear
on the same slot.
With the following changes:
diff --git a/src/backend/executor/execMRUTupleCache.c b/src/backend/executor/execMRUTupleCache.c
index 3553dc26cb..b82d8e98b8 100644
--- a/src/backend/executor/execMRUTupleCache.c
+++ b/src/backend/executor/execMRUTupleCache.c
@@ -203,10 +203,9 @@ prepare_probe_slot(MRUTupleCache *mrucache, MRUCacheKey *key)
TupleTableSlot *tslot = mrucache->tableslot;
int numKeys = mrucache->nkeys;
- ExecClearTuple(pslot);
-
if (key == NULL)
{
+ ExecClearTuple(pslot);
/* Set the probeslot's values based on the current parameter values */
for (int i = 0; i < numKeys; i++)
pslot->tts_values[i] = ExecEvalExpr(mrucache->param_exprs[i],
@@ -641,7 +640,7 @@ ExecMRUTupleCacheFetch(MRUTupleCache *mrucache)
{
mrucache->state = MRUCACHE_FETCH_NEXT_TUPLE;
- ExecClearTuple(mrucache->cachefoundslot);
+ // ExecClearTuple(mrucache->cachefoundslot);
slot = mrucache->cachefoundslot;
ExecStoreMinimalTuple(mrucache->last_tuple->mintuple, slot, false);
return slot;
@@ -740,7 +739,7 @@ ExecMRUTupleCacheFetch(MRUTupleCache *mrucache)
return NULL;
}
- ExecClearTuple(mrucache->cachefoundslot);
+ // ExecClearTuple(mrucache->cachefoundslot);
slot = mrucache->cachefoundslot;
ExecStoreMinimalTuple(mrucache->last_tuple->mintuple, slot, false);
return slot;
index 3553dc26cb..b82d8e98b8 100644
--- a/src/backend/executor/execMRUTupleCache.c
+++ b/src/backend/executor/execMRUTupleCache.c
@@ -203,10 +203,9 @@ prepare_probe_slot(MRUTupleCache *mrucache, MRUCacheKey *key)
TupleTableSlot *tslot = mrucache->tableslot;
int numKeys = mrucache->nkeys;
- ExecClearTuple(pslot);
-
if (key == NULL)
{
+ ExecClearTuple(pslot);
/* Set the probeslot's values based on the current parameter values */
for (int i = 0; i < numKeys; i++)
pslot->tts_values[i] = ExecEvalExpr(mrucache->param_exprs[i],
@@ -641,7 +640,7 @@ ExecMRUTupleCacheFetch(MRUTupleCache *mrucache)
{
mrucache->state = MRUCACHE_FETCH_NEXT_TUPLE;
- ExecClearTuple(mrucache->cachefoundslot);
+ // ExecClearTuple(mrucache->cachefoundslot);
slot = mrucache->cachefoundslot;
ExecStoreMinimalTuple(mrucache->last_tuple->mintuple, slot, false);
return slot;
@@ -740,7 +739,7 @@ ExecMRUTupleCacheFetch(MRUTupleCache *mrucache)
return NULL;
}
- ExecClearTuple(mrucache->cachefoundslot);
+ // ExecClearTuple(mrucache->cachefoundslot);
slot = mrucache->cachefoundslot;
ExecStoreMinimalTuple(mrucache->last_tuple->mintuple, slot, false);
return slot;
v9 has the following result:
1608.048 select count(*) from hundredk hk inner join lookup l on hk.thousand = l.a;
Graphs attached
Looking at profiles, I don't really see any obvious reason as to why
this is. I'm very much inclined to just pursue the v8 patch (separate
Result Cache node) and just drop the v9 idea altogether.
David
Best Regards
Andy Fan
pgsql-hackers by date: