Re: Improve hash join's handling of tuples with null join keys - Mailing list pgsql-hackers
From | Chao Li |
---|---|
Subject | Re: Improve hash join's handling of tuples with null join keys |
Date | |
Msg-id | 5C5EE031-F086-4353-A17A-DA563CD24DDD@gmail.com Whole thread Raw |
In response to | Re: Improve hash join's handling of tuples with null join keys (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
On Aug 16, 2025, at 00:52, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Chao Li <li.evan.chao@gmail.com> writes:With this patch, “isnull” now becomes true because of the change of strict op. Then the outer null join key tuple must be stored in a tuplestore. When an outer table contains a lot of null join key tuples, then the tuplestore could bump to very large, in that case, it would be hard to say this patch really benefits.
What's your point? If we don't divert those tuples into the
tuplestore, then they will end up in the main hash table instead,
and the consequences of bloat there are far worse.
I might not state clearly. For this comments, I meant the outer table. For example:
SELECT a.*, b.* from a RIGHT JOIN b on a.id = b.a_id;
Let’s say table a is used to build hash, table b is the outer table.
And say, table b has 1000 tuples whose a_id are NULL.
Before this patch, when fetching such a tuple (a_id is null) from table b, the tuple will be returned to parent node immediately.
With this tuple, all of such tuples will be put into hj_NullOuterTupleStore, and only be returned after all non-null tuples are processed.
My comment was trying to say that if there are a lot of null join key tuples in outer table, then hj_NullOuterTupleStore might use a lot of memory or swap data to disk, which might lead to performance burden. So, I was thinking we could keep the original logic for outer table, and return null join key tuples immediately.
Based on this patch, if we are doing a left join, and outer table is empty, then all tuples from the inner table should be returned. In that case, we can skip building a hash table, instead, we can put all inner table tuples into hashtable.innerNullTupleStore. Building a tuplestore should be cheaper than building a hash table, so this way makes a little bit more performance improvement.
I think that would make the logic completely unintelligible. Also,
a totally-empty input relation is not a common situation. We try to
optimize such cases when it's simple to do so, but we shouldn't let
that drive the fundamental design.
I absolutely agree we should not touch the fundamental design for the tiny optimization, that’s why I mentioned “based on this patch”.
With this patch, you have introduced a change in MultiExecPrivateHash():
else if (node->keep_null_tuples)
{
/* null join key, but we must save tuple to be emitted later */
if (node->null_tuple_store == NULL)
node->null_tuple_store = ExecHashBuildNullTupleStore(hashtable);
tuplestore_puttupleslot(node->null_tuple_store, slot);
}
We can simply added a new flag to HashTable, say named skip_building_hash. Upon right join (join to the hash side), and outer table is empty, set the flag to true, then in the MultiExecPrivateHash(), if skip_building_hash is true, directly put all tuples into node->null_tuple_store without building a hash table.
Then in ExecHashJoinImpl(), after "(void) MultiExecProcNode()" is called, if hashtable->skip_building_hash is true, directly set node->hj_JoinState = HJ_FILL_INNER_NULL_TUPLES.
So, the tiny optimization is totally based on this patch, it depends on the HashTable.null_tuple_store (if you take this comment, then maybe rename this variable) and the new state HJ_FILL_INNER_NULL_TUPLES.
Best regards,
==
Chao Li (Evan)
--------------------
HighGo Software Co., Ltd.
https://www.highgo.com/
--------------------
HighGo Software Co., Ltd.
https://www.highgo.com/
pgsql-hackers by date: