Thread: Which side of a Merge Join gets executed first? Do both sides always get executed?
Which side of a Merge Join gets executed first? Do both sides always get executed?
From
Jerry Brenner
Date:
The attached query plan is from 11.

We are getting Merge Joins on both sides of the UNION. In both cases, the first node under the Merge Join returns 0 rows but the other side of the Merge Join (the one being sorted) is executed and that's where all of the time is spent.
On the surface, I don't see any way from the attached explain plan to determine which side of the Merge Join is executed first. Some questions:
- Which side gets executed first?
- How would one tell that from the json?
- Have there been any relevant changes to later releases to make that more apparent?
- Whichever side gets executed first, is the execution of the side that would be second get short circuited if 0 rows are returned by the first side?
Here's a screenshot from pgMustard.
- Nodes 6 and 14 (the first node under each of the Merge Joins) each return 0 rows
- Nodes 9 and 15 are the expensive sides of the Merge Joins and return lots of rows

NOTE:
- The query plan in 13 is slightly different, but still includes the Merge Joins.
- Replacing ANY(ARRAY(<subquery)) with IN(<subquery>) fixes the performance problem, but we'd still like to understand the execution characteristics of Merge Join
Thanks,
Jerry
Attachment
Re: Which side of a Merge Join gets executed first? Do both sides always get executed?
From
Frédéric Yhuel
Date:
Le 20/12/2023 à 15:40, Jerry Brenner a écrit : > The attached query plan is from 11. > We are getting Merge Joins on both sides of the UNION. In both cases, > the first node under the Merge Join returns 0 rows but the other side of > the Merge Join (the one being sorted) is executed and that's where all > of the time is spent. > > On the surface, I don't see any way from the attached explain plan to > determine which side of the Merge Join is executed first. Some questions: > > * Which side gets executed first? > * How would one tell that from the json? > * Have there been any relevant changes to later releases to make that > more apparent? > * Whichever side gets executed first, is the execution of the side > that would be second get short circuited if 0 rows are returned by > the first side? > > Here's a screenshot from pgMustard. > > * Nodes 6 and 14 (the first node under each of the Merge Joins) each > return 0 rows > * Nodes 9 and 15 are the expensive sides of the Merge Joins and return > lots of rows I think those nodes (9 and 15) are expensive because they have to filter out 8 millions rows in order to produce their first output row. After that, they get short circuited. Best regards, Frédéric
Re: Which side of a Merge Join gets executed first? Do both sides always get executed?
From
Frédéric Yhuel
Date:
Le 20/12/2023 à 15:40, Jerry Brenner a écrit : > Whichever side gets executed first, is the execution of the side that > would be second get short circuited if 0 rows are returned by the first > side? Indeed, if 0 rows are returned from the outer relation, the scan of the inner relation is never executed. Best regards, Frédéric
Re: Which side of a Merge Join gets executed first? Do both sides always get executed?
From
Jerry Brenner
Date:
Thanks. Does this make sense?
- There are 3 nodes under the Merge Join
- The first node is an InitPlan, due to the ANY(ARRAY()) - that gets executed and finds 0 matching rows
- The second node is the outer node in the Merge Join and that is the expensive node in our query plan
- The third node is the inner node in the Merge Join and that node references the SubPlan generated by the first node. The IndexCond has "id = ANY($2) AND ..." and the comparison with the result of the SubPlan does not find a match, so that's where the short-circuiting happens.
Here are the relevant lines from the node (12) accessing the result of the SubPlan:
"Plans": [
{
"Node Type": "Index Only Scan",
"Parent Relationship": "Outer",
"Parallel Aware": false,
"Scan Direction": "Forward",
"Index Name": "policyperi_u_id_1mw8mh83lyyd9",
"Relation Name": "pc_policyperiod",
"Alias": "qroots0",
"Startup Cost": 0.69,
"Total Cost": 18.15,
"Plan Rows": 10,
"Plan Width": 8,
"Actual Startup Time": 0.045,
"Actual Total Time": 0.045,
"Actual Rows": 0,
"Actual Loops": 1,
"Index Cond": "((id = ANY ($2)) AND (retired = 0) AND (temporarybranch = false))",
Here's the screenshot again:

Thanks,
Jerry
On Wed, Dec 20, 2023 at 10:32 AM Frédéric Yhuel <frederic.yhuel@dalibo.com> wrote:
Le 20/12/2023 à 15:40, Jerry Brenner a écrit :
> Whichever side gets executed first, is the execution of the side that
> would be second get short circuited if 0 rows are returned by the first
> side?
Indeed, if 0 rows are returned from the outer relation, the scan of the
inner relation is never executed.
Best regards,
Frédéric
Attachment
Re: Which side of a Merge Join gets executed first? Do both sides always get executed?
From
Frédéric Yhuel
Date:
Le 20/12/2023 à 20:04, Jerry Brenner a écrit : > Thanks. Does this make sense? > > * There are 3 nodes under the Merge Join > * The first node is an InitPlan, due to the ANY(ARRAY()) - that gets > executed and finds 0 matching rows > * The second node is the outer node in the Merge Join and that is the > expensive node in our query plan > * The third node is the inner node in the Merge Join and that node > references the SubPlan generated by the first node. The IndexCond > has*"id = ANY($2) AND ..."* and the comparison with the result of > the SubPlan does not find a match, so that's where the > short-circuiting happens. I think it does. I'm not very experienced with the customs of these mailing lists, but I think the following would help to get more answers : * TEXT format of EXPLAIN is much more readable (compared to JSON) * A well formatted query would help * Screenshots aren't so great Rather than a screenshot, maybe you could use one of explain.depesz.com, explain.dalibo.com, or explain-postgresql.com ? Best regards, Frédéric