Home > mailing lists

Re: Terrible plan for join to nested union - Mailing list pgsql-performance

From	Pavel Stehule
Subject	Re: Terrible plan for join to nested union
Date	July 8, 2012 03:04:29
Msg-id	CAFj8pRD5Hn92L_uJfr0DKcHCOW72VJ7Y0T7VhT6_kP3BUSgNjg@mail.gmail.com Whole thread Raw
In response to	Re: Terrible plan for join to nested union (Nate Allan <nallan@ancestry.com>)
Responses	Re: Terrible plan for join to nested union
List	pgsql-performance

Tree view

2012/7/8 Nate Allan <nallan@ancestry.com>:
> Thanks for your reply Tom.
>
>>> I have a query which joins to a nested union and I'm getting a plan which never returns.  Here is the query
simplifiedas much as possible: 
>>> select 'anything' as result
>>>                from "Attribute" as A1
>>>                               inner join
>>>                               (
>>>                                              select R."TargetID" as "SourceID"
>>>                                                             from "Relationship" as R
>>>                                              union
>>>                                              select A2."PersonID" as "SourceID"
>>>                                                             from "Attribute" as A2
>>>                               ) as X on (A1."PersonID" = X."SourceID")
>>>                where (A1."ID" = 124791200)
>>
>>  AFAICS, the UNION result must include every possible value of Attribute.PersonID, which means the inner join cannot
>>eliminate any rows of A1 (except those with null PersonID), which seems a tad silly.
>
> It seems to me that the join condition (and hence the restriction) should be pushed down into both sides of the union
tobring the cardinality limit from millions to 1.  I'm imagining a rewrite like this: 
>         R(a) J (b U c)  ->  (b J R(a)) U (c J R(a))
> ...where R = Restrict, J = Join, U = Union
>
> This is the kind of rewrite I would make as a sentient being and it's one that at least one other DBMS I know of
makes.
>
> As an aside, even though not as good as pushing down the restriction, the plan that the "union all" produces is
decentperformance-wise: 
> http://explain.depesz.com/s/OZq
> It seems to me that a similar alternative could be applied for a distinct union by using two Index Scans followed by
aMerge Join. 
>
>>What exactly are you trying to accomplish here?
>
> I state in my post that there are several ways to rewrite the query to work-around the issue; I'm not really asking
fora work-around but a) wondering why the plan is so bad; and b) asking if it could be fixed if possible.
Unfortunatelyrewriting the query isn't a trivial matter in our case because the X (union) part of the query is
representedlogically as a view, which is expected to be restricted and/or joined so as not to actually materialize the
actualunion.  Unfortunately the PostgreSQL planner seems to want to actually materialize that view.  Working around
thiswould basically entail not using the view, which is used all over the place, and instead duplicating the view's
logicexcept pushing the restrictions and/or joins down into both sides of the union in each case.  I could do that, but
doingso would be: a) against the spirit of the Relational Model; b) against the spirit of "fix the planner rather than
addoptimizer hints"; c) a royal pain because it causes a rewrite of application logic; d) a point for at least one
otherDBMS's optimizer.  :-) 

you are using EAV schema - it is against to relation model enough :)

this schema has the most terrible performance for large datasets -
looks on hstore instead

Regards

Pavel

>
>>Anyway, I wonder whether you'd get better results with an EXISTS over a correlated UNION ALL subquery, ie, something
like
>> ...
>
> Thanks for the work-arounds, but again, that's not quite what I'm after.
>
> Best,
>
> -Nate
>
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance

pgsql-performance by date:

From: Nate Allan
Date: 08 July 2012, 02:50:39
Subject: Re: Terrible plan for join to nested union

From: Nate Allan
Date: 08 July 2012, 06:56:39
Subject: Re: Terrible plan for join to nested union

Re: Terrible plan for join to nested union - Mailing list pgsql-performance

Previous

Next