Re: [HACKERS] CTE inlining - Mailing list pgsql-hackers

From Ilya Shkuratov
Subject Re: [HACKERS] CTE inlining
Date
Msg-id 8550391494354384@web15m.yandex.ru
Whole thread Raw
In response to Re: [HACKERS] CTE inlining  (Vik Fearing <vik.fearing@2ndquadrant.com>)
Responses Re: [HACKERS] CTE inlining
List pgsql-hackers
Ok, it seems that most people in discussion are agree that removing optimization
fence is a right thing to do. But so far the main topic was whether it worth to 
make "inlining" by default, and how we should enable it.

Nonetheless I still hoping to discuss the algorithm and its implementation. 

I suppose, in case of a single reference we can validate CTE subquery and inline it
just before SS_process_ctes() in subquery_planner() and then process remaining
CTEs as before. 

The case of multiple reference is more interesting.
Ideally, we would decide whether to inline just before pull_up_sublinks(), so all 
the optimizations can be applied to inlined subquery. But It is impossible as we 
have no information to build subquery paths and estimate they costs at this point. 
All necessary initialization is performed in query_planner(), that invoked far
later in grouping_planner(). (As far as I understand.)

The most straighforward way is to compare CTE scan cost with subquery execution
and result scan cost in set_rel_size(), just after set_cte_pathlist(), and alter 
RelOptInfo, if we choose to inline.
(e.g (CTE scan) < (cheapest_path(subquery) + subquery scan))
This way we still can push down predicates as it is performed in 
set_subquery_pathlist(), but we missed pull_up_subquery().
Besides, it seems like a dirty quick solution.

Maybe it possible to add subquery scan to RTE_CTE RelOptInfo, but I'm not sure.

So what is a right way to conduct comparison between CTE scan and subquery 
execution with subsequent scan?

I am new to PostgreSQL development, so I need a guidance from someone who 
familiar with optimizer infrastructure to ensure that I moving in a right 
direction and not making something weird.


P.S. There is a paper [1] describing implementation of CTE optimization in Orca 
optimizer. It may be useful, though architecture is completely different.

[1] Optimization of Common Table Expressions in MPP Database Systems 
(http://www.vldb.org/pvldb/vol8/p1704-elhelw.pdf)


Ilya Shkuratov



pgsql-hackers by date:

Previous
From: Erez Segal
Date:
Subject: [HACKERS] COMPRESS VALUES feature request
Next
From: Marko Tiikkaja
Date:
Subject: Re: [HACKERS] COMPRESS VALUES feature request