PassDownLimitBound for ForeignScan/CustomScan [take-2] - Mailing list pgsql-hackers
From | Kouhei Kaigai |
---|---|
Subject | PassDownLimitBound for ForeignScan/CustomScan [take-2] |
Date | |
Msg-id | 9A28C8860F777E439AA12E8AEA7694F8012505BC@BPXM15GP.gisp.nec.co.jp Whole thread Raw |
Responses |
Re: PassDownLimitBound for ForeignScan/CustomScan [take-2]
Re: PassDownLimitBound for ForeignScan/CustomScan [take-2] |
List | pgsql-hackers |
Hello, The attached patch is revised version of the pass-down-bounds feature. Its functionality is not changed from the previous version, however, implementation was revised according to the discussion at the last CF. This patch add a new fields (ps_numTuples) to the PlanState. This is a hint for optimization when parent node needs first N-tuples only. It shall be set prior to ExecProcNode() after ExecInitNode() or ExecReScan() by the parent node, then child nodes can adjust its execution behavior (e.g, Sort will take top-N heapsort if ps_numTuples is set) and pass down the hint to its child nodes furthermore. As an example, I enhanced postgres_fdw to understand the ps_numTuples if it is set. If and when remote ORDER BY is pushed down, the latest code tries to sort the entire remote table because it does not know how many rows to be returned. Thus, it took larger execution time. On the other hands, the patched one runs the remote query with LIMIT clause according to the ps_numTuples; which is informed by the Limit node on top of the ForeignScan node. * without patch ================= postgres=# explain (analyze,verbose) select * from ft order by x,y limit 10; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------- Limit (cost=100.00..100.43 rows=10 width=52) (actual time=2332.548..2332.550 rows=10 loops=1) Output: id, x, y, z -> Foreign Scan on public.ft (cost=100.00..146.46 rows=1077 width=52) (actual time=2332.547..2332.548 rows=10 loops=1) Output: id, x, y, z Remote SQL: SELECT id, x, y, z FROM public.t ORDER BY x ASC NULLS LAST, y ASC NULLS LAST Planning time: 0.177 ms Execution time: 2445.590 ms (7 rows) * with patch ============== postgres=# explain (analyze,verbose) select * from ft order by x,y limit 10; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------- Limit (cost=100.00..100.43 rows=10 width=52) (actual time=579.469..579.471 rows=10 loops=1) Output: id, x, y, z -> Foreign Scan on public.ft (cost=100.00..146.46 rows=1077 width=52) (actual time=579.468..579.469 rows=10 loops=1) Output: id, x, y, z Remote SQL: SELECT id, x, y, z FROM public.t ORDER BY x ASC NULLS LAST, y ASC NULLS LAST Planning time: 0.123 ms Execution time: 579.858 ms (7 rows) Right now, I have a few concerns for this patch. 1. Because LIMIT clause can have expression not only constant value, we cannot determine the value of ps_numTuples until execution time. So, it is not possible to adjust remote query on planning time, and EXPLAIN does not show exact remote query even if LIMIT clause was attached actually. 2. Where is the best location to put the interface contract to set ps_numTuples field. It has to be set prior to the first ExecProcNode() after ExecInitNode() or ExecReScan(). Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com> > -----Original Message----- > From: Robert Haas [mailto:robertmhaas@gmail.com] > Sent: Friday, September 16, 2016 12:39 AM > To: Kaigai Kouhei(海外 浩平) > Cc: Jeevan Chalke; pgsql-hackers@postgresql.org; Etsuro Fujita; Andres Freund > Subject: ##freemail## Re: [HACKERS] PassDownLimitBound for ForeignScan/CustomScan > > On Tue, Sep 13, 2016 at 9:07 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: > > In the current implementation calls recompute_limits() on the first > > invocation of ExecLimit and ExecReScanLimit. Do we expect the > > ps->numTuples will be also passed down to the child nodes on the same > > timing? > > Sure, unless we find some reason why that's not good. > > > I also think this new executor contract shall be considered as a hint > > (but not a requirement) for the child nodes, because it allows the > > parent nodes to re-distribute the upper limit regardless of the type > > of the child nodes as long as the parent node can work correctly and > > has benefit even if the child node returns a part of tuples. It makes > > the decision whether the upper limit should be passed down much simple. > > The child node "can" ignore the hint but can utilize for more optimization. > > +1. > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company
Attachment
pgsql-hackers by date: