Re: pgsql: Support partition pruning at execution time - Mailing list pgsql-hackers
From | Alvaro Herrera
---|---
Subject | Re: pgsql: Support partition pruning at execution time
Date |
Msg-id | 20180410183247.ioo3ijpwfbbbq26u@alvherre.pgsql
In response to | Re: pgsql: Support partition pruning at execution time (Robert Haas <robertmhaas@gmail.com>) |
List | pgsql-hackers |
Robert Haas wrote:
>
> I don't have a strong opinion on that.  Since we know how many tuples
> were processed by each worker, knowing how many heap fetches we have
> on a per-worker basis seems like a good thing to have, too.  On the
> other hand, maybe EXPLAIN (ANALYZE, VERBOSE) would give us that
> anyway, since it knows about displaying per-worker instrumentation
> (look for /* Show worker detail */).  If it doesn't, then that's
> probably bad, because I'm pretty sure that when I installed that code
> the stuff that got displayed for worker instrumentation pretty much
> matched the stuff that got displayed for overall instrumentation.

So, after some experimentation I was able to produce this output with the
previously proposed patch:

                                                            QUERY PLAN
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Gather Merge  (cost=1000.51..27083147.02 rows=22950230 width=12) (actual time=1198.906..1198.906 rows=0 loops=1)
   Output: tbl1.col1, tprt_1.col1, tprt_1.col1
   Workers Planned: 5
   Workers Launched: 5
   Buffers: shared hit=267516
   ->  Nested Loop  (cost=0.43..24318369.79 rows=4590046 width=12) (actual time=1183.291..1183.291 rows=0 loops=6)
         Output: tbl1.col1, tprt_1.col1, tprt_1.col1
         Join Filter: (tbl1.col1 = tprt_1.col1)
         Rows Removed by Join Filter: 1200012
         Buffers: shared hit=1836184
         Worker 0: actual time=1174.526..1174.526 rows=0 loops=1
           Buffers: shared hit=245599
         Worker 1: actual time=1178.845..1178.845 rows=0 loops=1
           Buffers: shared hit=270694
         Worker 2: actual time=1185.703..1185.703 rows=0 loops=1
           Buffers: shared hit=313607
         Worker 3: actual time=1189.595..1189.595 rows=0 loops=1
           Buffers: shared hit=383714
         Worker 4: actual time=1176.965..1176.965 rows=0 loops=1
           Buffers: shared hit=355054
         ->  Parallel Index Only Scan using tprt1_idx on public.tprt_1  (cost=0.43..63100.29 rows=360004 width=4) (actual time=0.041..209.113 rows=300003 loops=6)
               Output: tprt_1.col1
               Heap Fetches: 1800018
               Buffers: shared hit=36166
               Worker 0: actual time=0.035..192.255 rows=240750 loops=1
                 Buffers: shared hit=4849
               Worker 1: actual time=0.031..184.332 rows=265307 loops=1
                 Buffers: shared hit=5387
               Worker 2: actual time=0.046..272.412 rows=307509 loops=1
                 Buffers: shared hit=6098
               Worker 3: actual time=0.035..212.866 rows=376064 loops=1
                 Buffers: shared hit=7650
               Worker 4: actual time=0.036..201.076 rows=348048 loops=1
                 Buffers: shared hit=7006
         ->  Seq Scan on public.tbl1  (cost=0.00..35.50 rows=2550 width=4) (actual time=0.001..0.002 rows=4 loo

Time: 1205,533 ms (00:01,206)

You're right that per-worker heap fetches would probably be better, but
that seems like a larger project than I'm willing to tackle at this point
in the devel cycle -- mainly because that "worker detail" does not have
node-specific handling, so there's a bunch of new code to be written
there.  This amount of detail seems about right to me: I'm not sure that
you really care all that much about how many fetches each worker had to
do, but rather how many there were in the table as a whole.
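To make the generic-versus-node-specific distinction concrete, here is a small
standalone C sketch.  It is not PostgreSQL source and every name in it is
invented for illustration: it only models how one common "worker detail" loop
can print the counters every node type shares (rows, loops, buffer hits),
while a node-specific number such as Heap Fetches is kept only as a node-wide
total unless each node type's display code learns about per-worker state.

/*
 * Minimal standalone sketch (hypothetical names, not PostgreSQL code).
 * Generic per-worker counters can be printed by one common loop; the
 * node-specific heap-fetch count exists only as a node-wide total here.
 */
#include <stdio.h>

#define NUM_WORKERS 5

typedef struct WorkerCounters
{
	double		ntuples;		/* rows returned by this worker */
	double		nloops;			/* times this worker ran the node */
	long		shared_hit;		/* shared buffer hits for this worker */
} WorkerCounters;

int
main(void)
{
	/* per-worker generic counters (values taken from the plan above) */
	WorkerCounters workers[NUM_WORKERS] = {
		{240750, 1, 4849}, {265307, 1, 5387}, {307509, 1, 6098},
		{376064, 1, 7650}, {348048, 1, 7006},
	};
	/* node-specific counter, tracked only as a node-wide total */
	long		heap_fetches_total = 1800018;

	double		rows_total = 0;
	long		hits_total = 0;

	for (int i = 0; i < NUM_WORKERS; i++)
	{
		rows_total += workers[i].ntuples;
		hits_total += workers[i].shared_hit;
	}

	/* node-wide summary: generic and node-specific counters together */
	printf("Parallel Index Only Scan: rows=%.0f\n", rows_total);
	printf("  Heap Fetches: %ld\n", heap_fetches_total);
	printf("  Buffers: shared hit=%ld\n", hits_total);

	/* generic worker detail: one loop serves every node type */
	for (int i = 0; i < NUM_WORKERS; i++)
		printf("  Worker %d: rows=%.0f loops=%.0f, Buffers: shared hit=%ld\n",
			   i, workers[i].ntuples, workers[i].nloops,
			   workers[i].shared_hit);

	/*
	 * No per-worker Heap Fetches line is possible above: splitting that
	 * total out would require node-specific per-worker handling, which is
	 * the extra code described in the message.
	 */
	return 0;
}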
Another query (without verbose) gives this:

                                                            QUERY PLAN
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Gather  (cost=20000001002.10..20249583898.07 rows=80390828 width=12) (actual time=1554.752..6633.272 rows=3600036 loops=1)
   Workers Planned: 1
   Workers Launched: 1
   Single Copy: true
   ->  Nested Loop  (cost=20000000002.10..20241543815.27 rows=80390828 width=12) (actual time=1547.552..5832.355 rows=3600036 loops=1)
         Join Filter: (tbl1.col1 = tprt_1.col1)
         Rows Removed by Join Filter: 21600216
         Buffers: shared hit=75688 read=54346
         ->  Merge Append  (cost=2.10..371288.65 rows=6305163 width=4) (actual time=0.758..1980.024 rows=6300063 loops=1)
               Sort Key: tprt_1.col1
               Buffers: shared hit=75688 read=54345
               ->  Index Only Scan using tprt1_idx on tprt_1  (cost=0.43..77500.44 rows=1800018 width=4) (actual time=0.211..387.916 rows=1800018 loops=1)
                     Heap Fetches: 1800018
                     Buffers: shared hit=22028 read=14035
               ->  Index Only Scan using tprt2_idx on tprt_2  (cost=0.43..121147.34 rows=2700027 width=4) (actual time=0.180..628.350 rows=2700027 loops=1)
                     Heap Fetches: 2700027
                     Buffers: shared hit=40597 read=26251
               ->  Index Only Scan using tprt3_idx on tprt_3  (cost=0.42..29722.56 rows=900009 width=4) (actual time=0.153..184.429 rows=900009 loops=1)
                     Heap Fetches: 900009
                     Buffers: shared hit=6538 read=7029
               ->  Index Only Scan using tprt4_idx on tprt_4  (cost=0.15..82.41 rows=2550 width=4) (actual time=0.023..0.023 rows=0 loops=1)
                     Heap Fetches: 0
                     Buffers: shared hit=1
               ->  Index Only Scan using tprt5_idx on tprt_5  (cost=0.15..82.41 rows=2550 width=4) (actual time=0.018..0.018 rows=0 loops=1)
                     Heap Fetches: 0
                     Buffers: shared hit=1
               ->  Index Only Scan using tprt6_idx on tprt_6  (cost=0.42..29734.56 rows=900009 width=4) (actual time=0.165..191.343 rows=900009 loops=1)
                     Heap Fetches: 900009
                     Buffers: shared hit=6523 read=7030
         ->  Materialize  (cost=10000000000.00..10000000048.25 rows=2550 width=4) (actual time=0.000..0.000 rows=4 loops=6300063)
               Buffers: shared read=1
               ->  Seq Scan on tbl1  (cost=10000000000.00..10000000035.50 rows=2550 width=4) (actual time=0.033..0.036 rows=4 loops=1)
                     Buffers: shared read=1
 Planning Time: 4.344 ms
 Execution Time: 7008.095 ms
(35 filas)

Looks alright to me.

> In any case part of the point is that Instrumentation is supposed to
> be a generic structure that contains things that are for the most part
> common to all node types.  So what MERGE did to that structure looks
> like an inadvisable kludge to me.  I'd get rid of that and do it a
> different way rather than propagate the idea that nfilteredX is
> scratch space that can mean something different in every separate
> node.

Agreed.  My proposed patch adds another generic counter "tuples2", which
can be used at least by ON CONFLICT.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
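As a rough illustration of that second generic counter, here is a standalone C
sketch.  The names (FakeInstrumentation, count_row) are invented and this is
not the proposed patch; it only shows how a node such as INSERT ... ON
CONFLICT could report two row counts through generic counters instead of
overloading the nfilteredX fields.

/*
 * Standalone sketch with hypothetical names: an Instrumentation-like
 * struct carrying a second generic counter, so one node can account
 * for two kinds of rows (e.g. inserted vs. conflicting-and-updated).
 */
#include <stdio.h>

typedef struct FakeInstrumentation
{
	double		ntuples;		/* primary count, e.g. rows inserted */
	double		ntuples2;		/* secondary count, e.g. conflicting rows updated */
} FakeInstrumentation;

/* Account for one row; "conflicted" selects the secondary counter. */
static void
count_row(FakeInstrumentation *instr, int conflicted)
{
	if (conflicted)
		instr->ntuples2 += 1;
	else
		instr->ntuples += 1;
}

int
main(void)
{
	FakeInstrumentation instr = {0, 0};

	/* pretend 7 rows were inserted and 3 hit a conflict and were updated */
	for (int i = 0; i < 10; i++)
		count_row(&instr, i >= 7);

	printf("Tuples Inserted: %.0f\n", instr.ntuples);
	printf("Conflicting Tuples: %.0f\n", instr.ntuples2);
	return 0;
}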