[PATCH] Improve EXPLAIN ANALYZE overhead by sampling - Mailing list pgsql-patches
From | Martijn van Oosterhout
---|---
Subject | [PATCH] Improve EXPLAIN ANALYZE overhead by sampling
Date |
Msg-id | 20060509203704.GJ29652@svana.org
List | pgsql-patches

Responses:
- Re: [PATCH] Improve EXPLAIN ANALYZE overhead by sampling
- Re: [PATCH] Improve EXPLAIN ANALYZE overhead by sampling
- Re: [PATCH] Improve EXPLAIN ANALYZE overhead by sampling
This was a suggestion made back in March that would dramatically reduce the overhead of EXPLAIN ANALYZE on queries that loop continuously over the same nodes:

http://archives.postgresql.org/pgsql-hackers/2006-03/msg01114.php

What it does is behave normally for the first 50 tuples of any node, but after that it starts sampling at ever-increasing intervals, the intervals controlled by an exponential function. So for a node iterating over 1 million tuples it takes around 15,000 samples. The result is that EXPLAIN ANALYZE has a much smaller effect on the total execution time.

Without EXPLAIN ANALYZE:

postgres=# select count(*) from generate_series(1,1000000);
  count
---------
 1000000
(1 row)

Time: 2303.599 ms

EXPLAIN ANALYZE without patch:

postgres=# explain analyze select count(*) from generate_series(1,1000000);
                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=15.00..15.01 rows=1 width=0) (actual time=8022.070..8022.073 rows=1 loops=1)
   ->  Function Scan on generate_series  (cost=0.00..12.50 rows=1000 width=0) (actual time=1381.762..4873.369 rows=1000000 loops=1)
 Total runtime: 8042.472 ms
(3 rows)

Time: 8043.401 ms

EXPLAIN ANALYZE with patch:

postgres=# explain analyze select count(*) from generate_series(1,1000000);
                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=15.00..15.01 rows=1 width=0) (actual time=2469.491..2469.494 rows=1 loops=1)
   ->  Function Scan on generate_series  (cost=0.00..12.50 rows=1000 width=0) (actual time=1405.002..2187.417 rows=1000000 loops=1)
 Total runtime: 2496.529 ms
(3 rows)

Time: 2497.488 ms

As you can see, the overhead goes from 5.7 seconds to 0.2 seconds. Obviously this is an extreme case, but it will probably help in a lot of other cases people have been complaining about.
- To get this close it needs an estimate of the sampling overhead. It gets this from a little calibration loop that is run once per backend. If you don't do this, you end up assuming all tuples take the same time as the tuples that carry the overhead, resulting in nodes apparently taking longer than their parent nodes. Incidentally, I measured the overhead to be about 3.6us per tuple per node on my (admittedly slightly old) machine. Note that the resulting times still include the overhead actually incurred; I didn't filter it out. I want the times to reflect reality as closely as possible.

- I also removed InstrStopNodeMulti and made InstrStopNode take a tuple count parameter instead. This is much clearer all round.

- I didn't make it optional. I'm unsure whether it should be, given that the number of cases where it will make a difference is very small.

- The tuple counter for sampling restarts every loop. Thus a node that is called repeatedly but returns only one value each time will still measure every tuple, though its parent node won't. We'll need some field testing to see whether that remains a significant effect.

- I don't let the user know anywhere how many samples were taken. Is this something users should care about?

Any comments?

-- 
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.