Re: [PATCH] Improve EXPLAIN ANALYZE overhead by sampling - Mailing list pgsql-patches
From:           Bruce Momjian
Subject:        Re: [PATCH] Improve EXPLAIN ANALYZE overhead by sampling
Msg-id:         200605301401.k4UE1nO08251@candle.pha.pa.us
In response to: [PATCH] Improve EXPLAIN ANALYZE overhead by sampling (Martijn van Oosterhout <kleptog@svana.org>)
Responses:      Re: [PATCH] Improve EXPLAIN ANALYZE overhead by sampling
List:           pgsql-patches
Patch applied.  Thanks.

---------------------------------------------------------------------------

Martijn van Oosterhout wrote:
-- Start of PGP signed section.
> This was a suggestion made back in March that would dramatically reduce
> the overhead of EXPLAIN ANALYZE on queries that loop continuously over
> the same nodes.
>
> http://archives.postgresql.org/pgsql-hackers/2006-03/msg01114.php
>
> What it does is behave normally for the first 50 tuples of any node, but
> after that it starts sampling at ever-increasing intervals, the
> intervals controlled by an exponential function. So for a node
> iterating over 1 million tuples it takes around 15,000 samples. The
> result is that EXPLAIN ANALYZE has a much smaller effect on the total
> execution time.
>
> Without EXPLAIN ANALYZE:
>
> postgres=# select count(*) from generate_series(1,1000000);
>   count
> ---------
>  1000000
> (1 row)
>
> Time: 2303.599 ms
>
> EXPLAIN ANALYZE without patch:
>
> postgres=# explain analyze select count(*) from generate_series(1,1000000);
>                               QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=15.00..15.01 rows=1 width=0) (actual time=8022.070..8022.073 rows=1 loops=1)
>    ->  Function Scan on generate_series  (cost=0.00..12.50 rows=1000 width=0) (actual time=1381.762..4873.369 rows=1000000 loops=1)
>  Total runtime: 8042.472 ms
> (3 rows)
>
> Time: 8043.401 ms
>
> EXPLAIN ANALYZE with patch:
>
> postgres=# explain analyze select count(*) from generate_series(1,1000000);
>                               QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=15.00..15.01 rows=1 width=0) (actual time=2469.491..2469.494 rows=1 loops=1)
>    ->  Function Scan on generate_series  (cost=0.00..12.50 rows=1000 width=0) (actual time=1405.002..2187.417 rows=1000000 loops=1)
>  Total runtime: 2496.529 ms
> (3 rows)
>
> Time: 2497.488 ms
>
> As you can see, the overhead goes from 5.7 seconds to 0.2 seconds.
> Obviously this is an extreme case, but it will probably help in a lot
> of other cases people have been complaining about.
>
> - To get this close it needs an estimate of the sampling overhead.
>   It does this with a little calibration loop that is run once per
>   backend. If you don't do this, you end up assuming all tuples take
>   the same time as the tuples measured with overhead, resulting in
>   nodes apparently taking longer than their parent nodes. Incidentally,
>   I measured the overhead to be about 3.6us per tuple per node on my
>   (admittedly slightly old) machine.
>
>   Note that the resulting times still include the overhead actually
>   incurred; I didn't filter it out. I want the times to reflect reality
>   as closely as possible.
>
> - I also removed InstrStopNodeMulti and made InstrStopNode take a tuple
>   count parameter instead. This is much clearer all round.
>
> - I also didn't make it optional. I'm unsure whether it should be
>   optional, given that the cases where it would make a difference seem
>   to be very few.
>
> - The tuple counter for sampling restarts every loop. Thus a node that
>   is called repeatedly but returns only one value each time will still
>   measure every tuple, though its parent node won't. We'll need some
>   field testing to see whether that remains a significant effect.
>
> - I don't let the user know anywhere how many samples it took. Is this
>   something users should care about?
>
> Any comments?
> --
> Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
>
> From each according to his ability. To each according to his ability to litigate.

[ Attachment, skipping... ]

-- End of PGP section, PGP failed!

--
  Bruce Momjian   http://candle.pha.pa.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +