Re: [PoC] pgstattuple2: block sampling to reduce physical read - Mailing list pgsql-hackers
| From | Satoshi Nagayasu |
|---|---|
| Subject | Re: [PoC] pgstattuple2: block sampling to reduce physical read |
| Date | |
| Msg-id | 525779C5.2020608@uptime.jp |
| In response to | Re: [PoC] pgstattuple2: block sampling to reduce physical read (Mark Kirkwood <mark.kirkwood@catalyst.net.nz>) |
| Responses | Re: [PoC] pgstattuple2: block sampling to reduce physical read |
| List | pgsql-hackers |
(2013/10/11 7:32), Mark Kirkwood wrote:
> On 11/10/13 11:09, Mark Kirkwood wrote:
>> On 16/09/13 16:20, Satoshi Nagayasu wrote:
>>> (2013/09/15 11:07), Peter Eisentraut wrote:
>>>> On Sat, 2013-09-14 at 16:18 +0900, Satoshi Nagayasu wrote:
>>>>> I'm looking forward to seeing more feedback on this approach,
>>>>> in terms of design and performance improvement.
>>>>> So, I have submitted this for the next CF.
>>>> Your patch fails to build:
>>>>
>>>> pgstattuple.c: In function ‘pgstat_heap_sample’:
>>>> pgstattuple.c:737:13: error: ‘SnapshotNow’ undeclared (first use in
>>>> this function)
>>>> pgstattuple.c:737:13: note: each undeclared identifier is reported
>>>> only once for each function it appears in
>>> Thanks for checking. Fixed to eliminate SnapshotNow.
>>>
>> This seems like a cool idea! I took a quick look, and initially
>> replicated the sort of improvement you saw:
>>
>> bench=# explain analyze select * from pgstattuple('pgbench_accounts');
>>                               QUERY PLAN
>> --------------------------------------------------------------------------------
>>  Function Scan on pgstattuple  (cost=0.00..0.01 rows=1 width=72) (actual
>> time=786.368..786.369 rows=1 loops=1)
>>  Total runtime: 786.384 ms
>> (2 rows)
>>
>> bench=# explain analyze select * from pgstattuple2('pgbench_accounts');
>> NOTICE: pgstattuple2: SE tuple_count 0.00, tuple_len 0.00,
>> dead_tuple_count 0.00, dead_tuple_len 0.00, free_space 0.00
>>                               QUERY PLAN
>> --------------------------------------------------------------------------------
>>  Function Scan on pgstattuple2  (cost=0.00..0.01 rows=1 width=72) (actual
>> time=12.004..12.005 rows=1 loops=1)
>>  Total runtime: 12.019 ms
>> (2 rows)
>>
>> I wondered what sort of difference eliminating caching would make:
>>
>> $ sudo sysctl -w vm.drop_caches=3
>>
>> Repeating the above queries:
>>
>> bench=# explain analyze select * from pgstattuple('pgbench_accounts');
>>                               QUERY PLAN
>> --------------------------------------------------------------------------------
>>  Function Scan on pgstattuple  (cost=0.00..0.01 rows=1 width=72) (actual
>> time=9503.774..9503.776 rows=1 loops=1)
>>  Total runtime: 9504.523 ms
>> (2 rows)
>>
>> bench=# explain analyze select * from pgstattuple2('pgbench_accounts');
>> NOTICE: pgstattuple2: SE tuple_count 0.00, tuple_len 0.00,
>> dead_tuple_count 0.00, dead_tuple_len 0.00, free_space 0.00
>>                               QUERY PLAN
>> --------------------------------------------------------------------------------
>>  Function Scan on pgstattuple2  (cost=0.00..0.01 rows=1 width=72) (actual
>> time=12330.630..12330.631 rows=1 loops=1)
>>  Total runtime: 12331.353 ms
>> (2 rows)
>>
>> So the sampling code seems *slower* when the cache is completely cold -
>> is that expected? (I have not looked at how the code works yet - I'll
>> dive in later if I get a chance)!

Thanks for testing that. It is very helpful for improving the performance.

> Quietly replying to myself - looking at the code the sampler does 3000
> random page reads... I guess this is slower than 163935 (number of pages
> in pgbench_accounts) sequential page reads thanks to OS readahead on my
> type of disk (WD Velociraptor). Tweaking the number of random reads (i.e.
> the sample size) down helps - but obviously that can impact estimation
> accuracy.
>
> Thinking about this a bit more, I guess the elapsed runtime is not the
> *only* thing to consider - the sampling code will cause way less
> disruption to the OS page cache (3000 pages vs possibly lots more than
> 3000 for reading an entire relation).
>
> Thoughts?

I think it could be improved by sorting the sampled block numbers *before*
the physical block reads, in order to eliminate random access on the disk.

pseudo code:
--------------------------------------
/* draw SAMPLE_SIZE random block numbers within the relation */
for (i = 0; i < SAMPLE_SIZE; i++)
    sample_block[i] = random() % nblocks;

/* sort them so the reads happen in ascending physical order */
qsort(sample_block, SAMPLE_SIZE, sizeof(BlockNumber), bnum_cmp);

for (i = 0; i < SAMPLE_SIZE; i++)
{
    buf = ReadBuffer(rel, sample_block[i]);
    do_some_stats_stuff(buf);
    ReleaseBuffer(buf);
}
--------------------------------------

I guess it would help reduce the random access on the disk.

Any comments?
--
Satoshi Nagayasu <snaga@uptime.jp>
Uptime Technologies, LLC. http://www.uptime.jp
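As a rough illustration of the sort-before-read idea above, here is a minimal standalone C sketch. It is not the pgstattuple2 patch itself: SAMPLE_SIZE, the block count, read_block() and bnum_cmp() are hypothetical stand-ins for the real buffer-manager calls.

--------------------------------------
/*
 * Standalone sketch: pick SAMPLE_SIZE random block numbers, sort them,
 * then visit them in ascending order so the disk sees a mostly-forward
 * scan instead of purely random seeks.  read_block() stands in for
 * ReadBuffer() plus the per-page statistics work.
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define SAMPLE_SIZE 3000

typedef unsigned int BlockNumber;

/* comparator for qsort(): ascending block numbers */
static int
bnum_cmp(const void *a, const void *b)
{
    BlockNumber ba = *(const BlockNumber *) a;
    BlockNumber bb = *(const BlockNumber *) b;

    if (ba < bb)
        return -1;
    if (ba > bb)
        return 1;
    return 0;
}

/* placeholder for reading and examining one block of the relation */
static void
read_block(BlockNumber blkno)
{
    printf("reading block %u\n", blkno);
}

int
main(void)
{
    BlockNumber nblocks = 163935;   /* e.g. pgbench_accounts in the test above */
    BlockNumber sample_block[SAMPLE_SIZE];
    int         i;

    srandom((unsigned int) time(NULL));

    /* 1. draw the sample: uniform random block numbers in [0, nblocks) */
    for (i = 0; i < SAMPLE_SIZE; i++)
        sample_block[i] = (BlockNumber) (random() % nblocks);

    /* 2. sort so that the physical reads go in file order */
    qsort(sample_block, SAMPLE_SIZE, sizeof(BlockNumber), bnum_cmp);

    /* 3. read the sampled blocks in ascending order */
    for (i = 0; i < SAMPLE_SIZE; i++)
        read_block(sample_block[i]);

    return 0;
}
--------------------------------------

Visiting the sampled blocks in ascending file order lets the OS merge neighbouring requests and benefit from readahead, which is the behaviour the cold-cache timings above suggest is missing with purely random reads.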