Running a query twice to ensure cached results. - Mailing list pgsql-hackers
From | Ron Mayer |
---|---|
Subject | Running a query twice to ensure cached results. |
Date | |
Msg-id | 44887145.5030901@cheapcomplexdevices.com Whole thread Raw |
In response to | Re: That EXPLAIN ANALYZE patch still needs work (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Running a query twice to ensure cached results.
Re: Running a query twice to ensure cached results. |
List | pgsql-hackers |
Tom Lane wrote: > -- do it again to ensure fully cached > bench=# select count(*) from accounts; Short summary: Does running a query only twice really insure that a result is cached? It seems not to be the case for seq-scans on Linux. I think this may matters to the discussions about a readahead thread/process that come up on this list that come up hereoccasionally. Experimental results here suggest that for larger tables Linux seems to detect a seq-scan and not bother caching. It's very reproducible for me here to do a reboot and not see the full speedup on a seq_scan until the third time I run a query.su An example shown below [1] shows that the third run of a query is faster than the second run. The output of a 'vmstat 5' [2] while these queries was happening agrees that significant I/O was still happening on the second run, but no I/O happened the third time. The table comfortably fits in memory (700MB table on a 2GB ram machine) and the machine was otherwise idle so noone else wiped out the cache between the first and second runs. Why do I think this is worth mentioning here? * I think it impacts the occasional thread about wanting to include logicin postgresql for readahead [3] or for the threads suggesting hinting to the the OS though madvise or similarto avoid caching seq-scans. It seems that the Linux is detecting and at least somewhat reacting to seq scanseven with no hinting. Anything added to postgresql might end up being a duplicated effort. I think Bruce suggestedthat Solaris does this free-behind automatically [4], but this is the first I've noticed that Linux seemsto do similar. * I think it matters to people who post explain analyze twice without running it so often they get stable results. (I note that this was not a problem for Tom since the timing of his first and second runs were the same so I assumehe was just saying that he observed that the query was cached rather than that the first run forced the secondrun to be cached.) Ron ========================================================================= == [note 1] the repeated queries showing the speedup after 3 runs. == Running the same select count(*) 4 times after a clean reboot. == Seems the OS's caching logic decided that the first seq_scan == wasn't 'interesting' enough ========================================================================= fli=# select count(*) from facets_s; count ---------- 15976558 (1 row) Time: 29788.047 ms fli=# select count(*) from facets_s; count ---------- 15976558 (1 row) Time: 19344.573 ms fli=# select count(*) from facets_s; count ---------- 15976558 (1 row) Time: 13411.272 ms fli=# select count(*) from facets_s; count ---------- 15976558 (1 row) Time: 13107.856 ms ################################################################################ # [note 2] vmstat 5 while the above queries were being run ################################################################################ procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 1 1 140 62140 71256 713360 0 0 47 31 92 84 7 1 92 0 *** the first time 1 0 140 50860 31912 808304 0 2 25215 29 1147 2612 49 15 0 36 1 0 360 54420 11112855240 0 0 23934 7 1139 2553 47 14 0 39 0 1 360 54008 11100 878708 0 0 23704 25 1149 246746 12 0 41 0 1 360 52512 11140 896592 0 0 24062 6 1135 2460 47 11 0 41 *** the second time 0 0 360 52688 11172 906916 0 0 13357 19 1085 1989 31 7 38 24 1 0 360 53976 11076912540 0 44 14273 57 1113 2102 32 7 29 32 2 0 360 54788 10908 923788 0 0 24509 54 1171 247446 12 0 42 1 0 360 54944 3096 939948 0 0 11180 39 1093 1976 65 13 0 22 *** the third time 3 0 360 54280 3872 940508 0 0 264 14 1041 1560 85 15 0 0 1 0 360 53852 3904940940 0 0 88 29 1022 1505 53 9 36 2 2 0 360 51616 4052 943068 0 0 443 54 1037 155282 15 0 4 1 0 360 51488 4060 943180 0 0 22 2 1013 1522 84 16 0 0 ############# [3] http://archives.postgresql.org/pgsql-hackers/2005-11/msg01449.php [4] http://archives.postgresql.org/pgsql-performance/2003-10/msg00188.php
pgsql-hackers by date: