Running a query twice to ensure cached results. - Mailing list pgsql-hackers
| From | Ron Mayer |
|---|---|
| Subject | Running a query twice to ensure cached results. |
| Date | |
| Msg-id | 44887145.5030901@cheapcomplexdevices.com Whole thread Raw |
| In response to | Re: That EXPLAIN ANALYZE patch still needs work (Tom Lane <tgl@sss.pgh.pa.us>) |
| Responses |
Re: Running a query twice to ensure cached results.
Re: Running a query twice to ensure cached results. |
| List | pgsql-hackers |
Tom Lane wrote:
> -- do it again to ensure fully cached
> bench=# select count(*) from accounts;
Short summary:
Does running a query only twice really insure that a result is cached? It seems not to be the case for seq-scans on
Linux.
I think this may matters to the discussions about a readahead thread/process that come up on this list that come up
hereoccasionally.
Experimental results here suggest that for larger tables Linux seems
to detect a seq-scan and not bother caching. It's very reproducible
for me here to do a reboot and not see the full speedup on a seq_scan
until the third time I run a query.su An example shown below [1] shows
that the third run of a query is faster than the second run. The
output of a 'vmstat 5' [2] while these queries was happening agrees
that significant I/O was still happening on the second run, but
no I/O happened the third time. The table comfortably fits in
memory (700MB table on a 2GB ram machine) and the machine was
otherwise idle so noone else wiped out the cache between the
first and second runs.
Why do I think this is worth mentioning here? * I think it impacts the occasional thread about wanting to include
logicin postgresql for readahead [3] or for the threads suggesting hinting to the the OS though madvise or
similarto avoid caching seq-scans. It seems that the Linux is detecting and at least somewhat reacting to seq
scanseven with no hinting. Anything added to postgresql might end up being a duplicated effort. I think Bruce
suggestedthat Solaris does this free-behind automatically [4], but this is the first I've noticed that Linux
seemsto do similar.
* I think it matters to people who post explain analyze twice without running it so often they get stable results.
(I note that this was not a problem for Tom since the timing of his first and second runs were the same so I
assumehe was just saying that he observed that the query was cached rather than that the first run forced the
secondrun to be cached.)
Ron
=========================================================================
== [note 1] the repeated queries showing the speedup after 3 runs.
== Running the same select count(*) 4 times after a clean reboot.
== Seems the OS's caching logic decided that the first seq_scan
== wasn't 'interesting' enough
=========================================================================
fli=# select count(*) from facets_s; count
---------- 15976558
(1 row)
Time: 29788.047 ms
fli=# select count(*) from facets_s; count
---------- 15976558
(1 row)
Time: 19344.573 ms
fli=# select count(*) from facets_s; count
---------- 15976558
(1 row)
Time: 13411.272 ms
fli=# select count(*) from facets_s; count
---------- 15976558
(1 row)
Time: 13107.856 ms
################################################################################
# [note 2] vmstat 5 while the above queries were being run
################################################################################
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu---- r b swpd free buff cache si
so bi bo in cs us sy id wa 1 1 140 62140 71256 713360 0 0 47 31 92 84 7 1 92 0
*** the first time 1 0 140 50860 31912 808304 0 2 25215 29 1147 2612 49 15 0 36 1 0 360 54420
11112855240 0 0 23934 7 1139 2553 47 14 0 39 0 1 360 54008 11100 878708 0 0 23704 25 1149
246746 12 0 41 0 1 360 52512 11140 896592 0 0 24062 6 1135 2460 47 11 0 41
*** the second time 0 0 360 52688 11172 906916 0 0 13357 19 1085 1989 31 7 38 24 1 0 360 53976
11076912540 0 44 14273 57 1113 2102 32 7 29 32 2 0 360 54788 10908 923788 0 0 24509 54 1171
247446 12 0 42 1 0 360 54944 3096 939948 0 0 11180 39 1093 1976 65 13 0 22
*** the third time 3 0 360 54280 3872 940508 0 0 264 14 1041 1560 85 15 0 0 1 0 360 53852
3904940940 0 0 88 29 1022 1505 53 9 36 2 2 0 360 51616 4052 943068 0 0 443 54 1037
155282 15 0 4 1 0 360 51488 4060 943180 0 0 22 2 1013 1522 84 16 0 0
#############
[3] http://archives.postgresql.org/pgsql-hackers/2005-11/msg01449.php
[4] http://archives.postgresql.org/pgsql-performance/2003-10/msg00188.php
pgsql-hackers by date: