Re: SeqScan costs - Mailing list pgsql-hackers
| From | Decibel! |
|---|---|
| Subject | Re: SeqScan costs |
| Date | |
| Msg-id | EB312F7D-78CC-4034-9871-AC0A419238D5@decibel.org Whole thread Raw |
| In response to | Re: SeqScan costs (Andrew Gierth <andrew@tao11.riddles.org.uk>) |
| List | pgsql-hackers |
On Aug 12, 2008, at 4:52 PM, Andrew Gierth wrote:
>>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
>
>>> Proposal: Make the first block of a seq scan cost
>>> random_page_cost, then after that every additional block costs
>>> seq_page_cost.
>
> ?Tom> This is only going to matter for a table of 1 block (or at least
> ?Tom> very few blocks), and for such a table it's highly likely that
> ?Tom> it's in RAM anyway.? So I'm unconvinced that the proposed change
> ?Tom> represents a better model of reality.
>
> Simple example which demonstrates a 10x speed improvement for index
> scan over seqscan for a 1-block table (on 8.3.3):
>
> create table oneblock (id integer primary key, value text not null);?
> insert into oneblock select i, 'row ' || i from generate_series
> (1,200) i;
>
> test=> select pg_relation_size('oneblock');
> ?pg_relation_size?
> ------------------
> ?? ? ? ? ? ? 8192
>
> analyze oneblock;
>
> set enable_seqscan=true;
>
> select (select value from oneblock where id = i)
> ? from generate_series(1,200) i, generate_series(1,5000) j;
> Time: 25596.709 ms? (that's 25.6 us per row)
>
> set enable_seqscan=false;
>
> select (select value from oneblock where id = i)
> ? from generate_series(1,200) i, generate_series(1,5000) j;
> Time: 2415.691 ms ? (that's 2.4 us per row)
Roughly what I get on my MBP (I'm about a factor of 2 slower). This
makes me think it's an issue of having to slog through an entire page
one row at a time vs just using the index. To test this I tested
selecting i=200 (remember we start filling data at the back of the
page, so 200 would actually be the front, and I'm assuming the first
value that would be hit) vs i=1. With seqscans, I saw about a 10%
difference. With index scans the difference was moot, but also note
that now index scans are in-between seqscans in performance.
decibel@platter.local=# explain analyze select (select value from
oneblock where id = 200) from generate_series(1,200) i, generate_series(1,500000) j;
QUERY
PLAN
------------------------------------------------------------------------
----------------------------------------------------------------- Nested Loop (cost=17.00..20029.50 rows=1000000
width=0)(actual
time=270.867..65821.373 rows=100000000 loops=1) InitPlan -> Seq Scan on oneblock (cost=0.00..3.50 rows=1
width=7)
(actual time=0.052..0.053 rows=1 loops=1) Filter: (id = 200) -> Function Scan on generate_series i
(cost=0.00..12.50
rows=1000 width=0) (actual time=0.062..0.351 rows=200 loops=1) -> Materialize (cost=13.50..23.50 rows=1000 width=0)
(actual
time=1.368..164.634 rows=500000 loops=200) -> Function Scan on generate_series j (cost=0.00..12.50
rows=1000 width=0) (actual time=270.743..459.335 rows=500000 loops=1) Total runtime: 79055.822 ms
(8 rows)
decibel@platter.local=# explain analyze select (select value from
oneblock where id = 1) from generate_series(1,200) i, generate_series(1,500000) j;
QUERY
PLAN
------------------------------------------------------------------------
----------------------------------------------------------------- Nested Loop (cost=17.00..20029.50 rows=1000000
width=0)(actual
time=261.941..72937.226 rows=100000000 loops=1) InitPlan -> Seq Scan on oneblock (cost=0.00..3.50 rows=1
width=7)
(actual time=0.025..0.056 rows=1 loops=1) Filter: (id = 1) -> Function Scan on generate_series i
(cost=0.00..12.50
rows=1000 width=0) (actual time=0.060..0.346 rows=200 loops=1) -> Materialize (cost=13.50..23.50 rows=1000 width=0)
(actual
time=1.375..182.474 rows=500000 loops=200) -> Function Scan on generate_series j (cost=0.00..12.50
rows=1000 width=0) (actual time=261.815..448.652 rows=500000 loops=1) Total runtime: 87702.315 ms
(8 rows)
decibel@platter.local=# set enable_seqscan =off;
SET
decibel@platter.local=# explain analyze select (select value from
oneblock where id = 200) from generate_series(1,200) i, generate_series(1,500000) j;
QUERY
PLAN
------------------------------------------------------------------------
----------------------------------------------------------------- Nested Loop (cost=21.77..20034.27 rows=1000000
width=0)(actual
time=262.219..69851.786 rows=100000000 loops=1) InitPlan -> Index Scan using oneblock_pkey on oneblock
(cost=0.00..8.27 rows=1 width=7) (actual time=0.024..0.026 rows=1
loops=1) Index Cond: (id = 200) -> Function Scan on generate_series i (cost=0.00..12.50
rows=1000 width=0) (actual time=0.062..0.355 rows=200 loops=1) -> Materialize (cost=13.50..23.50 rows=1000 width=0)
(actual
time=1.325..174.314 rows=500000 loops=200) -> Function Scan on generate_series j (cost=0.00..12.50
rows=1000 width=0) (actual time=262.119..449.383 rows=500000 loops=1) Total runtime: 83024.952 ms
(8 rows)
decibel@platter.local=# explain analyze select (select value from
oneblock where id = 1) from generate_series(1,200) i, generate_series(1,500000) j;
QUERY
PLAN
------------------------------------------------------------------------
----------------------------------------------------------------- Nested Loop (cost=21.77..20034.27 rows=1000000
width=0)(actual
time=262.175..68943.985 rows=100000000 loops=1) InitPlan -> Index Scan using oneblock_pkey on oneblock
(cost=0.00..8.27 rows=1 width=7) (actual time=0.023..0.025 rows=1
loops=1) Index Cond: (id = 1) -> Function Scan on generate_series i (cost=0.00..12.50
rows=1000 width=0) (actual time=0.062..0.339 rows=200 loops=1) -> Materialize (cost=13.50..23.50 rows=1000 width=0)
(actual
time=1.325..176.056 rows=500000 loops=200) -> Function Scan on generate_series j (cost=0.00..12.50
rows=1000 width=0) (actual time=262.079..454.692 rows=500000 loops=1) Total runtime: 82598.556 ms
(8 rows)
decibel@platter.local=#
--
Decibel!, aka Jim C. Nasby, Database Architect decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828
pgsql-hackers by date: