Re: Sort and index - Mailing list pgsql-performance
From | Jim C. Nasby |
---|---|
Subject | Re: Sort and index |
Date | |
Msg-id | 20050423015404.GV58835@decibel.org |
In response to | Re: Sort and index ("Jim C. Nasby" <decibel@decibel.org>) |
Responses | Re: Sort and index |
List | pgsql-performance |
I've run some performance tests. The actual test case is at
http://stats.distributed.net/~decibel/timing.sql, and the results are at
http://stats.distributed.net/~decibel/timing.log. In a nutshell, doing an
index scan appears to be about 2x faster than a sequential scan and a sort.

Something else of interest is that going from 50M of sort memory to 3G sped
the sort up by 900 seconds. If someone wants to record data about the effect
of sort_mem on on-disk sorts somewhere (maybe in the docs?) I can run some
more tests for that case.

In any case, it's clear that the planner is making the wrong choice here.
BTW, changing random_page_cost to 3 or 4 doesn't change the plan.

On Tue, Apr 19, 2005 at 10:40:41PM -0500, Jim C. Nasby wrote:
> On Tue, Apr 19, 2005 at 11:01:26PM -0400, Tom Lane wrote:
> > "Jim C. Nasby" <decibel@decibel.org> writes:
> > > Actually, the planner (at least in 7.4) isn't smart enough to consider
> > > if the sort would fit in memory or not.
> >
> > Really? Have you read cost_sort()?
> >
> > It's certainly possible that the calculation is all wet, but to claim
> > that the issue is not considered is just wrong.
>
> To be fair, no, I haven't looked at the code. This is based strictly on
> anecdotal evidence on a 120M row table. I'm currently running a test to
> see how an index scan compares to a seqscan. I also got the same results
> when I added a where clause that would restrict it to about 7% of the
> table.
>
> Actually, after running some tests (below), the plan cost does change
> when I change sort_mem (it was originally 50000).
>
> stats=# \d email_contrib
>           Table "public.email_contrib"
>    Column   |  Type   | Modifiers
> ------------+---------+-----------
>  project_id | integer | not null
>  id         | integer | not null
>  date       | date    | not null
>  team_id    | integer |
>  work_units | bigint  | not null
> Indexes:
>     "email_contrib_pkey" primary key, btree (project_id, id, date)
>     "email_contrib__pk24" btree (id, date) WHERE (project_id = 24)
>     "email_contrib__pk25" btree (id, date) WHERE (project_id = 25)
>     "email_contrib__pk8" btree (id, date) WHERE (project_id = 8)
>     "email_contrib__project_date" btree (project_id, date)
> Foreign-key constraints:
>     "fk_email_contrib__id" FOREIGN KEY (id) REFERENCES stats_participant(id) ON UPDATE CASCADE
>     "fk_email_contrib__team_id" FOREIGN KEY (team_id) REFERENCES stats_team(team) ON UPDATE CASCADE
>
> stats=# explain select * from email_contrib where project_id=8 order by project_id, id, date;
>                                    QUERY PLAN
> --------------------------------------------------------------------------------
>  Sort  (cost=3613476.05..3635631.71 rows=8862263 width=24)
>    Sort Key: project_id, id, date
>    ->  Seq Scan on email_contrib  (cost=0.00..2471377.50 rows=8862263 width=24)
>          Filter: (project_id = 8)
> (4 rows)
>
> stats=# explain select * from email_contrib order by project_id, id, date;
>                                     QUERY PLAN
> ----------------------------------------------------------------------------------
>  Sort  (cost=25046060.83..25373484.33 rows=130969400 width=24)
>    Sort Key: project_id, id, date
>    ->  Seq Scan on email_contrib  (cost=0.00..2143954.00 rows=130969400 width=24)
> (3 rows)
>
> stats=# select 8862263::float/130969400;
>       ?column?
> --------------------
>   0.0676666687027657
> (1 row)
>
> stats=# explain select * from email_contrib where project_id=8 order by project_id, id, date;
>                                              QUERY PLAN
> -----------------------------------------------------------------------------------------------------
>  Index Scan using email_contrib_pkey on email_contrib  (cost=0.00..6832005.57 rows=8862263 width=24)
>    Index Cond: (project_id = 8)
> (2 rows)
>
> stats=# explain select * from email_contrib order by project_id, id, date;
>                                                 QUERY PLAN
> ---------------------------------------------------------------------------------------------------------
>  Index Scan using email_contrib_pkey on email_contrib  (cost=0.00..100055905.62 rows=130969400 width=24)
> (1 row)
>
> stats=# set enable_seqscan=on;
> SET
> stats=# set sort_mem=1000;
> SET
> stats=# explain select * from email_contrib order by project_id, id, date;
>                                     QUERY PLAN
> ----------------------------------------------------------------------------------
>  Sort  (cost=28542316.63..28869740.13 rows=130969400 width=24)
>    Sort Key: project_id, id, date
>    ->  Seq Scan on email_contrib  (cost=0.00..2143954.00 rows=130969400 width=24)
> (3 rows)
>
> stats=#
>
> --
> Jim C. Nasby, Database Consultant               decibel@decibel.org
> Give your computer some brain candy! www.distributed.net Team #1828
>
> Windows: "Where do you want to go today?"
> Linux: "Where do you want to go tomorrow?"
> FreeBSD: "Are you guys coming, or what?"
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

--
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"
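For anyone who wants to reproduce the comparison described at the top of this
message without pulling timing.sql, here is a minimal sketch of the approach:
time the ordered read once under the default plan (seqscan + sort on this box)
and once with seqscans disabled so the planner falls back to
email_contrib_pkey. The table and column names come from the quoted \d output
above, and sort_mem = 50000 is the value mentioned in the quoted message; the
use of EXPLAIN ANALYZE and \timing here is an assumption about how to measure
it, not the exact commands in timing.sql.

-- Sketch only: compare "seqscan + sort" against the index scan for the
-- same ordered read of email_contrib.  Run in psql; \timing reports the
-- elapsed time of each statement.
\timing

-- 1. Planner's default choice (seqscan + sort).
SET sort_mem = 50000;                 -- 7.4-era setting, in KB (~50MB)
EXPLAIN ANALYZE
SELECT * FROM email_contrib
 WHERE project_id = 8
 ORDER BY project_id, id, date;

-- 2. Disable seqscans so the planner uses the email_contrib_pkey index scan.
SET enable_seqscan = off;
EXPLAIN ANALYZE
SELECT * FROM email_contrib
 WHERE project_id = 8
 ORDER BY project_id, id, date;

SET enable_seqscan = on;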
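Likewise, if anyone does want data points on how sort_mem affects on-disk
sorts, a sweep could look roughly like the sketch below. The specific sort_mem
settings (in KB) are placeholders, and disabling index scans is just one way
to make sure the sort itself is what gets measured; it is not how the posted
runs were done.

-- Sketch of a sort_mem sweep for the on-disk sort case.  Each run re-sorts
-- the full table, so the smaller settings are guaranteed to spill to disk.
\timing

SET enable_indexscan = off;           -- always measure the seqscan + sort plan

SET sort_mem = 1000;                  -- ~1MB
EXPLAIN ANALYZE SELECT * FROM email_contrib ORDER BY project_id, id, date;

SET sort_mem = 50000;                 -- ~50MB
EXPLAIN ANALYZE SELECT * FROM email_contrib ORDER BY project_id, id, date;

SET sort_mem = 1000000;               -- ~1GB
EXPLAIN ANALYZE SELECT * FROM email_contrib ORDER BY project_id, id, date;

RESET enable_indexscan;
RESET sort_mem;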