Re: Sort and index - Mailing list pgsql-performance
From | Jim C. Nasby |
---|---|
Subject | Re: Sort and index |
Date | |
Msg-id | 20050423015404.GV58835@decibel.org |
In response to | Re: Sort and index ("Jim C. Nasby" <decibel@decibel.org>) |
Responses | Re: Sort and index |
List | pgsql-performance |
I've run some performance tests. The actual test case is at
http://stats.distributed.net/~decibel/timing.sql, and the results are at
http://stats.distributed.net/~decibel/timing.log. In a nutshell, doing an
index scan appears to be about 2x faster than a sequential scan and a sort.

Something else of interest is that going from 50M of sort memory to 3G sped
the sort up by 900 seconds. If someone wants to record data about the effect
of sort_mem on on-disk sorts somewhere (maybe in the docs?) I can run some
more tests for that case.

In any case, it's clear that the planner is making the wrong choice here.
BTW, changing random_page_cost to 3 or 4 doesn't change the plan.

On Tue, Apr 19, 2005 at 10:40:41PM -0500, Jim C. Nasby wrote:
> On Tue, Apr 19, 2005 at 11:01:26PM -0400, Tom Lane wrote:
> > "Jim C. Nasby" <decibel@decibel.org> writes:
> > > Actually, the planner (at least in 7.4) isn't smart enough to consider
> > > if the sort would fit in memory or not.
> >
> > Really? Have you read cost_sort()?
> >
> > It's certainly possible that the calculation is all wet, but to claim
> > that the issue is not considered is just wrong.
>
> To be fair, no, I haven't looked at the code. This is based strictly on
> anecdotal evidence on a 120M row table. I'm currently running a test to
> see how an index scan compares to a seqscan. I also got the same results
> when I added a where clause that would restrict it to about 7% of the
> table.
>
> Actually, after running some tests (below), the plan cost does change
> when I change sort_mem (it was originally 50000).
>
> stats=# \d email_contrib
>           Table "public.email_contrib"
>    Column   |  Type   | Modifiers
> ------------+---------+-----------
>  project_id | integer | not null
>  id         | integer | not null
>  date       | date    | not null
>  team_id    | integer |
>  work_units | bigint  | not null
> Indexes:
>     "email_contrib_pkey" primary key, btree (project_id, id, date)
>     "email_contrib__pk24" btree (id, date) WHERE (project_id = 24)
>     "email_contrib__pk25" btree (id, date) WHERE (project_id = 25)
>     "email_contrib__pk8" btree (id, date) WHERE (project_id = 8)
>     "email_contrib__project_date" btree (project_id, date)
> Foreign-key constraints:
>     "fk_email_contrib__id" FOREIGN KEY (id) REFERENCES stats_participant(id) ON UPDATE CASCADE
>     "fk_email_contrib__team_id" FOREIGN KEY (team_id) REFERENCES stats_team(team) ON UPDATE CASCADE
>
> stats=# explain select * from email_contrib where project_id=8 order by project_id, id, date;
>                                    QUERY PLAN
> --------------------------------------------------------------------------------
>  Sort  (cost=3613476.05..3635631.71 rows=8862263 width=24)
>    Sort Key: project_id, id, date
>    ->  Seq Scan on email_contrib  (cost=0.00..2471377.50 rows=8862263 width=24)
>          Filter: (project_id = 8)
> (4 rows)
>
> stats=# explain select * from email_contrib order by project_id, id, date;
>                                     QUERY PLAN
> ----------------------------------------------------------------------------------
>  Sort  (cost=25046060.83..25373484.33 rows=130969400 width=24)
>    Sort Key: project_id, id, date
>    ->  Seq Scan on email_contrib  (cost=0.00..2143954.00 rows=130969400 width=24)
> (3 rows)
>
> stats=# select 8862263::float/130969400;
>       ?column?
> --------------------
>   0.0676666687027657
> (1 row)
>
> stats=# explain select * from email_contrib where project_id=8 order by project_id, id, date;
>                                              QUERY PLAN
> -----------------------------------------------------------------------------------------------------
>  Index Scan using email_contrib_pkey on email_contrib  (cost=0.00..6832005.57 rows=8862263 width=24)
>    Index Cond: (project_id = 8)
> (2 rows)
>
> stats=# explain select * from email_contrib order by project_id, id, date;
>                                                 QUERY PLAN
> ---------------------------------------------------------------------------------------------------------
>  Index Scan using email_contrib_pkey on email_contrib  (cost=0.00..100055905.62 rows=130969400 width=24)
> (1 row)
>
> stats=# set enable_seqscan=on;
> SET
> stats=# set sort_mem=1000;
> SET
> stats=# explain select * from email_contrib order by project_id, id, date;
>                                     QUERY PLAN
> ----------------------------------------------------------------------------------
>  Sort  (cost=28542316.63..28869740.13 rows=130969400 width=24)
>    Sort Key: project_id, id, date
>    ->  Seq Scan on email_contrib  (cost=0.00..2143954.00 rows=130969400 width=24)
> (3 rows)
>
> stats=#
>
> --
> Jim C. Nasby, Database Consultant               decibel@decibel.org
> Give your computer some brain candy! www.distributed.net Team #1828
>
> Windows: "Where do you want to go today?"
> Linux: "Where do you want to go tomorrow?"
> FreeBSD: "Are you guys coming, or what?"
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

--
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"
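For anyone who wants to reproduce the comparison described at the top of this
message without pulling timing.sql, here is a minimal sketch of the approach:
time the ordered read once under the default plan (seqscan + sort on this box)
and once with seqscans disabled so the planner falls back to
email_contrib_pkey. The table and column names come from the quoted \d output
above, and sort_mem = 50000 is the value mentioned in the quoted message; the
use of EXPLAIN ANALYZE and \timing here is an assumption about how to measure
it, not the exact commands in timing.sql.

-- Sketch only: compare "seqscan + sort" against the index scan for the
-- same ordered read of email_contrib.  Run in psql; \timing reports the
-- elapsed time of each statement.
\timing

-- 1. Planner's default choice (seqscan + sort).
SET sort_mem = 50000;                 -- 7.4-era setting, in KB (~50MB)
EXPLAIN ANALYZE
SELECT * FROM email_contrib
 WHERE project_id = 8
 ORDER BY project_id, id, date;

-- 2. Disable seqscans so the planner uses the email_contrib_pkey index scan.
SET enable_seqscan = off;
EXPLAIN ANALYZE
SELECT * FROM email_contrib
 WHERE project_id = 8
 ORDER BY project_id, id, date;

SET enable_seqscan = on;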
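Likewise, if anyone does want data points on how sort_mem affects on-disk
sorts, a sweep could look roughly like the sketch below. The specific sort_mem
settings (in KB) are placeholders, and disabling index scans is just one way
to make sure the sort itself is what gets measured; it is not how the posted
runs were done.

-- Sketch of a sort_mem sweep for the on-disk sort case.  Each run re-sorts
-- the full table, so the smaller settings are guaranteed to spill to disk.
\timing

SET enable_indexscan = off;           -- always measure the seqscan + sort plan

SET sort_mem = 1000;                  -- ~1MB
EXPLAIN ANALYZE SELECT * FROM email_contrib ORDER BY project_id, id, date;

SET sort_mem = 50000;                 -- ~50MB
EXPLAIN ANALYZE SELECT * FROM email_contrib ORDER BY project_id, id, date;

SET sort_mem = 1000000;               -- ~1GB
EXPLAIN ANALYZE SELECT * FROM email_contrib ORDER BY project_id, id, date;

RESET enable_indexscan;
RESET sort_mem;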