Home > mailing lists

Re: Hardware upgrade for a high-traffic database - Mailing list pgsql-performance

From	Jason Coene
Subject	Re: Hardware upgrade for a high-traffic database
Date	August 11, 2004 21:29:10
Msg-id	200408120029.i7C0T30E002389@mail.gotfrag.com Whole thread Raw
In response to	Re: Hardware upgrade for a high-traffic database (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: Hardware upgrade for a high-traffic database Re: Hardware upgrade for a high-traffic database
List	pgsql-performance

Tree view

> You mean you are doing
>     SELECT ... WHERE userid = 42 ORDER BY timestamp DESC LIMIT 5;
> and hoping that separate indexes on userid and timestamp will get the
> job done?  They won't.  There are only two possible plans for this,
> neither very good: select all of user 42's posts and sort them, or
> scan timewise backwards through *all* posts looking for the last 5 from
> user 42.

Wow!  I did try the method you state below (including the WHERE restricted
column in the sort by, and creating a two-column index), and it did execute
much faster (even on odd userid's to avoid cached results as much as
possible).

We have a lot of:

SELECT whatever
    FROM ourtable
    WHERE field1 = X
    AND field2 = Y
    AND field3 = Z
    ORDER BY id DESC
    LIMIT 5

With indexes:

ourtable(id)
ourtable(field1, field2, field3)

Is it standard procedure with postgres to include any fields listed in WHERE
in the ORDER BY, and create a single index for only the ORDER BY fields (in
order of appearance, of course)?

>
> If you do this enough to justify a specialized index, I would suggest a
> two-column index on (userid, timestamp).  You will also need to tweak
> the query, because the planner is not quite smart enough to deduce that
> such an index is applicable to the given sort order:
>     SELECT ... WHERE userid = 42 ORDER BY userid DESC, timestamp DESC
> LIMIT 5;
> This should generate an index-scan-backwards plan that will execute nigh
> instantaneously, because it will only fetch the rows you really want.
>
> You might or might not be able to drop the separate indexes on userid
> and timestamp, depending on what other queries you might have that need
> them.  But you should be paying attention to what plans you are really
> getting (see EXPLAIN) rather than just assuming that some indexes chosen
> at random will do what you need.
>
>             regards, tom lane
>

We do many varied queries on nearly every table - our data is highly
relational, and we have a lot of indexes.  I thought the planner would pick
up the right index via constraints and not require them in ORDER BY...
EXPLAIN ANALYZE says that the indexes are being used, ala:

gf=# EXPLAIN ANALYZE SELECT id FROM comments WHERE userid = 51 ORDER BY
timestamp DESC LIMIT 5;
                                                                  QUERY PLAN
----------------------------------------------------------------------------
-------------------------------------------------------------------
 Limit  (cost=1608.43..1608.45 rows=5 width=8) (actual time=0.292..0.317
rows=5 loops=1)
   ->  Sort  (cost=1608.43..1609.45 rows=407 width=8) (actual
time=0.287..0.295 rows=5 loops=1)
         Sort Key: "timestamp"
         ->  Index Scan using comments_ix_userid on comments
(cost=0.00..1590.79 rows=407 width=8) (actual time=0.031..0.190 rows=35
loops=1)
               Index Cond: (userid = 51)
 Total runtime: 0.375 ms
(6 rows)

Is this the wrong procedure?  Your suggested syntax seems much more
efficient, but I don't quite understand exactly why, as PG is using our
existing indexes...

gf=# EXPLAIN ANALYZE SELECT id FROM comments WHERE userid = 51 ORDER BY
userid DESC, timestamp DESC LIMIT 5;

QUERY PLAN
----------------------------------------------------------------------------
----------------------------------------------------------------------------
----
 Limit  (cost=0.00..19.90 rows=5 width=12) (actual time=0.040..0.076 rows=5
loops=1)
   ->  Index Scan Backward using comments_ix_userid_timestamp on comments
(cost=0.00..1620.25 rows=407 width=12) (actual time=0.035..0.054 rows=5
loops=1)
         Index Cond: (userid = 51)
 Total runtime: 0.134 ms
(4 rows)

Note: This was done after adding an index on comments (userid, timestamp)

Regards,

Jason

pgsql-performance by date:

From: Josh Berkus
Date: 11 August 2004, 21:26:05
Subject: Re: Buld Insert and Index use.

From: Tom Lane
Date: 11 August 2004, 23:09:24
Subject: Re: Hardware upgrade for a high-traffic database

Re: Hardware upgrade for a high-traffic database - Mailing list pgsql-performance

Previous

Next