Re: Hardware upgrade for a high-traffic database - Mailing list pgsql-performance
From | Jason Coene |
---|---|
Subject | Re: Hardware upgrade for a high-traffic database |
Date | |
Msg-id | 200408120029.i7C0T30E002389@mail.gotfrag.com Whole thread Raw |
In response to | Re: Hardware upgrade for a high-traffic database (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Hardware upgrade for a high-traffic database
Re: Hardware upgrade for a high-traffic database |
List | pgsql-performance |
> You mean you are doing > SELECT ... WHERE userid = 42 ORDER BY timestamp DESC LIMIT 5; > and hoping that separate indexes on userid and timestamp will get the > job done? They won't. There are only two possible plans for this, > neither very good: select all of user 42's posts and sort them, or > scan timewise backwards through *all* posts looking for the last 5 from > user 42. Wow! I did try the method you state below (including the WHERE restricted column in the sort by, and creating a two-column index), and it did execute much faster (even on odd userid's to avoid cached results as much as possible). We have a lot of: SELECT whatever FROM ourtable WHERE field1 = X AND field2 = Y AND field3 = Z ORDER BY id DESC LIMIT 5 With indexes: ourtable(id) ourtable(field1, field2, field3) Is it standard procedure with postgres to include any fields listed in WHERE in the ORDER BY, and create a single index for only the ORDER BY fields (in order of appearance, of course)? > > If you do this enough to justify a specialized index, I would suggest a > two-column index on (userid, timestamp). You will also need to tweak > the query, because the planner is not quite smart enough to deduce that > such an index is applicable to the given sort order: > SELECT ... WHERE userid = 42 ORDER BY userid DESC, timestamp DESC > LIMIT 5; > This should generate an index-scan-backwards plan that will execute nigh > instantaneously, because it will only fetch the rows you really want. > > You might or might not be able to drop the separate indexes on userid > and timestamp, depending on what other queries you might have that need > them. But you should be paying attention to what plans you are really > getting (see EXPLAIN) rather than just assuming that some indexes chosen > at random will do what you need. > > regards, tom lane > We do many varied queries on nearly every table - our data is highly relational, and we have a lot of indexes. I thought the planner would pick up the right index via constraints and not require them in ORDER BY... EXPLAIN ANALYZE says that the indexes are being used, ala: gf=# EXPLAIN ANALYZE SELECT id FROM comments WHERE userid = 51 ORDER BY timestamp DESC LIMIT 5; QUERY PLAN ---------------------------------------------------------------------------- ------------------------------------------------------------------- Limit (cost=1608.43..1608.45 rows=5 width=8) (actual time=0.292..0.317 rows=5 loops=1) -> Sort (cost=1608.43..1609.45 rows=407 width=8) (actual time=0.287..0.295 rows=5 loops=1) Sort Key: "timestamp" -> Index Scan using comments_ix_userid on comments (cost=0.00..1590.79 rows=407 width=8) (actual time=0.031..0.190 rows=35 loops=1) Index Cond: (userid = 51) Total runtime: 0.375 ms (6 rows) Is this the wrong procedure? Your suggested syntax seems much more efficient, but I don't quite understand exactly why, as PG is using our existing indexes... gf=# EXPLAIN ANALYZE SELECT id FROM comments WHERE userid = 51 ORDER BY userid DESC, timestamp DESC LIMIT 5; QUERY PLAN ---------------------------------------------------------------------------- ---------------------------------------------------------------------------- ---- Limit (cost=0.00..19.90 rows=5 width=12) (actual time=0.040..0.076 rows=5 loops=1) -> Index Scan Backward using comments_ix_userid_timestamp on comments (cost=0.00..1620.25 rows=407 width=12) (actual time=0.035..0.054 rows=5 loops=1) Index Cond: (userid = 51) Total runtime: 0.134 ms (4 rows) Note: This was done after adding an index on comments (userid, timestamp) Regards, Jason
pgsql-performance by date: