Thread: Optimizer: ranges and partial indices? Or use partitioning?
Heyho!

Given a (big [1]) table

	values ( ts timestamp, source integer, value float );

[under what conditions] will the optimizer be smart enough to make use
of a partial index on "(source, ts) where ts > '2009-01-01'"?  (Queries
will have a date restriction, but not necessarily the exact
"> 2009-01-01".)  (A full index on (source, ts) is also built, but most
queries are on values within the last year.)

And related: what are the pros and cons of such a partial index versus
table partitioning?  Partitioning certainly brings added complexity;
since values are only ever appended to the table (with ts representing
more or less "now"), the data is physically already grouped by ts.
Would we still benefit from partitioning?  (Partitioning by ts, that
is.  Partitioning by source is not realistic since there may be tens of
thousands of sources, and queries will often be for values from
multiple sources, so partitioning by groups of sources would be very
complicated to implement.)

Thanks for your input.

-- vbi

[1] some of our databases are in the range of 20 to 50 GB, most of it
in this one table.

--
featured link: http://www.pool.ntp.org
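For concreteness, the setup being described could be written out like
this (illustrative DDL; the index names are made up, and note that
"values" is a reserved word in SQL, so in practice the table name has
to be quoted or changed):

	-- the table, as given above
	CREATE TABLE "values" (
	    ts     timestamp,
	    source integer,
	    value  float
	);

	-- the full index over all rows
	CREATE INDEX values_source_ts_idx ON "values" (source, ts);

	-- the partial index covering only recent rows
	CREATE INDEX values_recent_idx ON "values" (source, ts)
	    WHERE ts > '2009-01-01';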
Adrian von Bidder <avbidder@fortytwo.ch> writes:
> Given a (big [1]) table
> 	values ( ts timestamp, source integer, value float );
> [under what conditions] will the optimizer be smart enough to make use
> of a partial index on "(source, ts) where ts > '2009-01-01'"?
> (Queries will have a date restriction but not necessarily the exact
> "> 2009-01-01".)

The planner is reasonably smart about deductions involving combinations
of btree-indexable operators.  It will be able to prove the index is
usable if the query includes restrictions like

	ts > '2009-01-02'
	ts >= '2009-01-02'
	ts = '2009-01-02'

where the comparison is to a constant that is >= the one in the index
predicate in the first case, or > the predicate in the others.

Whether it will think that using the index is a win is a different
question --- if the restriction is not reasonably selective it will
likely not want to use an index anyway.

> (A full index on source, ts is also built, but most queries are on
> values within the last year.)

If you have a full index on the same columns, I think that a partial
index like that is likely to be a complete waste.  It's just
replicating a subtree of the full index, and saving you probably not
more than one level of btree descent, at the cost of double the index
update work and a lot more pressure on cache memory.

			regards, tom lane
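To make the proof rule concrete, a few illustrative queries against the
schema sketched above (the example queries are mine, not from the
thread):

	-- Each of these WHERE clauses provably implies ts > '2009-01-01',
	-- so the planner may consider the partial index:
	SELECT * FROM "values" WHERE source = 42 AND ts >  '2009-01-02';
	SELECT * FROM "values" WHERE source = 42 AND ts >= '2009-01-02';
	SELECT * FROM "values" WHERE source = 42 AND ts =  '2009-01-02';

	-- This one admits rows from 2008, so it does NOT imply the
	-- predicate, and only the full index can be considered:
	SELECT * FROM "values" WHERE source = 42 AND ts > '2008-06-01';

	-- EXPLAIN shows which plan (and which index) is actually chosen:
	EXPLAIN SELECT * FROM "values"
	WHERE source = 42 AND ts > '2009-01-02';

Note also that the proof only works against constants known at plan
time: a condition like ts > now() - interval '1 year' cannot be used to
prove the partial index's predicate, since now() is not immutable.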
Heyho!

On Monday 01 March 2010 17.04:46 Tom Lane wrote:
> > Given a (big [1]) table
> > 	values ( ts timestamp, source integer, value float );
> > [...] partial index on "(source, ts) where ts > '2009-01-01'"?
>
> The planner is reasonably smart about deductions involving combinations
> of btree-indexable operators.  It will be able to prove the index is
> usable if the query includes restrictions like
> 	ts > '2009-01-02'
> 	ts >= '2009-01-02'
> 	ts = '2009-01-02'
> where the comparison is to a constant that is >= the one in the index
> predicate in the first case, or > the predicate in the others.

Ok.

> Whether it will think that using the index is a win is a different
> question --- if the restriction is not reasonably selective it will
> likely not want to use an index anyway.
>
> > (A full index on source, ts is also built, but most queries are on
> > values within the last year.)
>
> If you have a full index on the same columns, I think that a partial
> index like that is likely to be a complete waste.  It's just
> replicating a subtree of the full index, and saving you probably not
> more than one level of btree descent, at the cost of double the index
> update work and a lot more pressure on cache memory.

Ok, thanks.  That was exactly the kind of answer/explanation I was
looking for -- I don't have enough experience to judge this.  (And I
completely forgot that an additional index would also need to be loaded
into memory at times.)  Thanks a lot.

Given the size of the table, and given that query plans with small
"test data sets" will likely differ from what happens in a live system,
experimenting with various indices etc. is a bit time consuming; that's
why I asked first.  Currently we're still in the green area, but as the
table grows I'm quite sure I'll have to look into optimizing this area
within the next one or two years...  (Luckily the application is fully
within our control as well, so we can optimize on that side, too.)

cheers
-- vbi

--
The worst cliques are those which consist of one man.
		-- G. B. Shaw
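(One way to make such experiments cheaper -- a sketch with made-up
scale factors, not something from the thread -- is to generate a
synthetic data set of realistic size with generate_series and run
EXPLAIN ANALYZE against that:)

	-- Hypothetical: one sample per minute for all of 2009, for 1000
	-- sources -- roughly 500M rows; shrink the ranges for quick tests.
	INSERT INTO "values" (ts, source, value)
	SELECT g.ts, s.source, random()
	FROM generate_series(timestamp '2009-01-01',
	                     timestamp '2009-12-31',
	                     interval '1 minute') AS g(ts),
	     generate_series(1, 1000) AS s(source);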
I have the same table as yours, with the potential to grow to over 50
billion records once operational.  But our hardware is currently very
limited (8 GB RAM).

I concur with Tom Lane that partial indexes aren't really an option,
but what about partitioning?  I read in the Postgres docs that "The
exact point at which a table will benefit from partitioning depends on
the application, although a rule of thumb is that the size of the table
should exceed the physical memory of the database server."
http://www.postgresql.org/docs/current/static/ddl-partitioning.html

Now, a table with 500 million records would already exceed our RAM, so
I wonder what impact a table of 50 billion records would have on simple
lookup performance (i.e. source = fixed, timestamp = range), taking
into account that a global index would already exceed our RAM at around
1 billion records.

Did anyone do some testing?  Is partitioning a viable option in such a
scenario?

"Adrian von Bidder" <avbidder@fortytwo.ch> wrote in message
news:201003020849.19133@fortytwo.ch...
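For reference, on the PostgreSQL versions current at the time (8.x/9.0)
range partitioning by ts meant inheritance plus CHECK constraints plus
constraint exclusion, roughly like this (a minimal sketch; table and
index names are made up, and a real setup also needs a trigger or
application logic to route inserts into the right child table):

	-- Parent table; the children hold the actual data.
	CREATE TABLE samples (ts timestamp, source integer, value float);

	CREATE TABLE samples_2009 (
	    CHECK (ts >= '2009-01-01' AND ts < '2010-01-01')
	) INHERITS (samples);

	CREATE TABLE samples_2010 (
	    CHECK (ts >= '2010-01-01' AND ts < '2011-01-01')
	) INHERITS (samples);

	-- Each partition gets its own, much smaller, index:
	CREATE INDEX samples_2009_idx ON samples_2009 (source, ts);
	CREATE INDEX samples_2010_idx ON samples_2010 (source, ts);

	-- With constraint_exclusion = 'partition' (the default since 8.4),
	-- a query like this scans only the 2010 partition and its index:
	SELECT * FROM samples
	WHERE source = 42 AND ts >= '2010-02-01' AND ts < '2010-03-01';

The point that matters on an 8 GB machine is that only the indexes of
the partitions actually being queried need to stay cached, and old
partitions can be detached (NO INHERIT) or dropped cheaply.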