Thread: Partial index with regexp not working
I have a varchar ID field which captures a user account. If there is no user id, then we just store the IP address of the user. Realizing that we would like to index only those tuples where the user ID is not an IP, I am trying to create a partial index as follows: CREATE INDEX idx_trades_tid_partial ON trades (trader_id) WHERE trader_id ~ '[a-z]' ; Any "trader_id" that contains an alphabet letter will be registered user, as IPs only have numbers and dots. But when I do an EXPLAIN ANALYZE on a query which has a where condition as follows: WHERE trader_id = 'johndoe' It is not using this index at all! It is using no index in fact, it's trying to do a sequential scan. Any ideas why this partial index is not working?? TIA
Phoenix Kiula wrote: > CREATE INDEX idx_trades_tid_partial ON trades (trader_id) > WHERE trader_id ~ '[a-z]' ; > WHERE trader_id = 'johndoe' > > It is not using this index at all! It is using no index in fact, it's > trying to do a sequential scan. Any ideas why this partial index is > not working?? Why would it? There's no magic that lets PG know one arbitrary condition correlates with another. A partial index will only be considered if you test for its condition: SELECT ... WHERE trader_id = 'johndoe' AND trader_id ~ '[a-z]' For searching for specific IDs I wouldn't expect a partial index to be much better than a full index - unless you have a *lot* of IP addresses and hardly any names. -- Richard Huxton Archonet Ltd
--- Richard Huxton <dev@archonet.com> wrote: > Phoenix Kiula wrote: > > CREATE INDEX idx_trades_tid_partial ON trades (trader_id) > > WHERE trader_id ~ '[a-z]' ; > > > WHERE trader_id = 'johndoe' > > > > It is not using this index at all! It is using no index in fact, it's > > trying to do a sequential scan. Any ideas why this partial index is > > not working?? > A partial index will only be considered if you test for its condition: > > SELECT ... WHERE trader_id = 'johndoe' AND trader_id ~ '[a-z]' IIRC, for any index like this to work, doesn't the REGEXP need to be anchored to either the start or end of the string? Regards, Richard Broersma Jr.
Richard Broersma Jr wrote: > --- Richard Huxton <dev@archonet.com> wrote: >> Phoenix Kiula wrote: >>> CREATE INDEX idx_trades_tid_partial ON trades (trader_id) WHERE >>> trader_id ~ '[a-z]' ; WHERE trader_id = 'johndoe' >>> >>> It is not using this index at all! It is using no index in fact, >>> it's trying to do a sequential scan. Any ideas why this partial >>> index is not working?? >> A partial index will only be considered if you test for its >> condition: >> >> SELECT ... WHERE trader_id = 'johndoe' AND trader_id ~ '[a-z]' > > IIRC, for any index like this to work, doesn't the REGEXP need to be > anchored to either the start or end of the string? That's true if you are trying to match an expression to the index. For example: SELECT * FROM foo WHERE bar LIKE '%whatever%' A btree index can't help here for the same reason you can't find someone in a phone-book by their first name. SELECT * FROM foo WHERE bar LIKE 'whatever%' This *can* use an index, but only if you are in "C" locale or have set up text/varchar_pattern_ops appropriately. Then it gets converted into >= 'whatever' < 'whateves'. Now in Phoenix's example the regexp is just being used to specify what values the index covers. A more common example might be: CREATE INDEX inv_unpaid_idx ON invoices (client_id) WHERE NOT paid; This indexes client_id but only for those invoices that haven't been paid. Useful for a late-debtors report perhaps if you have lots of invoices but 99% have been paid already. The planner isn't smart enough to figure out which queries can use this index by examining them, it just looks for (NOT paid) in the WHERE clause and if it doesn't find it, ignores the index. -- Richard Huxton Archonet Ltd
Richard Huxton <dev@archonet.com> writes: > The planner isn't smart enough to figure out which queries can use this > index by examining them, it just looks for (NOT paid) in the WHERE > clause and if it doesn't find it, ignores the index. Well, it's a little bit brighter than that: it has some smarts about btree-indexable comparisons and about null-testing. For instance, it can figure out that "x > 3" implies "x > 0", and so a query WHERE x > 3 could use a partial index WHERE x > 0. Also, assuming that the > operator is strict, it would recognize that WHERE x IS NOT NULL is implied. But there's certainly not anything in there that can make inferences about regex matches. regards, tom lane
On 11/09/2007, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Richard Huxton <dev@archonet.com> writes: > > The planner isn't smart enough to figure out which queries can use this > > index by examining them, it just looks for (NOT paid) in the WHERE > > clause and if it doesn't find it, ignores the index. > > Well, it's a little bit brighter than that: it has some smarts about > btree-indexable comparisons and about null-testing. For instance, it > can figure out that "x > 3" implies "x > 0", and so a query WHERE x > 3 > could use a partial index WHERE x > 0. Also, assuming that the > > operator is strict, it would recognize that WHERE x IS NOT NULL is > implied. > > But there's certainly not anything in there that can make inferences > about regex matches. The suggestion in this thread that a regex index will come into play only when the WHERE condition specifically mentions it was indeed the key for me. In my case, the ratio of alphanumeric values for a column to values that are just plain IP addresses (numeric-plus-dots) is about 1:15, so an index on the alphanumeric is a HUGE benefit. This is a very smart difference from the world of the other major open source database, so I'd say the planner is bright enough even when it comes to regex. Many thanks
On 11/09/2007, Phoenix Kiula <phoenix.kiula@gmail.com> wrote: > > The suggestion in this thread that a regex index will come into play > only when the WHERE condition specifically mentions it was indeed the > key for me. Ok, I've hit a snag about this index. I think it's to do with how my regex is structured. Basically this column can have either IP addresses, or alphanumeric user IDs. If it is not an IP address, it is a registered user ID. What is the best way of ascertaining that a column value is *not* an IP address? I tried this: select * from trader where trader_id !~ '[0-9]+\.[0-9]+\.[0-9]+\.'; And this works, but I wonder if a partial index on a negative condition ("!~") will be slower than a positive condition?
Phoenix Kiula wrote: > > Ok, I've hit a snag about this index. I think it's to do with how my > regex is structured. Basically this column can have either IP > addresses, or alphanumeric user IDs. If it is not an IP address, it is > a registered user ID. What is the best way of ascertaining that a > column value is *not* an IP address? > > I tried this: > > select * from trader where trader_id !~ '[0-9]+\.[0-9]+\.[0-9]+\.'; > > And this works, but I wonder if a partial index on a negative > condition ("!~") will be slower than a positive condition? To be honest, I'd probably just have a separate column "uid_type", set it when creating the user and then just have a partial index WHERE uid_type='IP' -- Richard Huxton Archonet Ltd
On Wednesday 12 September 2007 09:34:55 Richard Huxton wrote: > To be honest, I'd probably just have a separate column "uid_type", set > it when creating the user and then just have a partial index WHERE > uid_type='IP' Or have a separate column with the user ID and have "(anonymous)" when you have the IP address on the table (maybe even NULL, it all depends on what you want). -- Jorge Godoy <jgodoy@gmail.com>