Re: [GENERAL] indexed regex select optimisation missing? - Mailing list pgsql-general
From | Stuart Woolford |
---|---|
Subject | Re: [GENERAL] indexed regex select optimisation missing? |
Date | |
Msg-id | 99110512143000.01188@test.macmillan.co.nz Whole thread Raw |
In response to | Re: [GENERAL] indexed regex select optimisation missing? ("Ross J. Reedstrom" <reedstrm@wallace.ece.rice.edu>) |
Responses |
Re: [GENERAL] indexed regex select optimisation missing?
|
List | pgsql-general |
On Fri, 05 Nov 1999, you wrote: > Ah, your description just tripped a memory for me from the hackers list: > > The behavior you describe has to do with the implementation of using an > index for regex matching, in the presence of the USE_LOCALE configuration > option. > > Internally, the condition: WHERE word~'^alongword' is converted in the > parser(!) to: > > WHERE word >= 'alongword' AND word < 'alongword\377' > > since the index needs inequalities to be used, not matches. Now, the > problem is the hack of tacking an octal \377 on the string to create > the lexagraphically 'just bigger' value assumes ASCI sort order. If > USE_LOCALE is defined, this is dropped, since we don't have a good fix > yet, and slow correct behavior is better than fast, incorrect behavior. ah, now this makes sense, I'm using the RPMs, and I bet they have lexical enabled by default (damb! perhaps another set should be produced without this option? it makes a BIG difference) > > So, you have two options: if you don't need locale support, recompile > without it. Otherwise, hand code your anchored matches as the pair of > conditionals above Hmm, is there syntax for adding an arbitrary value to > a string constant in the SQL? I suppose you could use: word < 'alongwore', > i.e. hand increment the last character, so it's larger than any match. I've tried a test using ">='window' and <'windox'", and it works perfectly, and very very fast, so I think we have found your culprit. > > Your point is correct, the developers are aware of it as a theoretical > problem, at least. Always helps to hear a real world case, though. I > believe it's on the TODO list as is, otherwise, pester Bruce. ;-) > > Reviewing my email logs from June, most of the work on this has to do with > people who needs locales, and potentially multibyte character sets. Tom > Lane is of the opinion that this particular optimization needs to be moved > out of the parser, and deeper into the planner or optimizer/rewriter, > so a good fix may be some ways out. Hmm, perhaps a 'good' initial fix would be to produce another set of RPMs, and/or add it to the FAQ in the 4.x section about the slow queries that say indexes are used for this type of search. using the >= AND < trick does seem to work, but is a little non-obvious (and hard to code in some situations, it will make quite a difference to how I need to implement my searching system) > > Ross thank you very very much for your assistance on this, it is greatly appreciated! -- ------------------------------------------------------------ Stuart Woolford, stuartw@newmail.net Unix Consultant. Software Developer. Supra Club of New Zealand. ------------------------------------------------------------
pgsql-general by date: