Re: surprising query optimisation - Mailing list pgsql-general
From | Gavin Flower |
---|---|
Subject | Re: surprising query optimisation |
Date | |
Msg-id | 9a952022-82b1-a88d-07fb-72c47bc0f875@archidevsys.co.nz Whole thread Raw |
In response to | Re: surprising query optimisation (Stephen Frost <sfrost@snowman.net>) |
Responses |
Re: surprising query optimisation
|
List | pgsql-general |
On 01/12/2018 04:33, Stephen Frost wrote: > Greetings, > > * Chris Withers (chris@withers.org) wrote: >> On 28/11/2018 22:49, Stephen Frost wrote: >>> * Chris Withers (chris@withers.org) wrote: >>>> We have an app that deals with a lot of queries, and we've been slowly >>>> seeing performance issues emerge. We take a lot of free form queries from >>>> users and stumbled upon a very surprising optimisation. >>>> >>>> So, we have a 'state' column which is a 3 character string column with an >>>> index on it. Despite being a string, this column is only used to store one >>>> of three values: 'NEW', 'ACK', or 'RSV'. >>> Sounds like a horrible field to have an index on. >> That's counter-intuitive for me. What leads you to say this and what would >> you do/recommend instead? > For this, specifically, it's because you end up with exactly what you > have: a large index with tons of duplicate values. Indexes are > particularly good when you have high-cardinality fields. Now, if you > have a skewed index, where there's one popular value and a few much less > popular ones, then that's where you really want a partial index (as I > suggest earlier) so that queries against the non-popular value(s) is > able to use the index and the index is much smaller. > > Of course, for this to work you need to set up the partial index > correctly and make sure that your queries end up using it. > > Thanks! > > Stephen An index simply tells pg which block to look at (assuming that the index itself is not sufficient to satisfy the query), so if using an index would still require that pg look at most blocks, then it cheaper to just scan the whole table - rather than load the index and still look at all blocks that contain the table data. I've oversimplified slightly. An index is best used when using it results in fewer blocks being read from disk. Also the use of RAM is better when the size of the index is kept small. For example having an index on sex for nurses is a waste of time because most nurses are female. However, a partial index (as suggested by Stephen, for your query) that includes only males could be useful if you have queries looking for male nurses (assumes male nurses are a very small fraction of nurses such that most data blocks don't have rows for males nurses, and the planner knows this). I once optimised a very complex set queries that made extensive use of indexes. However, with the knowledge I have today, I would have most likely had fewer and smaller indexes. As I now realize, that some of my indexes were probably counter productive, especially as I'd given no thought to how much RAM would be required to host the data plus indexes! Fortunately, while the objective was to run all those queries within 24 hours, they actually only took a couple of hours. Chris, I would strongly suggest, you read up on the excellent documentation pg has about indexes, but don't expect to understand it all at one sitting... Cheers, Gavin
pgsql-general by date: