Home > mailing lists

Re: surprising query optimisation - Mailing list pgsql-general

From	Gavin Flower
Subject	Re: surprising query optimisation
Date	November 30, 2018 22:10:46
Msg-id	9a952022-82b1-a88d-07fb-72c47bc0f875@archidevsys.co.nz Whole thread
In response to	Re: surprising query optimisation (Stephen Frost <sfrost@snowman.net>)
Responses	Re: surprising query optimisation
List	pgsql-general

Tree view

On 01/12/2018 04:33, Stephen Frost wrote:
> Greetings,
>
> * Chris Withers (chris@withers.org) wrote:
>> On 28/11/2018 22:49, Stephen Frost wrote:
>>> * Chris Withers (chris@withers.org) wrote:
>>>> We have an app that deals with a lot of queries, and we've been slowly
>>>> seeing performance issues emerge. We take a lot of free form queries from
>>>> users and stumbled upon a very surprising optimisation.
>>>>
>>>> So, we have a 'state' column which is a 3 character string column with an
>>>> index on it. Despite being a string, this column is only used to store one
>>>> of three values: 'NEW', 'ACK', or 'RSV'.
>>> Sounds like a horrible field to have an index on.
>> That's counter-intuitive for me. What leads you to say this and what would
>> you do/recommend instead?
> For this, specifically, it's because you end up with exactly what you
> have: a large index with tons of duplicate values.  Indexes are
> particularly good when you have high-cardinality fields.  Now, if you
> have a skewed index, where there's one popular value and a few much less
> popular ones, then that's where you really want a partial index (as I
> suggest earlier) so that queries against the non-popular value(s) is
> able to use the index and the index is much smaller.
>
> Of course, for this to work you need to set up the partial index
> correctly and make sure that your queries end up using it.
>
> Thanks!
>
> Stephen

An index simply tells pg which block to look at (assuming that the index 
itself is not sufficient to satisfy the query), so if using an index 
would still require that pg look at most blocks, then it cheaper to just 
scan the whole table - rather than load the index and still look at all 
blocks that contain the table data.  I've oversimplified slightly.

An index is best used when using it results in fewer blocks being read 
from disk.

Also the use of RAM is better when the size of the index is kept small.  
For example having an index on sex for nurses is a waste of time because 
most nurses are female.  However, a partial index (as suggested by 
Stephen, for your query) that includes only males could be useful if you 
have queries looking for male nurses (assumes male nurses are a very 
small fraction of nurses such that most data blocks don't have rows for 
males nurses, and the planner knows this).

I once optimised a very complex set queries that made extensive use of 
indexes.  However, with the knowledge I have today, I would have most 
likely had fewer and smaller indexes.  As I now realize, that some of my 
indexes were probably counter productive, especially as I'd given no 
thought to how much RAM would be required to host the data plus 
indexes!  Fortunately, while the objective was to run all those queries 
within 24 hours, they actually only took a couple of hours.

Chris, I would strongly suggest, you read up on the excellent 
documentation pg has about indexes, but don't expect to understand it 
all at one sitting...

Cheers,
Gavin

pgsql-general by date:

From: Adrian Klaver
Date: 30 November 2018, 21:55:42
Subject: Re: Dump table using pg_dump vs pg_restore -f

From: Ron
Date: 01 December 2018, 11:24:45
Subject: Unused indexes

Re: surprising query optimisation - Mailing list pgsql-general

Previous

Next