Re: Avoid full GIN index scan when possible - Mailing list pgsql-hackers
From | Marc Cousin |
---|---|
Subject | Re: Avoid full GIN index scan when possible |
Date | |
Msg-id | eca8612b-e289-e2fc-c5df-84be46f75ec1@gmail.com Whole thread Raw |
In response to | Re: Avoid full GIN index scan when possible (Julien Rouhaud <rjuju123@gmail.com>) |
Responses |
Re: Avoid full GIN index scan when possible
|
List | pgsql-hackers |
On 29/06/2019 00:23, Julien Rouhaud wrote: > On Fri, Jun 28, 2019 at 10:16 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> >> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: >>> On Fri, Jun 28, 2019 at 03:03:19PM -0400, Tom Lane wrote: >>>> I not only don't want that function in indxpath.c, I don't even want >>>> it to be known/called from there. If we need the ability for the index >>>> AM to editorialize on the list of indexable quals (which I'm not very >>>> convinced of yet), let's make an AM interface function to do it. >> >>> Wouldn't it be better to have a function that inspects a single qual and >>> says whether it's "optimizable" or not? That could be part of the AM >>> implementation, and we'd call it and it'd be us messing with the list. >> >> Uh ... we already determined that the qual is indexable (ie is a member >> of the index's opclass), or allowed the index AM to derive an indexable >> clause from it, so I'm not sure what you envision would happen >> additionally there. If I understand what Julien is concerned about >> --- and I may not --- it's that the set of indexable clauses *as a whole* >> may have or lack properties of interest. So I'm thinking the answer >> involves some callback that can do something to the whole list, not >> qual-at-a-time. We've already got facilities for the latter case. > > Yes, the root issue here is that with gin it's entirely possible that > "WHERE sometable.col op value1" is way more efficient than "WHERE > sometable.col op value AND sometable.col op value2", where both qual > are determined indexable by the opclass. The only way to avoid that > is indeed to inspect the whole list, as done in this poor POC. > > This is a problem actually hit in production, and as far as I know > there's no easy way from the application POV to prevent unexpected > slowdown. Maybe Marc will have more details about the actual problem > and how expensive such a case was compared to the normal ones. Sorry for the delay... Yes, quite easily, here is what we had (it's just a bit simplified, we have other criterions but I think it shows the problem): rh2=> explain analyze select * from account_employee where typeahead like '%albert%'; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on account_employee (cost=53.69..136.27 rows=734 width=666) (actual time=15.562..35.044 rows=8957 loops=1) Recheck Cond: (typeahead ~~ '%albert%'::text) Rows Removed by Index Recheck: 46 Heap Blocks: exact=8919 -> Bitmap Index Scan on account_employee_site_typeahead_gin_idx (cost=0.00..53.51 rows=734 width=0) (actual time=14.135..14.135rows=9011 loops=1) Index Cond: (typeahead ~~ '%albert%'::text) Planning time: 0.224 ms Execution time: 35.389 ms (8 rows) rh2=> explain analyze select * from account_employee where typeahead like '%albert%' and typeahead like '%lo%'; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on account_employee (cost=28358.38..28366.09 rows=67 width=666) (actual time=18210.109..18227.134 rows=1172loops=1) Recheck Cond: ((typeahead ~~ '%albert%'::text) AND (typeahead ~~ '%lo%'::text)) Rows Removed by Index Recheck: 7831 Heap Blocks: exact=8919 -> Bitmap Index Scan on account_employee_site_typeahead_gin_idx (cost=0.00..28358.37 rows=67 width=0) (actual time=18204.756..18204.756rows=9011 loops=1) Index Cond: ((typeahead ~~ '%albert%'::text) AND (typeahead ~~ '%lo%'::text)) Planning time: 0.288 ms Execution time: 18230.182 ms (8 rows) We noticed this because the application timed out for users searching someone whose name was 2 characters ( it happens :)). We reject such filters when it's the only criterion, as we know it's going to be slow, but ignoring it as a supplementaryfilter would be a bit weird. Of course there is the possibility of filtering with two stages with a CTE, but that's not as great as having PostgreSQLdoing it itself. By the way, while preparing this, I noticed that it seems that during this kind of index scan, the interrupt signal is masked for a very long time. Control-C takes a very long while to cancel the query. But it's an entirely different problem :) Regards
Attachment
pgsql-hackers by date: