GIN indexscans versus equality selectivity estimation - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | GIN indexscans versus equality selectivity estimation |
Date | |
Msg-id | 631.1294616287@sss.pgh.pa.us Whole thread Raw |
Responses |
Re: GIN indexscans versus equality selectivity estimation
Re: GIN indexscans versus equality selectivity estimation |
List | pgsql-hackers |
Whilst fooling around with GIN over the past few days, I noticed the following rather surprising behavior: regression=# create table t1 (f1 int[]); CREATE TABLE regression=# insert into t1 values (array[42]); INSERT 0 1 regression=# create index ti1 on t1 using gin (f1); CREATE INDEX regression=# set enable_seqscan to 0; SET regression=# explain select * from t1 where f1 = array[42]; QUERY PLAN -----------------------------------------------------------------------Seq Scan on t1 (cost=10000000000.00..10000000001.01rows=1 width=32) Filter: (f1 = '{42}'::integer[]) (2 rows) The system absolutely will not use the index in this state, even though the GIN array opclass does support equality. I dug around and eventually figured out why: 1. We don't have any pg_stats statistics in this state; but we do know reltuples = 1, since the CREATE INDEX helpfully updated the pg_class row with that information. Without stats, eqsel() falls back to a selectivity estimate of 1/numdistinct; and without stats, get_variable_numdistinct estimates the number of distinct values as equal to reltuples, if that's a small number. The upshot is that we get a selectivity estimate of exactly 1.0 for the equality condition. 2. In indxpath.c, we have the following comment and code about selecting which paths to generate bitmap scan paths for: * ... Also, pick out the ones that might be useful * as bitmap scans. For that, we must discard indexes that don'tsupport * bitmap scans, and we also are only interested in paths that have some * selectivity; we should discardanything that was generated solely for * ordering purposes. if (ipath->indexinfo->amhasgetbitmap && ipath->indexselectivity < 1.0 && !ScanDirectionIsBackward(ipath->indexscandir)) bitindexpaths = lappend(bitindexpaths, ipath); Since GIN doesn't support plain indexscans, only bitmap scans, this means that the planner simply won't use a GIN index unless the estimated selectivity is less than 1.0. There are several things we might choose to do about this: 1. Do nothing. The issue seems quite unlikely to affect anyone in the field, since in fact "use a seqscan" is probably the right answer anytime reltuples = 1; and anyway using a GIN index for plain equality is a corner case to begin with. However, it could confuse people who were doing testing (it confused me!). 2. Hack the selectivity estimators so that we don't get exactly 1.0 for this sort of situation. It might be plausible for get_variable_numdistinct to set a floor of 2 on its result, for example; or we could hack eqsel() to bound the no-stats estimate to a bit less than 1. 3. Change the indxpath.c code to have some different way of determining whether a path is sensible to use as a bitmap scan. We could for instance make the test be "path has some indexquals and/or index is partial". Or maybe there should be an exception in the case where the index type doesn't support plain indexscan. I'm not really sold on any of these alternatives. Thoughts? regards, tom lane
pgsql-hackers by date: