Re: Fixing GIN for empty/null/full-scan cases - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: Fixing GIN for empty/null/full-scan cases |
Date | |
Msg-id | 5057.1294360280@sss.pgh.pa.us Whole thread Raw |
In response to | Re: Fixing GIN for empty/null/full-scan cases (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: Fixing GIN for empty/null/full-scan cases
Re: Fixing GIN for empty/null/full-scan cases |
List | pgsql-hackers |
Attached is a WIP patch along the lines I suggested earlier: this covers the core code and docs, but doesn't yet do anything to the various opclass-specific functions. Testing it, I find that there was a gap in my previous thinking. The patch works for cases like select ... where arraycol = '{}' which previously failed for lack of ability to do a full-index scan. However, it doesn't work for cases like select ... where arraycol <@ '{1,2}' This correctly returns {1}, {2}, and {1,2} --- but it doesn't find {}. There are empty-item placeholder entries in the index for the latter, but since the query does contain some real keys, it's not a full-index scan and thus it doesn't look for the empty-item placeholders. There are basically two ways we could fix this: with or without the extractQuery function's cooperation. Without any help from extractQuery, it would be necessary for *every* GIN query to include empty-item placeholders in the search and then see if the consistentFn would succeed on those rows. That seems pretty unattractive from an efficiency standpoint, not to mention that it opens the possibility I mentioned before of old consistentFns assuming without checking that there's at least one true check[] entry. What seems like a better idea is for extractQuery to explicitly tell us whether or not an empty indexed item can match the query. In the terms of this patch, that amounts to including a GIN_CAT_EMPTY_ITEM placeholder as one of the search targets. Now, there are two ways we could do that, too: 1. Explicitly include the EMPTY_ITEM target as part of the Datum[] array returned by extractQuery. That seems a bit grotty on a couple of grounds, one being that we couldn't use just a bool isNull[] auxiliary array but would have to expose GinNullCategory or something similar to the opclasses. Another problem with it is that then the "empty" target would presumably be included in the check[] array, with the result that an "empty" heap item would have one check[] value set, which would mess up simple AND/OR combination logic such as you see in ginarrayconsistent(). 2. Add another output bool parameter to extractQuery that it must set true (from a default false state) if the query could match with no check values set. This would prompt the GIN code to search for EMPTY_ITEM placeholders, but they'd not be part of the check[] array. With either one of these approaches, we can avoid the backwards-compatibility problem of a consistentFn being passed cases that it's not expecting, because it's not going to see any such cases without an update to its cohort extractQueryFn. (In particular, if the extractQueryFn returns zero keys and doesn't take an additional action to say that an empty item can match, then we'll treat that as an unsatisfiable query instead of calling a consistentFn that might misbehave on such a case.) I think I like option #2 better. Comments? regards, tom lane
Attachment
pgsql-hackers by date: