Thread: Add GiST support for mixed-width integer operators
Hi Hackers, I noticed that this query wasn't using my GiST index: postgres=# create extension btree_gist; CREATE EXTENSION postgres=# create table t (id bigint, valid_at daterange, exclude using gist (id with =, valid_at with &&)); CREATE TABLE postgres=# explain select * from t where id = 5; QUERY PLAN --------------------------------------------------- Seq Scan on t (cost=0.00..25.00 rows=6 width=40) Filter: (id = 5) (2 rows) But if I add a cast to bigint, it does: postgres=# explain select * from t where id = 5::bigint; QUERY PLAN --------------------------------------------------------------------------------- Bitmap Heap Scan on t (cost=4.19..13.66 rows=6 width=40) Recheck Cond: (id = '5'::bigint) -> Bitmap Index Scan on t_id_valid_at_excl (cost=0.00..4.19 rows=6 width=0) Index Cond: (id = '5'::bigint) (4 rows) There is a StackOverflow question about this with 5 upvotes, so it's not just me who was surprised by it.[1] The reason is that btree_gist only creates pg_amop entries for symmetrical operators, unlike btree which has =(int2,int8), etc. So this commit adds support for all combinations of int2/int4/int8 for all five btree operators (</<=/=/>=/>). After doing that, my query uses the index without a cast. One complication is that while btree has just one opfamily for everything (integer_ops), btree_gist splits things up into gist_int2_ops, gist_int4_ops, and gist_int8_ops. So where to put the operators? I thought it made the most sense for a larger width to support smaller ones, so I added =(int2,int8) and =(int4,int8) to gist_int8_ops, and I added =(int2,int4) to gist_int4_ops. [1] https://stackoverflow.com/questions/71788182/postgres-not-using-btree-gist-index Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
Attachment
> On 5 Jul 2024, at 23:46, Paul Jungwirth <pj@illuminatedcomputing.com> wrote: > > this commit adds support for all combinations of int2/int4/int8 for all five btree operators (</<=/=/>=/>). Looks like a nice feature to have. Would it make sense to do something similar to float8? Or, perhaps, some other types from btree_gist? Also, are we sure such tests will be stable? +SET enable_seqscan=on; +-- It should use the index with a different integer width: +EXPLAIN (COSTS OFF) +SELECT count(*) FROM int4tmp WHERE a = smallint '42'; + QUERY PLAN +------------------------------------------------ + Aggregate + -> Bitmap Heap Scan on int4tmp + Recheck Cond: (a = '42'::smallint) + -> Bitmap Index Scan on int4idx + Index Cond: (a = '42'::smallint) Best regards, Andrey Borodin.
On 7/6/24 05:04, Andrey M. Borodin wrote:>> On 5 Jul 2024, at 23:46, Paul Jungwirth <pj@illuminatedcomputing.com> wrote: >> >> this commit adds support for all combinations of int2/int4/int8 for all five btree operators (</<=/=/>=/>). > > Looks like a nice feature to have. > Would it make sense to do something similar to float8? Or, perhaps, some other types from btree_gist? Here is another patch adding float4/8 and also date/timestamp/timestamptz, in the same combinations as btree. No other types seem like they deserve this treatment. For example btree doesn't mix oids with ints. > Also, are we sure such tests will be stable? You're right, it was questionable. We hadn't analyzed the table, and after doing that the plan changes from a bitmap scan to an index-only scan. That makes more sense, and I doubt it will change now that it's based on statistics. Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
Attachment
Paul Jungwirth <pj@illuminatedcomputing.com> writes: > On 7/6/24 05:04, Andrey M. Borodin wrote:>> On 5 Jul 2024, at 23:46, Paul Jungwirth > <pj@illuminatedcomputing.com> wrote: >>> this commit adds support for all combinations of int2/int4/int8 for all five btree operators (</<=/=/>=/>). Perhaps I'm missing something, but how can this possibly work without any changes to the C code? For example, gbt_int4_consistent assumes that the comparison value is always an int4. Due to the way we represent Datums in-memory, this will kind of work if it's really an int2 or int8 --- unless the comparison value doesn't fit in int4, and then you will get a completely wrong answer based on a value truncated to int4. (But I would argue that even the cases where it seems to work are a type violation, and getting the right answer is accidental if you have not applied the correct PG_GETARG conversion macro.) Plus, int4-vs-int8 comparisons will fail in very obvious ways, up to and including core dumps, on 32-bit machines where int8 is pass-by-reference. > Here is another patch adding float4/8 and also date/timestamp/timestamptz, in the same combinations > as btree. Similar problems, plus comparing timestamp to timestamptz requires a timezone conversion that this code isn't doing. I think to make this work, you'd need to define a batch of new opclass-specific strategy numbers that convey both the kind of comparison to perform and the type of the RHS value. And then there would need to be a nontrivial amount of new code in the consistent functions to deal with cross-type cases. regards, tom lane