Thread: Add GiST support for mixed-width integer operators

Add GiST support for mixed-width integer operators

From
Paul Jungwirth
Date:
Hi Hackers,

I noticed that this query wasn't using my GiST index:

postgres=# create extension btree_gist;
CREATE EXTENSION
postgres=# create table t (id bigint, valid_at daterange, exclude using gist (id with =, valid_at 
with &&));
CREATE TABLE
postgres=# explain select * from t where id = 5;
                     QUERY PLAN
---------------------------------------------------
  Seq Scan on t  (cost=0.00..25.00 rows=6 width=40)
    Filter: (id = 5)
(2 rows)

But if I add a cast to bigint, it does:

postgres=# explain select * from t where id = 5::bigint;
                                    QUERY PLAN
---------------------------------------------------------------------------------
  Bitmap Heap Scan on t  (cost=4.19..13.66 rows=6 width=40)
    Recheck Cond: (id = '5'::bigint)
    ->  Bitmap Index Scan on t_id_valid_at_excl  (cost=0.00..4.19 rows=6 width=0)
          Index Cond: (id = '5'::bigint)
(4 rows)

There is a StackOverflow question about this with 5 upvotes, so it's not just me who was surprised 
by it.[1]

The reason is that btree_gist only creates pg_amop entries for symmetrical operators, unlike btree 
which has =(int2,int8), etc. So this commit adds support for all combinations of int2/int4/int8 for 
all five btree operators (</<=/=/>=/>). After doing that, my query uses the index without a cast.

One complication is that while btree has just one opfamily for everything (integer_ops), btree_gist 
splits things up into gist_int2_ops, gist_int4_ops, and gist_int8_ops. So where to put the 
operators? I thought it made the most sense for a larger width to support smaller ones, so I added 
=(int2,int8) and =(int4,int8) to gist_int8_ops, and I added =(int2,int4) to gist_int4_ops.

[1] https://stackoverflow.com/questions/71788182/postgres-not-using-btree-gist-index

Yours,

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com
Attachment

Re: Add GiST support for mixed-width integer operators

From
"Andrey M. Borodin"
Date:

> On 5 Jul 2024, at 23:46, Paul Jungwirth <pj@illuminatedcomputing.com> wrote:
>
> this commit adds support for all combinations of int2/int4/int8 for all five btree operators (</<=/=/>=/>).

Looks like a nice feature to have.
Would it make sense to do something similar to float8? Or, perhaps, some other types from btree_gist?

Also, are we sure such tests will be stable?
+SET enable_seqscan=on;
+-- It should use the index with a different integer width:
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM int4tmp WHERE a = smallint '42';
+                   QUERY PLAN
+------------------------------------------------
+ Aggregate
+   ->  Bitmap Heap Scan on int4tmp
+         Recheck Cond: (a = '42'::smallint)
+         ->  Bitmap Index Scan on int4idx
+               Index Cond: (a = '42'::smallint)


Best regards, Andrey Borodin.


Re: Add GiST support for mixed-width integer operators

From
Paul Jungwirth
Date:
On 7/6/24 05:04, Andrey M. Borodin wrote:>> On 5 Jul 2024, at 23:46, Paul Jungwirth 
<pj@illuminatedcomputing.com> wrote:
>>
>> this commit adds support for all combinations of int2/int4/int8 for all five btree operators (</<=/=/>=/>).
> 
> Looks like a nice feature to have.
> Would it make sense to do something similar to float8? Or, perhaps, some other types from btree_gist?

Here is another patch adding float4/8 and also date/timestamp/timestamptz, in the same combinations 
as btree.

No other types seem like they deserve this treatment. For example btree doesn't mix oids with ints.

> Also, are we sure such tests will be stable?

You're right, it was questionable. We hadn't analyzed the table, and after doing that the plan 
changes from a bitmap scan to an index-only scan. That makes more sense, and I doubt it will change 
now that it's based on statistics.

Yours,

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com
Attachment

Re: Add GiST support for mixed-width integer operators

From
Tom Lane
Date:
Paul Jungwirth <pj@illuminatedcomputing.com> writes:
> On 7/6/24 05:04, Andrey M. Borodin wrote:>> On 5 Jul 2024, at 23:46, Paul Jungwirth
> <pj@illuminatedcomputing.com> wrote:
>>> this commit adds support for all combinations of int2/int4/int8 for all five btree operators (</<=/=/>=/>).

Perhaps I'm missing something, but how can this possibly work without
any changes to the C code?

For example, gbt_int4_consistent assumes that the comparison value
is always an int4.  Due to the way we represent Datums in-memory,
this will kind of work if it's really an int2 or int8 --- unless the
comparison value doesn't fit in int4, and then you will get a
completely wrong answer based on a value truncated to int4.  (But I
would argue that even the cases where it seems to work are a type
violation, and getting the right answer is accidental if you have not
applied the correct PG_GETARG conversion macro.)  Plus, int4-vs-int8
comparisons will fail in very obvious ways, up to and including core
dumps, on 32-bit machines where int8 is pass-by-reference.

> Here is another patch adding float4/8 and also date/timestamp/timestamptz, in the same combinations
> as btree.

Similar problems, plus comparing timestamp to timestamptz requires a
timezone conversion that this code isn't doing.

I think to make this work, you'd need to define a batch of new
opclass-specific strategy numbers that convey both the kind of
comparison to perform and the type of the RHS value.  And then
there would need to be a nontrivial amount of new code in the
consistent functions to deal with cross-type cases.

            regards, tom lane