Thread: Re: Index AM API cleanup

Re: Index AM API cleanup

From

Peter Eisentraut

Date:

26 August 2024, 15:21:26

On 21.08.24 21:25, Mark Dilger wrote:
> The next twenty patches are a mix of fixes of various layering
> violations, such as not allowing non-core index AMs from use in replica
> identity full, or for speculative insertion, or for foreign key
> constraints, or as part of merge join; with updates to the "treeb" code
> as needed.  The changes to "treeb" are broken out so that they can also
> easily be excluded from whatever gets committed.

I made a first pass through this patch set.  I think the issues it aims 
to address are mostly legitimate.  In a few cases, we might need some 
more discussion and perhaps will end up slicing the APIs a bit 
differently.  The various patches that generalize the strategy numbers 
appear to overlap with things being discussed at [0], so we should see 
that the solution covers all the use cases.

[0]: 
https://www.postgresql.org/message-id/flat/CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mDHcY4_qQ0+noCUVg@mail.gmail.com

To make a dent, I picked out something that should be mostly harmless: 
Stop calling directly into _bt_getrootheight() (patch 0004).  I think 
this patch is ok, but I might call the API function amgettreeheight 
instead of amgetrootheight.  The former seems more general.

Also, a note for us all in this thread, changes to the index AM API need 
updates to the corresponding documentation in doc/src/sgml/indexam.sgml.

I notice that _bt_getrootheight() is called only to fill in the 
IndexOptInfo tree_height field, which is only used by btcostestimate(), 
so in some sense this is btree-internal data.  But making it so that 
btcostestimate() calls _bt_getrootheight() directly to avoid all that 
intermediate business seems too complicated, and there was probably a 
reason that the cost estimation functions don't open the index.

Interestingly, the cost estimation functions for gist and spgist also 
look at the tree_height field but nothing ever fills it on.  So with 
your API restructuring, someone could provide the missing API functions 
for those index types.  Might be interesting.

That said, there might be value in generalizing this a bit.  If you look 
at the cost estimation functions in pgvector (hnswcostestimate() and 
ivfflatcostestimate()), they both have this pattern that 
btcostestimate() tries to avoid: They open the index, look up some 
number, close the index, then make a cost estimate computation with the 
number looked up.  So another idea would be to generalize the 
tree_height field to some "index size data" or even "internal data for 
cost estimation".  This wouldn't need to change the API much, since 
these are all just integer values, but we'd label the functions and 
fields a bit differently.

Re: Index AM API cleanup

From

Mark Dilger

Date:

26 August 2024, 18:10:34


> On Aug 26, 2024, at 5:21 AM, Peter Eisentraut <peter@eisentraut.org> wrote:
>
> On 21.08.24 21:25, Mark Dilger wrote:
>> The next twenty patches are a mix of fixes of various layering
>> violations, such as not allowing non-core index AMs from use in replica
>> identity full, or for speculative insertion, or for foreign key
>> constraints, or as part of merge join; with updates to the "treeb" code
>> as needed.  The changes to "treeb" are broken out so that they can also
>> easily be excluded from whatever gets committed.
>
> I made a first pass through this patch set.

Peter, thanks for the review!

> I think the issues it aims to address are mostly legitimate.  In a few cases, we might need some more discussion and
perhapswill end up slicing the APIs a bit differently.  The various patches that generalize the strategy numbers appear
tooverlap with things being discussed at [0], so we should see that the solution covers all the use cases. 
>
> [0]: https://www.postgresql.org/message-id/flat/CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mDHcY4_qQ0+noCUVg@mail.gmail.com

Paul, it seems what you are doing in v39-0001-Add-stratnum-GiST-support-function.patch is similar to what I am doing in
v17-0012-Convert-strategies-to-and-from-row-compare-types.patch. In particular, your function 

+
+/*
+ * Returns the btree number for supported operators, otherwise invalid.
+ */
+Datum
+gist_stratnum_btree(PG_FUNCTION_ARGS)
+{
+ StrategyNumber strat = PG_GETARG_UINT16(0);
+
+ switch (strat)
+ {
+ case RTEqualStrategyNumber:
+ PG_RETURN_UINT16(BTEqualStrategyNumber);
+ case RTLessStrategyNumber:
+ PG_RETURN_UINT16(BTLessStrategyNumber);
+ case RTLessEqualStrategyNumber:
+ PG_RETURN_UINT16(BTLessEqualStrategyNumber);
+ case RTGreaterStrategyNumber:
+ PG_RETURN_UINT16(BTGreaterStrategyNumber);
+ case RTGreaterEqualStrategyNumber:
+ PG_RETURN_UINT16(BTGreaterEqualStrategyNumber);
+ default:
+ PG_RETURN_UINT16(InvalidStrategy);
+ }
+}

looks similar to the implementation of an amtranslate_rctype_function.  Do you have any interest in taking a look?



> To make a dent, I picked out something that should be mostly harmless: Stop calling directly into _bt_getrootheight()
(patch0004).  I think this patch is ok, but I might call the API function amgettreeheight instead of amgetrootheight.
Theformer seems more general. 

Peter, your proposed rename seems fine for the current implementation, but your suggestion below might indicate a
differentnaming. 

> I notice that _bt_getrootheight() is called only to fill in the IndexOptInfo tree_height field, which is only used by
btcostestimate(),so in some sense this is btree-internal data.  But making it so that btcostestimate() calls
_bt_getrootheight()directly to avoid all that intermediate business seems too complicated, and there was probably a
reasonthat the cost estimation functions don't open the index. 
>
> Interestingly, the cost estimation functions for gist and spgist also look at the tree_height field but nothing ever
fillsit on.  So with your API restructuring, someone could provide the missing API functions for those index types.
Mightbe interesting. 
>
> That said, there might be value in generalizing this a bit.  If you look at the cost estimation functions in pgvector
(hnswcostestimate()and ivfflatcostestimate()), they both have this pattern that btcostestimate() tries to avoid: They
openthe index, look up some number, close the index, then make a cost estimate computation with the number looked up.
Soanother idea would be to generalize the tree_height field to some "index size data" or even "internal data for cost
estimation". This wouldn't need to change the API much, since these are all just integer values, but we'd label the
functionsand fields a bit differently. 

Would they be just integers?  They could also be void*, with amgetrootheight_function returning data allocated in the
currentmemory context.  For btree, that would just be a four byte integer, but other indexes could return whatever they
like. If you like that idea, I can code that up for v18, naming the field something like
amgetcostestimateinfo_function.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Index AM API cleanup

From

Mark Dilger

Date:

03 September 2024, 20:26:11


> On Sep 3, 2024, at 9:52 AM, Peter Eisentraut <peter@eisentraut.org> wrote:
>
> Here is a cleaned-up version of the v17-0004 patch.  I have applied the renaming discussed above.  I have also made a
wrapperfunction btgettreeheight() that calls _bt_getrootheight().  That way, if we ever want to change the API, we
don'thave to change _bt_getrootheight(), or disentangle it then.  I've also added documentation and put in a source
codecomment that the API could be expanded for additional uses. 

Ok.

>  Also, I have removed the addition to the IndexOptInfo struct; that didn't seem necessary.

Good catch.  I agree with the change.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Index AM API cleanup

From

Peter Eisentraut

Date:

10 September 2024, 11:10:31

On 03.09.24 19:26, Mark Dilger wrote:
>> On Sep 3, 2024, at 9:52 AM, Peter Eisentraut <peter@eisentraut.org> wrote:
>>
>> Here is a cleaned-up version of the v17-0004 patch.  I have applied the renaming discussed above.  I have also made
awrapper function btgettreeheight() that calls _bt_getrootheight().  That way, if we ever want to change the API, we
don'thave to change _bt_getrootheight(), or disentangle it then.  I've also added documentation and put in a source
codecomment that the API could be expanded for additional uses.
 
> 
> Ok.
> 
>>   Also, I have removed the addition to the IndexOptInfo struct; that didn't seem necessary.
> 
> Good catch.  I agree with the change.

I have committed this patch.  (It needed another small change to silence 
a compiler warning about an uninitialized variable.)

Re: Index AM API cleanup

From

Peter Eisentraut

Date:

24 September 2024, 11:50:32

Next, I have reviewed patches

v17-0010-Track-sort-direction-in-SortGroupClause.patch
v17-0011-Track-scan-reversals-in-MergeJoin.patch

Both of these seem ok and sensible to me.

They take the concept of the "reverse" flag that already exists in the 
affected code and just apply it more consistently throughout the various 
code layers, instead of relying on strategy numbers as intermediate 
storage.  This is both helpful for your ultimate goal in this patch 
series, and it also makes the affected code areas simpler and more 
consistent and robust.

Re: Index AM API cleanup

From

Mark Dilger

Date:

24 September 2024, 12:09:41


> On Sep 24, 2024, at 10:50 AM, Peter Eisentraut <peter@eisentraut.org> wrote:
>
> Next, I have reviewed patches
>
> v17-0010-Track-sort-direction-in-SortGroupClause.patch
> v17-0011-Track-scan-reversals-in-MergeJoin.patch
>
> Both of these seem ok and sensible to me.
>
> They take the concept of the "reverse" flag that already exists in the affected code and just apply it more
consistentlythroughout the various code layers, instead of relying on strategy numbers as intermediate storage.  This
isboth helpful for your ultimate goal in this patch series, and it also makes the affected code areas simpler and more
consistentand robust. 
>

Thanks for the review!

Yes, I found the existing use of a btree strategy number rather than a boolean "reverse" flag made using the code from
otherindex AMs needlessly harder.  I am glad you see it the same way. 

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Index AM API cleanup

From

Peter Eisentraut

Date:

14 October 2024, 17:31:00

On 24.09.24 11:09, Mark Dilger wrote:
> 
> 
>> On Sep 24, 2024, at 10:50 AM, Peter Eisentraut <peter@eisentraut.org> wrote:
>>
>> Next, I have reviewed patches
>>
>> v17-0010-Track-sort-direction-in-SortGroupClause.patch
>> v17-0011-Track-scan-reversals-in-MergeJoin.patch
>>
>> Both of these seem ok and sensible to me.
>>
>> They take the concept of the "reverse" flag that already exists in the affected code and just apply it more
consistentlythroughout the various code layers, instead of relying on strategy numbers as intermediate storage.  This
isboth helpful for your ultimate goal in this patch series, and it also makes the affected code areas simpler and more
consistentand robust.
 
>>
> 
> Thanks for the review!
> 
> Yes, I found the existing use of a btree strategy number rather than a boolean "reverse" flag made using the code
fromother index AMs needlessly harder.  I am glad you see it the same way.
 

committed