Thread: Add a bound check to TidRangeEval

Add a bound check to TidRangeEval

From
Junwang Zhao
Date:
Hi hackers,

The comments of TidRangeEval saids:

```
Returns false if we detect the range cannot contain any tuples.
```

After narrowing the upper and lower bounds, we can add an
additional check to verify if the lower bound exceeds the
upper bound. If true, return false to avoid unnecessary calls
to table_beginscan_tidrange.

PFA the trivial patch.

-- 
Regards
Junwang Zhao

Attachment

Re: Add a bound check to TidRangeEval

From
David Rowley
Date:
On Sun, 8 Jun 2025 at 21:41, Junwang Zhao <zhjwpku@gmail.com> wrote:
> The comments of TidRangeEval saids:
>
> ```
> Returns false if we detect the range cannot contain any tuples.
> ```
>
> After narrowing the upper and lower bounds, we can add an
> additional check to verify if the lower bound exceeds the
> upper bound. If true, return false to avoid unnecessary calls
> to table_beginscan_tidrange.

Do you think this is common enough to bother doing?  It's not like
this is going on in the planner and allows further optimisations in
regards to row estimates to assist in join order decisions.

Also, if you're going to propose a performance optimisation, then it
should also come with some benchmark code and results to demonstrate
that there is a gain in performance. I don't think it should be up to
the reviewer or committer to do that for you. I imagine it's not
exciting enough to bother with, but feel free to prove that I'm wrong.

David



Re: Add a bound check to TidRangeEval

From
Junwang Zhao
Date:
Hi David,

On Mon, Jun 9, 2025 at 9:52 AM David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Sun, 8 Jun 2025 at 21:41, Junwang Zhao <zhjwpku@gmail.com> wrote:
> > The comments of TidRangeEval saids:
> >
> > ```
> > Returns false if we detect the range cannot contain any tuples.
> > ```
> >
> > After narrowing the upper and lower bounds, we can add an
> > additional check to verify if the lower bound exceeds the
> > upper bound. If true, return false to avoid unnecessary calls
> > to table_beginscan_tidrange.
>
> Do you think this is common enough to bother doing?  It's not like
> this is going on in the planner and allows further optimisations in
> regards to row estimates to assist in join order decisions.

This is not a common case, it's just a corner case while
playing around the TidRangeScan.

I'm not saying this is an optimization, it just makes me a little
confused when I see the lowerBound > upperBound and
it still returns true.

The comments say "Returns true if it's possible for the
range to contain tuples." This seems not fully accurate?

The case will be handled later while scanning the pages,
but why not handle it earlier if we already know there is no
chance of the range containing any tuples.

>
> Also, if you're going to propose a performance optimisation, then it
> should also come with some benchmark code and results to demonstrate
> that there is a gain in performance. I don't think it should be up to
> the reviewer or committer to do that for you. I imagine it's not
> exciting enough to bother with, but feel free to prove that I'm wrong.

This is not a performance optimization, so I never try to come
up with any benchmark. Sorry if I made you feel that way.

>
> David



--
Regards
Junwang Zhao



Re: Add a bound check to TidRangeEval

From
Junwang Zhao
Date:
Hi David,

On Sat, Jun 14, 2025 at 11:27 AM David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Wed, 11 Jun 2025 at 14:26, Junwang Zhao <zhjwpku@gmail.com> wrote:
> > This is not a common case, it's just a corner case while
> > playing around the TidRangeScan.
> >
> > I'm not saying this is an optimization, it just makes me a little
> > confused when I see the lowerBound > upperBound and
> > it still returns true.
> >
> > The comments say "Returns true if it's possible for the
> > range to contain tuples." This seems not fully accurate?
>
> The word "possible" there is meant to convey that false negatives are
> *not* ok, but false positives are fine. I'm certainly open to making
> the comment better if that's not clear.
>
> How about adding "We don't bother validating that trss_mintid is less
> than or equal to trss_maxtid, as the scan_set_tidrange() table AM
> function will detect that."

Sounds good to me. This will naturally direct readers to the
comments in scan_set_tidrange.

>
> The scan_set_tidrange function does document that it's the
> implementation's responsibility to check this, as the comment in the
> declaration of struct TableAmRoutine reads:
>
> * Implementations of scan_set_tidrange must themselves handle
> * ItemPointers of any value. i.e, they must handle each of the following:
> *
> * 1) mintid or maxtid is beyond the end of the table; and
> * 2) mintid is above maxtid; and
> * 3) item offset for mintid or maxtid is beyond the maximum offset
> * allowed by the AM.
>
> So I'm not keen to add code to TidRangeEval to check for this as it
> would result in additional overhead for valid usages of TID Range
> scans.

Agree, thanks for the explanation.

>
> David



--
Regards
Junwang Zhao



Re: Add a bound check to TidRangeEval

From
David Rowley
Date:
On Sat, 14 Jun 2025 at 16:10, Junwang Zhao <zhjwpku@gmail.com> wrote:
>
> On Sat, Jun 14, 2025 at 11:27 AM David Rowley <dgrowleyml@gmail.com> wrote:
> > How about adding "We don't bother validating that trss_mintid is less
> > than or equal to trss_maxtid, as the scan_set_tidrange() table AM
> > function will detect that."
>
> Sounds good to me. This will naturally direct readers to the
> comments in scan_set_tidrange.

Thanks for looking. Pushed, after changing "detect" into "handle".

David