On Wed, Jun 11, 2025 at 11:04:36AM +0200, Anthonin Bonnefoy wrote:
> Here's a v2 for the attempted fix. This is using the mentioned approach:
> - SelectStmt's location is set to Select's position when the node if built
> - If there are surrounding parentheses, we include them and move the
> start of the statement to the outermost '('
>
> The Statement length is only set for:
> - COPY: We use the PreparableStatement's surrounding '()' to set both
> location and length (setting location should be redundant here though)
> - CTAS: we set the statement's length if there's a tailing 'WITH (NO) DATA'.
> - Other than those, the statement length will fallback to use the top
> RawStmt's remaining length.
@@ -3417,9 +3418,9 @@ CopyStmt: COPY opt_binary qualified_name opt_column_list
[...]
- updatePreparableStmtEnd($3, @2, @4 - @2);
+ updatePreparableStmtEnd($3, @3, @4 - @3);
For the case of COPY, it is possible to get a much better report by
setting the location and length to not include the first parenthesis,
like that.
> I haven't done it for Views and JSON_ARRAY. Their nested queries
> aren't currently tracked and reported by pgss so there's no way to
> test those.
I have to admit that it is inconsistent to set a location in their
respective inner nodes while we don't use updateSelectStmtEnd() to
adjust the length based on the location assigned by the parser.
There's a risk of having a code path in the future looking at the
location later on based on the parsed state and taking a wrong
decision because the location and the length do not match. Such cases
could be easier to reach outside of core, for example imagine an
extension looking at a ViewStmt where we set the location but not a
matching length. We would consider as inner query a string that
begins at the beginning of the location until the end of the string
because the length is not set to match with what the parser finds.
> This could be done but there's no available way to test
> this is correct AFAIK?
Yes, perhaps we should think harder about this part. There are two
patterns among a few others like UNION, which can also be used in
views.
Anyway, I am going to pull the plug on this one today, because we need
to take a decision with beta2 approaching fast but it's mostly me
being not fully confident that this is right as-is. If we're wrong in
one path, that's an out-of-bound memory write waiting for us in PGSS
for the query normalization, as this report proves, and that would be
bad.
The first step would be to improve the main regression test suite for
all these grammar "parenthesis" patterns we have never had coverage
for in core. I am wondering if it is actually possible to reach an
inconsistent state even in older stable branches because we've never
tested all these grammar patterns. Perhaps I'm wrong, but we would be
more confident with all these extra patterns stressed first by
027_stream_regress.pl which forces queries to have some normalization
applied.
--
Michael