Thread: Bug in error reporting for multi-line JSON
JSON parsing reports the line number and relevant context info incorrectly
when the JSON contains newlines. The current code mostly just says "LINE 1",
which is misleading when trying to correct the error. There were no tests for
this previously.

With the proposed changes, a JSON error such as this

  {
    "one": 1,
    "two":,"two",     <-- extra comma
    "three": true
  }

which was previously reported as

  CONTEXT: JSON data, line 1: { "one": 1, "two":,...

is now reported as

  CONTEXT: JSON data, line 3: "two":,...

Attached patches:

  HEAD:  json_error_context.v3.patch      - applies cleanly, passes make check
  PG13:  json_error_context.v3.patch      - applies w/ minor fuzz, passes make check
  PG12:  json_error_context.v3.PG12.patch - applies cleanly, passes make check
  PG11:  json_error_context.v3.PG12.patch - applies cleanly, not tested
  PG10:  json_error_context.v3.PG12.patch - applies cleanly, not tested
  PG9.6: json_error_context.v3.PG12.patch - applies cleanly, not tested
  PG9.5: json_error_context.v3.PG12.patch - applies cleanly, not tested

--
Simon Riggs                http://www.EnterpriseDB.com/
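A minimal sketch of the idea behind the proposal, not the attached patch itself: count the newlines before the error position and start the reported context at the most recent line, rather than at the beginning of the datum. The function name and signature below are made up purely for illustration.

=================
/*
 * Illustrative only: given the input string and the byte offset of the
 * error, return the 1-based line number and set *line_begin to the start
 * of the line containing the error, which is where the reported context
 * should begin.
 */
static int
line_of_error(const char *input, size_t errpos, const char **line_begin)
{
    int         line = 1;

    *line_begin = input;
    for (size_t i = 0; i < errpos; i++)
    {
        if (input[i] == '\n')
        {
            line++;
            *line_begin = input + i + 1;    /* first char after the newline */
        }
    }
    return line;
}
=================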
Simon Riggs <simon.riggs@enterprisedb.com> writes:
> JSON parsing reports the line number and relevant context info
> incorrectly when the JSON contains newlines. Current code mostly just
> says "LINE 1" and is misleading for error correction. There were no
> tests for this previously.

Couple thoughts:

* I think you are wrong to have removed the line number bump that
happened when report_json_context advances context_start over a
newline.  The case is likely harder to get to now, but it can still
happen, can't it?  If it can't, we should remove that whole stanza.

* I'd suggest naming the new JsonLexContext field "pos_last_newline";
"linefeed" is not usually the word we use for this concept.  (Although
actually, it might work better if you make that point to the char
*after* the newline, in which case "last_linestart" might be the
right name.)

* I'm not enthused about back-patching.  This behavior seems like an
improvement, but that doesn't mean people will appreciate changing it
in stable branches.

			regards, tom lane
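As a rough sketch of the "line number bump" Tom refers to (using the variable names that appear in the code excerpt later in this thread, plus an assumed line_number counter; this is not the actual report_json_context code): even after the context has been anchored to the line containing the error, trimming the displayed window down to roughly the last 50 characters can still step over a newline, and the reported line number and line start then have to follow it.

=================
/* Sketch only: keep the line number honest while trimming the context */
while (context_end - context_start >= 50)
{
    if (*context_start == '\n')
    {
        line_number++;                  /* the "bump" being discussed */
        line_start = context_start + 1;
    }
    if (IS_HIGHBIT_SET(*context_start))
        context_start += pg_mblen(context_start);
    else
        context_start++;
}
=================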
On Thu, Jan 21, 2021 at 6:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Simon Riggs <simon.riggs@enterprisedb.com> writes:
> > JSON parsing reports the line number and relevant context info
> > incorrectly when the JSON contains newlines. Current code mostly just
> > says "LINE 1" and is misleading for error correction. There were no
> > tests for this previously.
>
> Couple thoughts:
>
> * I think you are wrong to have removed the line number bump that
> happened when report_json_context advances context_start over a
> newline. The case is likely harder to get to now, but it can still
> happen can't it? If it can't, we should remove that whole stanza.

OK, I'm playing around with this to see what is needed.

> * I'd suggest naming the new JsonLexContext field "pos_last_newline";
> "linefeed" is not usually the word we use for this concept. (Although
> actually, it might work better if you make that point to the char
> *after* the newline, in which case "last_linestart" might be the
> right name.)

Yes, OK.

> * I'm not enthused about back-patching. This behavior seems like an
> improvement, but that doesn't mean people will appreciate changing it
> in stable branches.

OK, as you wish.

Thanks for the review, will post again soon with an updated patch.

--
Simon Riggs                http://www.EnterpriseDB.com/
On Mon, Jan 25, 2021 at 6:08 PM Simon Riggs <simon.riggs@enterprisedb.com> wrote:
> On Thu, Jan 21, 2021 at 6:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > Simon Riggs <simon.riggs@enterprisedb.com> writes:
> > > JSON parsing reports the line number and relevant context info
> > > incorrectly when the JSON contains newlines. Current code mostly just
> > > says "LINE 1" and is misleading for error correction. There were no
> > > tests for this previously.
> >
> > Couple thoughts:
> >
> > * I think you are wrong to have removed the line number bump that
> > happened when report_json_context advances context_start over a
> > newline. The case is likely harder to get to now, but it can still
> > happen can't it? If it can't, we should remove that whole stanza.
>
> OK, I'm playing around with this to see what is needed.
>
> > * I'd suggest naming the new JsonLexContext field "pos_last_newline";
> > "linefeed" is not usually the word we use for this concept. (Although
> > actually, it might work better if you make that point to the char
> > *after* the newline, in which case "last_linestart" might be the
> > right name.)
>
> Yes, OK
>
> > * I'm not enthused about back-patching. This behavior seems like an
> > improvement, but that doesn't mean people will appreciate changing it
> > in stable branches.
>
> OK, as you wish.
>
> Thanks for the review, will post again soon with an updated patch.
I agree with Tom's feedback. Whether or not you change pos_last_linefeed to point to the character after the linefeed, we can still simplify the loop within the report_json_context function to:
=================
context_start = lex->input + lex->pos_last_linefeed;
context_start += (*context_start == '\n');  /* Let's move beyond the linefeed */
context_end = lex->token_terminator;
line_start = context_start;

while (context_end - context_start >= 50 && context_start < context_end)
{
    /* Advance to next multibyte character */
    if (IS_HIGHBIT_SET(*context_start))
        context_start += pg_mblen(context_start);
    else
        context_start++;
}
=================
IMHO, this should work: pos_last_linefeed points to the position of the last linefeed before the error occurred, so we can safely skip over it and move context_start forward.
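A rough sketch of how the lexer side might maintain such a field while it scans whitespace (the variables s and end, and the line_number field, are assumptions here; this is not the actual json_lex() code):

=================
/*
 * Sketch only: whenever the lexer consumes a newline, remember its offset
 * (and advance a line counter, if one is kept) so that error reporting can
 * later start the context at the line containing the error.
 */
while (s < end && (*s == ' ' || *s == '\t' || *s == '\n' || *s == '\r'))
{
    if (*s == '\n')
    {
        lex->line_number++;                         /* assumed counter */
        lex->pos_last_linefeed = s - lex->input;    /* offset of the '\n' */
    }
    s++;
}
=================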
--
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950 EMAIL: mailto:hamid.akhtar@highgo.ca
SKYPE: engineeredvirus
On Tuesday, March 2, 2021 6:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Daniel Gustafsson <daniel@yesql.se> writes:
>> I'm changing the status of this patch to Ready for Committer.
>
> I reviewed this and pushed it, with some changes.

I think the while condition "context_start < context_end" added in commit
ffd3944ab9 is useless. Thoughts?

Code added in ffd3944ab9:

+ while (context_end - context_start >= 50 && context_start < context_end)

Regards,
Tang
> On 25 Aug 2021, at 10:22, tanghy.fnst@fujitsu.com wrote:
>
> On Tuesday, March 2, 2021 6:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Daniel Gustafsson <daniel@yesql.se> writes:
>>> I'm changing the status of this patch to Ready for Committer.
>>
>> I reviewed this and pushed it, with some changes.
>
> I think the while condition "context_start < context_end" added in commit ffd3944ab9 is useless. Thoughts?
>
> Code added in ffd3944ab9:
> + while (context_end - context_start >= 50 && context_start < context_end)

Judging by the diff it’s likely a leftover from the previous coding.  I don’t
see a case for when it would hit, but it also doesn’t seem to do any harm apart
from potentially causing static analyzers to get angry.

--
Daniel Gustafsson		https://vmware.com/
Daniel Gustafsson <daniel@yesql.se> writes:
>> On 25 Aug 2021, at 10:22, tanghy.fnst@fujitsu.com wrote:
>> I think the while condition "context_start < context_end" added in commit ffd3944ab9 is useless. Thoughts?

> Judging by the diff it’s likely a leftover from the previous coding.  I don’t
> see a case for when it would hit, but it also doesn’t seem to do any harm apart
> from potentially causing static analyzers to get angry.

Yeah.  I think that while reviewing this patch I read the while-condition as
a range check on context_start, but it isn't --- both inequalities are in the
same direction.  I suppose there could be some quibble about what happens if
context_end - context_start is so large as to overflow an integer, but that's
never gonna happen (and if it did, we'd have other issues, for instance the
lack of any check-for-interrupt in this loop).  Will fix.

			regards, tom lane
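In other words, since context_end - context_start >= 50 already implies context_start < context_end, the loop quoted above can be written with the redundant test dropped. A sketch of that simplification (not the committed diff):

=================
while (context_end - context_start >= 50)
{
    /* Advance to next multibyte character */
    if (IS_HIGHBIT_SET(*context_start))
        context_start += pg_mblen(context_start);
    else
        context_start++;
}
=================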