Thread: Bug in error reporting for multi-line JSON
JSON parsing reports the line number and relevant context info incorrectly
when the JSON contains newlines. The current code mostly just says "LINE 1",
which is misleading when trying to correct the error. There were no tests for
this previously.

With the proposed changes, a JSON error such as this

  {
    "one": 1,
    "two":,"two",     <-- extra comma
    "three": true
  }

which was previously reported as

  CONTEXT: JSON data, line 1: { "one": 1, "two":,...

is now reported as

  CONTEXT: JSON data, line 3: "two":,...

Attached patches:

  HEAD:  json_error_context.v3.patch      - applies cleanly, passes make check
  PG13:  json_error_context.v3.patch      - applies w/ minor fuzz, passes make check
  PG12:  json_error_context.v3.PG12.patch - applies cleanly, passes make check
  PG11:  json_error_context.v3.PG12.patch - applies cleanly, not tested
  PG10:  json_error_context.v3.PG12.patch - applies cleanly, not tested
  PG9.6: json_error_context.v3.PG12.patch - applies cleanly, not tested
  PG9.5: json_error_context.v3.PG12.patch - applies cleanly, not tested

--
Simon Riggs                http://www.EnterpriseDB.com/
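A minimal sketch of the idea behind the proposal, not the attached patch itself: count the newlines before the error position and start the reported context at the most recent line, rather than at the beginning of the datum. The function name and signature below are made up purely for illustration.

=================
/*
 * Illustrative only: given the input string and the byte offset of the
 * error, return the 1-based line number and set *line_begin to the start
 * of the line containing the error, which is where the reported context
 * should begin.
 */
static int
line_of_error(const char *input, size_t errpos, const char **line_begin)
{
    int         line = 1;

    *line_begin = input;
    for (size_t i = 0; i < errpos; i++)
    {
        if (input[i] == '\n')
        {
            line++;
            *line_begin = input + i + 1;    /* first char after the newline */
        }
    }
    return line;
}
=================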
Simon Riggs <simon.riggs@enterprisedb.com> writes:
> JSON parsing reports the line number and relevant context info
> incorrectly when the JSON contains newlines. Current code mostly just
> says "LINE 1" and is misleading for error correction. There were no
> tests for this previously.

Couple thoughts:

* I think you are wrong to have removed the line number bump that
happened when report_json_context advances context_start over a
newline.  The case is likely harder to get to now, but it can still
happen, can't it?  If it can't, we should remove that whole stanza.

* I'd suggest naming the new JsonLexContext field "pos_last_newline";
"linefeed" is not usually the word we use for this concept.  (Although
actually, it might work better if you make that point to the char
*after* the newline, in which case "last_linestart" might be the
right name.)

* I'm not enthused about back-patching.  This behavior seems like an
improvement, but that doesn't mean people will appreciate changing it
in stable branches.

			regards, tom lane
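As a rough sketch of the "line number bump" Tom refers to (using the variable names that appear in the code excerpt later in this thread, plus an assumed line_number counter; this is not the actual report_json_context code): even after the context has been anchored to the line containing the error, trimming the displayed window down to roughly the last 50 characters can still step over a newline, and the reported line number and line start then have to follow it.

=================
/* Sketch only: keep the line number honest while trimming the context */
while (context_end - context_start >= 50)
{
    if (*context_start == '\n')
    {
        line_number++;                  /* the "bump" being discussed */
        line_start = context_start + 1;
    }
    if (IS_HIGHBIT_SET(*context_start))
        context_start += pg_mblen(context_start);
    else
        context_start++;
}
=================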
On Thu, Jan 21, 2021 at 6:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Simon Riggs <simon.riggs@enterprisedb.com> writes:
> > JSON parsing reports the line number and relevant context info
> > incorrectly when the JSON contains newlines. Current code mostly just
> > says "LINE 1" and is misleading for error correction. There were no
> > tests for this previously.
>
> Couple thoughts:
>
> * I think you are wrong to have removed the line number bump that
> happened when report_json_context advances context_start over a
> newline. The case is likely harder to get to now, but it can still
> happen can't it? If it can't, we should remove that whole stanza.

OK, I'm playing around with this to see what is needed.

> * I'd suggest naming the new JsonLexContext field "pos_last_newline";
> "linefeed" is not usually the word we use for this concept. (Although
> actually, it might work better if you make that point to the char
> *after* the newline, in which case "last_linestart" might be the
> right name.)

Yes, OK.

> * I'm not enthused about back-patching. This behavior seems like an
> improvement, but that doesn't mean people will appreciate changing it
> in stable branches.

OK, as you wish.

Thanks for the review, will post again soon with an updated patch.

--
Simon Riggs                http://www.EnterpriseDB.com/
On Mon, Jan 25, 2021 at 6:08 PM Simon Riggs <simon.riggs@enterprisedb.com> wrote:
> On Thu, Jan 21, 2021 at 6:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > Simon Riggs <simon.riggs@enterprisedb.com> writes:
> > > JSON parsing reports the line number and relevant context info
> > > incorrectly when the JSON contains newlines. Current code mostly just
> > > says "LINE 1" and is misleading for error correction. There were no
> > > tests for this previously.
> >
> > Couple thoughts:
> >
> > * I think you are wrong to have removed the line number bump that
> > happened when report_json_context advances context_start over a
> > newline. The case is likely harder to get to now, but it can still
> > happen can't it? If it can't, we should remove that whole stanza.
>
> OK, I'm playing around with this to see what is needed.
>
> > * I'd suggest naming the new JsonLexContext field "pos_last_newline";
> > "linefeed" is not usually the word we use for this concept. (Although
> > actually, it might work better if you make that point to the char
> > *after* the newline, in which case "last_linestart" might be the
> > right name.)
>
> Yes, OK
>
> > * I'm not enthused about back-patching. This behavior seems like an
> > improvement, but that doesn't mean people will appreciate changing it
> > in stable branches.
>
> OK, as you wish.
>
> Thanks for the review, will post again soon with an updated patch.
I agree with Tom's feedback. Whether or not you change pos_last_linefeed to point to the character after the linefeed, we can still simplify the loop within the report_json_context function to:
=================
context_start = lex->input + lex->pos_last_linefeed;
context_start += (*context_start == '\n');  /* Let's move beyond the linefeed */
context_end = lex->token_terminator;
line_start = context_start;

while (context_end - context_start >= 50 && context_start < context_end)
{
    /* Advance to next multibyte character */
    if (IS_HIGHBIT_SET(*context_start))
        context_start += pg_mblen(context_start);
    else
        context_start++;
}
=================
IMHO, this should work: pos_last_linefeed points to the position of the last linefeed before the error occurred, so we can safely skip over it and move context_start forward.
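A rough sketch of how the lexer side might maintain such a field while it scans whitespace (the variables s and end, and the line_number field, are assumptions here; this is not the actual json_lex() code):

=================
/*
 * Sketch only: whenever the lexer consumes a newline, remember its offset
 * (and advance a line counter, if one is kept) so that error reporting can
 * later start the context at the line containing the error.
 */
while (s < end && (*s == ' ' || *s == '\t' || *s == '\n' || *s == '\r'))
{
    if (*s == '\n')
    {
        lex->line_number++;                         /* assumed counter */
        lex->pos_last_linefeed = s - lex->input;    /* offset of the '\n' */
    }
    s++;
}
=================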
--
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950 EMAIL: mailto:hamid.akhtar@highgo.ca
SKYPE: engineeredvirus
On Tuesday, March 2, 2021 6:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Daniel Gustafsson <daniel@yesql.se> writes:
>> I'm changing the status of this patch to Ready for Committer.
>
> I reviewed this and pushed it, with some changes.

I think the while condition "context_start < context_end" added in commit
ffd3944ab9 is useless. Thoughts?

Code added in ffd3944ab9:

+ while (context_end - context_start >= 50 && context_start < context_end)

Regards,
Tang
> On 25 Aug 2021, at 10:22, tanghy.fnst@fujitsu.com wrote:
>
> On Tuesday, March 2, 2021 6:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Daniel Gustafsson <daniel@yesql.se> writes:
>>> I'm changing the status of this patch to Ready for Committer.
>>
>> I reviewed this and pushed it, with some changes.
>
> I think the while condition "context_start < context_end" added in commit ffd3944ab9 is useless. Thoughts?
>
> Code added in ffd3944ab9:
> + while (context_end - context_start >= 50 && context_start < context_end)

Judging by the diff it’s likely a leftover from the previous coding.  I don’t
see a case for when it would hit, but it also doesn’t seem to do any harm apart
from potentially causing static analyzers to get angry.

--
Daniel Gustafsson		https://vmware.com/
Daniel Gustafsson <daniel@yesql.se> writes:
>> On 25 Aug 2021, at 10:22, tanghy.fnst@fujitsu.com wrote:
>> I think the while condition "context_start < context_end" added in commit ffd3944ab9 is useless. Thoughts?

> Judging by the diff it’s likely a leftover from the previous coding.  I don’t
> see a case for when it would hit, but it also doesn’t seem to do any harm apart
> from potentially causing static analyzers to get angry.

Yeah.  I think that while reviewing this patch I read the while-condition as
a range check on context_start, but it isn't --- both inequalities are in the
same direction.  I suppose there could be some quibble about what happens if
context_end - context_start is so large as to overflow an integer, but that's
never gonna happen (and if it did, we'd have other issues, for instance the
lack of any check-for-interrupt in this loop).  Will fix.

			regards, tom lane
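In other words, since context_end - context_start >= 50 already implies context_start < context_end, the loop quoted above can be written with the redundant test dropped. A sketch of that simplification (not the committed diff):

=================
while (context_end - context_start >= 50)
{
    /* Advance to next multibyte character */
    if (IS_HIGHBIT_SET(*context_start))
        context_start += pg_mblen(context_start);
    else
        context_start++;
}
=================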