Re: [PATCH] Add pretty-printed XML output option - Mailing list pgsql-hackers
From | Jim Jones |
---|---|
Subject | Re: [PATCH] Add pretty-printed XML output option |
Date | |
Msg-id | 906cfd1d-63a0-3cf5-7291-2dadb2e149ff@uni-muenster.de Whole thread Raw |
In response to | Re: [PATCH] Add pretty-printed XML output option (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: [PATCH] Add pretty-printed XML output option
|
List | pgsql-hackers |
Thanks a lot for the review! On 09.03.23 21:21, Tom Lane wrote: > I've looked through this now, and have some minor complaints and a major > one. The major one is that it doesn't work for XML that doesn't satisfy > IS DOCUMENT. For example, > > regression=# select '<bar><val x="y">42</val></bar><foo></foo>'::xml is document; > ?column? > ---------- > f > (1 row) > > regression=# select xmlserialize (content '<bar><val x="y">42</val></bar><foo></foo>' as text); > xmlserialize > ------------------------------------------- > <bar><val x="y">42</val></bar><foo></foo> > (1 row) > > regression=# select xmlserialize (content '<bar><val x="y">42</val></bar><foo></foo>' as text indent); > ERROR: invalid XML document > DETAIL: line 1: Extra content at the end of the document > <bar><val x="y">42</val></bar><foo></foo> > ^ I assumed it should fail because the XML string doesn't have a singly-rooted XML. Oracle has this feature implemented and it does not seem to allow non singly-rooted strings either[1]. Also, some the tools I use also fail in this case[2,3] How do you suggest the output should look like? Does the SQL spec also define it? I can't find it online :( > This is not what the documentation promises, and I don't think it's > good enough --- the SQL spec has no restriction saying you can't > use INDENT with CONTENT. I tried adjusting things so that we call > xml_parse() with the appropriate DOCUMENT or CONTENT xmloption flag, > but all that got me was empty output (except for a document header). > It seems like xmlDocDumpFormatMemory is not the thing to use, at least > not in the CONTENT case. But libxml2 has a few other "dump" > functions, so maybe we can use a different one? I see we are using > xmlNodeDump elsewhere, and that has a format option, so maybe there's > a way forward there. > > A lesser issue is that INDENT tacks on a document header (XML declaration) > whether there was one or not. I'm not sure whether that's an appropriate > thing to do in the DOCUMENT case, but it sure seems weird in the CONTENT > case. We have code that can strip off the header again, but we > need to figure out exactly when to apply it. I replaced xmlDocDumpFormatMemory with xmlSaveToBuffer and used to option XML_SAVE_NO_DECL for input docs with XML declaration. It no longer returns a XML declaration if the input doc does not contain it. > I also suspect that it's outright broken to attach a header claiming > the data is now in UTF8 encoding. If the database encoding isn't > UTF8, then either that's a lie or we now have an encoding violation. I was mistakenly calling xml_parse with GetDatabaseEncoding(). It now uses the encoding of the given doc and UTF8 if not provided. > Another thing that's mildly irking me is that the current > factorization of this code will result in xml_parse'ing the data > twice, if you have both DOCUMENT and INDENT specified. We could > consider avoiding that if we merged the indentation functionality > into xmltotext_with_xmloption, but it's probably premature to do so > when we haven't figured out how to get the output right --- we might > end up needing two xml_parse calls anyway with different parameters, > perhaps. > > I also had a bunch of cosmetic complaints (mostly around this having > a bad case of add-at-the-end-itis), which I've cleaned up in the > attached v20. This doesn't address any of the above, however. I swear to god I have no idea what "add-at-the-end-itis" means :) > regards, tom lane Thanks a lot! Best, Jim 1 - https://dbfiddle.uk/WUOWtjBU 2 - https://www.samltool.com/prettyprint.php 3 - https://xmlpretty.com/xmlpretty
Attachment
pgsql-hackers by date: