Thread: extract text from XML
Hi,

I have found a basic use case which is supported by the xml2 module, but is unsupported by the new XML API. It is not possible to correctly extract text (either from text nodes or attribute values) which contains the characters '<', '&', or '>'. xpath() (correctly) returns XML text nodes for queries targeting these node types, and there is no inverse to xmlelement(). For example:

=> select (xpath('/a/text()', xmlelement(name a, '<&>')))[1]::text;
     xpath
---------------
 &lt;&amp;&gt;
(1 row)

Again, not a bug; but there is no way to specify my desired intent. The xml2 module does provide such a function, xpath_string:

=> select xpath_string(xmlelement(name a, '<&>')::text, '/a/text()');
 xpath_string
--------------
 <&>
(1 row)

One workaround is to return the node's text value by serializing the XML value and textually replacing those three entities with the characters they represent, but this relies on the xpath() function not generating other entities.

(My use case is importing data in XML format, and processing it with Postgres into a relational format.)

Perhaps a function xpath_value(text, xml) -> text[] would close the gap? (I did search, and no such function seems to exist currently, outside xml2.)

Thanks,
Chris
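For illustration only, here is a small client-side sketch (in Python, not PostgreSQL) of the serialize-then-replace workaround described above, and of why it is fragile. The helper names unescape_by_replace and unescape_wrong_order are hypothetical; escape and unescape come from Python's standard xml.sax.saxutils module. The sketch assumes the serialized text node contains only the three entities &lt;, &gt; and &amp;, which is exactly the assumption the workaround depends on.

```python
from xml.sax.saxutils import escape, unescape


def unescape_by_replace(serialized: str) -> str:
    """Undo the three entities by plain text replacement.

    '&amp;' must be handled last; otherwise the literal text '&lt;'
    (serialized as '&amp;lt;') would be decoded twice.
    """
    return (
        serialized.replace("&lt;", "<")
        .replace("&gt;", ">")
        .replace("&amp;", "&")
    )


def unescape_wrong_order(serialized: str) -> str:
    """Same replacements with '&amp;' first, shown only to expose the bug."""
    return (
        serialized.replace("&amp;", "&")
        .replace("&lt;", "<")
        .replace("&gt;", ">")
    )


# The escaped form of the text value '<&>', as a serialized text node.
serialized = escape("<&>")  # '&lt;&amp;&gt;'

# Both the hand-written replacement and the stdlib helper recover the value.
value = unescape_by_replace(serialized)  # '<&>'
stdlib_value = unescape(serialized)      # '<&>'

# Ordering pitfall: the literal text '&lt;' is serialized as '&amp;lt;'.
right = unescape_by_replace("&amp;lt;")   # '&lt;'
wrong = unescape_wrong_order("&amp;lt;")  # '<' (decoded one time too many)

# Limitation: any other entity or character reference passes through untouched.
untouched = unescape_by_replace("a&#13;b")  # still 'a&#13;b'
```

The last line is the caveat from the message in concrete form: textual replacement is only correct as long as the serializer never produces anything beyond those three entities, which is why a dedicated function returning the text value would be more robust.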
> I have found a basic use case which is supported by the xml2 module,
> but is unsupported by the new XML API.
> It is not possible to correctly extract text

Indeed. I came across this shortcoming some months ago myself, but it is still an item on my to-do list to report it here, as the deprecation notice at https://www.postgresql.org/docs/devel/static/xml2.html#AEN180625 asks for. Done, thanks ;)

I did some archive-browsing on that topic. The issue (if you want to call it that) was introduced by a patch to ensure xpath() always returns xml, applied for 9.2 after some discussion:
https://www.postgresql.org/message-id/201106291934.23089.rsmogura%40softperience.eu
and has been known since then:
https://www.postgresql.org/message-id/1409795403248-5817667.post%40n5.nabble.com

The new behaviour was later reported as a bug and discussed again:
https://www.postgresql.org/message-id/CAAY5AM1L83y79rtOZAUJioREO6n4%3DXAFKcGu6qO3hCZE1yJytg%40mail.gmail.com

Anyhow - (un)escaping functions to support the text<->xml conversion are often talked about, but still seem to be found only in the xml2 module. Having seen a patch implementing xmltable here recently, these functions would be another step towards finally making the contrib module obsolete.

> Perhaps a function xpath_value(text, xml) -> text[] would close the gap?

Such a design, resembling the xml2 behaviour, would certainly fit the need, imho.

regards
Tobias