Thread: [PROTOCOL TODO] Permit streaming of unknown-length lob/clob (bytea,text,etc)
[PROTOCOL TODO] Permit streaming of unknown-length lob/clob (bytea,text,etc)
From
Craig Ringer
Date:
Hi all

Currently the client must know the size of a large lob/clob field, like a 'bytea' or 'text' field, in order to send it to the server. This can force the client to buffer all the data before sending it to the server.

It would be helpful if the v4 protocol permitted the client to specify the field length as unknown / TBD, then stream data until an end marker is read. Some encoding would be required for binary data to ensure that occurrences of the end marker in the streamed data were properly handled, but there are many well established schemes for doing this.

I'm aware that this is possible for pg_largeobject, but this is with reference to big varlena fields.

This would be a useful change to have in connection with the already-TODO'd lazy fetching of large TOASTed values, as part of a general improvement in Pg's handling of big values in tuples.

Thoughts/comments?

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
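The end-marker idea above can be illustrated with a simple byte-stuffing scheme. This is only a sketch of one of the "well established schemes"; the marker and escape byte values (`END`, `ESC`) and the function names are assumptions chosen for illustration, not part of any PostgreSQL protocol version.

```python
ESC = b"\x1b"  # hypothetical escape byte
END = b"\x00"  # hypothetical end-of-value marker


def encode_stream(chunks):
    """Yield escaped chunks, then one unescaped end marker.

    The sender never needs to know the total length in advance.
    """
    for chunk in chunks:
        # Escape the escape byte first, then rewrite the end marker so
        # that a raw END byte never appears inside the payload.
        yield chunk.replace(ESC, ESC + ESC).replace(END, ESC + b"\x01")
    yield END


def decode_stream(data):
    """Return the original payload, reading up to the end marker."""
    out = bytearray()
    it = iter(data)
    for byte in it:
        if byte == END[0]:
            return bytes(out)
        if byte == ESC[0]:
            nxt = next(it, None)
            if nxt is None:
                raise ValueError("truncated escape sequence")
            # ESC ESC decodes to ESC; ESC 0x01 decodes to END.
            out.append(ESC[0] if nxt == ESC[0] else END[0])
        else:
            out.append(byte)
    raise ValueError("stream ended without an end marker")
```

For example, `decode_stream(b"".join(encode_stream([b"a\x00b"])))` gives back `b"a\x00b"`, even though the payload contains the marker byte.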
Re: [PROTOCOL TODO] Permit streaming of unknown-length lob/clob (bytea,text,etc)
From
David Fetter
Date:
On Mon, Dec 01, 2014 at 02:55:22PM +0800, Craig Ringer wrote:
> Hi all
>
> Currently the client must know the size of a large lob/clob field, like
> a 'bytea' or 'text' field, in order to send it to the server. This can
> force the client to buffer all the data before sending it to the server.

Yes, this is not good.

> It would be helpful if the v4 protocol permitted the client to specify
> the field length as unknown / TBD, then stream data until an end marker
> is read.

What's wrong with specifying its length in advance instead? Are you thinking of one or more use cases where it's both large and unknown?

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [PROTOCOL TODO] Permit streaming of unknown-length lob/clob (bytea,text,etc)
From
Craig Ringer
Date:
On 12/01/2014 10:38 PM, David Fetter wrote:
> On Mon, Dec 01, 2014 at 02:55:22PM +0800, Craig Ringer wrote:
>> Hi all
>>
>> Currently the client must know the size of a large lob/clob field, like
>> a 'bytea' or 'text' field, in order to send it to the server. This can
>> force the client to buffer all the data before sending it to the server.
>
> Yes, this is not good.
>
>> It would be helpful if the v4 protocol permitted the client to specify
>> the field length as unknown / TBD, then stream data until an end marker
>> is read.
>
> What's wrong with specifying its length in advance instead? Are you
> thinking of one or more use cases where it's both large and unknown?

I am - specifically, the JDBC setBlob(...) and setClob(...) APIs that accept streams without a specified length:

https://docs.oracle.com/javase/7/docs/api/java/sql/PreparedStatement.html#setBlob(int,%20java.io.InputStream)

https://docs.oracle.com/javase/7/docs/api/java/sql/PreparedStatement.html#setClob(int,%20java.io.Reader)

There are variants that do take a length, so PgJDBC can (and now does) implement the no-length variants by internally buffering the stream until EOF. It'd be nice to get rid of that though.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
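The buffering workaround described above can be sketched roughly as follows. This is not PgJDBC's code; it is a minimal Python illustration, with the hypothetical helper name `buffer_until_eof`, of why a length-required protocol forces the client to hold the whole value in memory before it can send anything.

```python
import io


def buffer_until_eof(stream, chunk_size=8192):
    """Read a stream of unknown length to EOF and return (length, data).

    The length is only known once the last chunk has been read, so the
    entire value has to sit in memory before the first byte can be sent.
    """
    buf = io.BytesIO()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty read signals EOF
            break
        buf.write(chunk)
    data = buf.getvalue()
    return len(data), data
```

Memory use grows with the size of the value, which is the cost the proposal aims to remove.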
Re: [PROTOCOL TODO] Permit streaming of unknown-length lob/clob (bytea,text,etc)
From
Tom Lane
Date:
Craig Ringer <craig@2ndquadrant.com> writes:
> Currently the client must know the size of a large lob/clob field, like
> a 'bytea' or 'text' field, in order to send it to the server. This can
> force the client to buffer all the data before sending it to the server.

> It would be helpful if the v4 protocol permitted the client to specify
> the field length as unknown / TBD, then stream data until an end marker
> is read. Some encoding would be required for binary data to ensure that
> occurrences of the end marker in the streamed data were properly
> handled, but there are many well established schemes for doing this.

I think this is pretty much a non-starter as stated, because the v3 protocol requires all messages to have a preceding length word. That's not very negotiable.

What's already on the TODO list is to allow large field values to be sent or received in segments, perhaps with a cursor-like arrangement. You can do that today for blobs, but not for oversize regular table fields.

Of course, considering that the maximum practical size of a regular field is probably in the dozens of megabytes, and that RAM is getting cheaper all the time, it's not clear that it's all that much of a hardship for clients to buffer the whole thing. If we've not gotten around to this in the last dozen years, it's unlikely we'll get to it in the future either ...

regards, tom lane
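The segment idea can coexist with length-prefixed framing: each segment carries its own length word, and only the total stays unknown until the end. The sketch below assumes the v3 framing convention (one type byte, then a 32-bit big-endian length that counts itself but not the type byte). The `SEGMENT_TYPE` byte, the empty-segment terminator, and the function names are hypothetical and do not correspond to any existing protocol message.

```python
import io
import struct

SEGMENT_TYPE = b"x"  # hypothetical placeholder, not a real message type


def frame(payload):
    """Build one message: type byte, length word, payload."""
    # The length word counts its own 4 bytes plus the payload.
    return SEGMENT_TYPE + struct.pack("!i", len(payload) + 4) + payload


def send_in_segments(stream, segment_size=8192):
    """Yield one framed message per chunk; an empty segment ends the value."""
    while True:
        chunk = stream.read(segment_size)
        yield frame(chunk)
        if not chunk:
            return


def read_segments(wire):
    """Reassemble a value from framed segments read from `wire`."""
    out = bytearray()
    while True:
        header = wire.read(5)
        if len(header) != 5 or header[:1] != SEGMENT_TYPE:
            raise ValueError("bad or truncated segment header")
        (length,) = struct.unpack("!i", header[1:])
        payload = wire.read(length - 4)
        if len(payload) != length - 4:
            raise ValueError("truncated segment payload")
        if not payload:  # empty segment marks the end of the value
            return bytes(out)
        out += payload
```

With this arrangement the sender holds at most `segment_size` bytes at a time, and no escaping of the payload is needed because every message still announces its own length.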