Thread: [HACKERS] Built-in plugin for logical decoding output
We have been working on a project that makes extensive use of logical replication for use inside Apigee (which is a very small part of Google):

https://github.com/apigee-labs/transicator
In order to do this, we had to write our own logical replication plugin because the supplied "test_decoding" plugin from the "contrib" directory was hard for us to work with. Primarily:
1) It doesn't include all the fields that we need for Transicator (most importantly we need the LSN and the 64-bit transaction ID).
2) It outputs a text format that is hard to parse.
I imagine that other users of logical decoding are facing similar problems.
Would the community support the development of another plugin that is distributed as part of "contrib" that addresses these issues? I'd be happy to submit a patch, or GitHub repo, or whatever works best as an example. (Also, although Transicator uses protobuf, I'm happy to have it output a simple binary format as well.)
As a side note, doing this would also help make logical decoding a useful feature for customers of Amazon's and Google's built-in Postgres hosting options. In those environments, there's no way to add a custom plugin to Postgres, so anything not built into the product tends to be harder for someone to consume.
If anyone is interested in looking more:
The plugin code is here:

https://github.com/apigee-labs/transicator/tree/master/pgoutput
and produces output defined by the "ChangePb" structure defined here:

https://github.com/apigee-labs/transicator/blob/master/common/transicator.proto
On 23/09/17 00:28, Gregory Brail wrote:
> Would the community support the development of another plugin that is distributed as part of "contrib" that addresses these issues?

How about using pgoutput, which is included by default in PostgreSQL 10, as the basis for logical replication?

Cheers,

Álvaro

--
Alvaro Hernandez
-----------
OnGres
Thanks! I didn't read the version 10 RC docs carefully enough.
I found the code that generates the protocol message (looks like it's src/backend/replication/logical/proto.c). Are there docs somewhere on the format, or is it just the code?
Also, in lieu of the new snapshot mechanism for logical replication, which might not work for us, we were using the transaction ID to calculate what was committed in a client's snapshot and what they need to apply to their own local database. That relied on the transaction ID, and we wanted to use a 64-bit ID so that we could handle rollover. We ended up doing this:

https://github.com/apigee-labs/transicator/blob/2d5dc596a5f2f5e13967e0f1943f248d88eac1e7/pgoutput/transicator_output.c#L151
It looks to me like the new stuff only puts a 32-bit "xid" in there. Would there be a way to include the epoch as well? (And yes, I realize it might require a more detailed explanation which I'm happy to put together.)
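For illustration, deriving a 64-bit transaction ID from the 32-bit xid plus the epoch can be done much like txid_current() does internally. A minimal sketch only (the helper name is made up and this is not the actual Transicator code; GetNextXidAndEpoch() does exist in 9.4-10 era backends):

    /*
     * Sketch: build a 64-bit transaction ID from a decoded transaction's
     * 32-bit xid, in the spirit of txid_current(). Assumes a 9.4-10 era
     * backend where GetNextXidAndEpoch() is available.
     */
    #include "postgres.h"
    #include "access/transam.h"

    static uint64
    xid_to_txid64(TransactionId xid)
    {
        TransactionId next_xid;
        uint32        epoch;

        GetNextXidAndEpoch(&next_xid, &epoch);

        /*
         * An xid numerically above the next xid to be assigned must have
         * been allocated before the last wraparound, i.e. in the previous
         * epoch.
         */
        if (xid > next_xid)
            epoch--;

        return ((uint64) epoch << 32) | xid;
    }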
On Fri, Sep 22, 2017 at 4:01 PM, Alvaro Hernandez <aht@ongres.com> wrote:
How about using pgoutput, which is included by default in PostgreSQL 10, as the basis for logical replication?
Hi,

On 2017-09-22 17:11:47 -0700, Gregory Brail wrote:
> Also, in lieu of the new snapshot mechanism for logical replication, which might not work for us

This needs context...

> we were using the transaction ID to calculate what was committed in a client's snapshot and what they need to apply to their own local database. That relied on the transaction ID, and we wanted to use a 64-bit ID so that we could handle rollover. We ended up doing this:
>
> https://github.com/apigee-labs/transicator/blob/2d5dc596a5f2f5e13967e0f1943f248d88eac1e7/pgoutput/transicator_output.c#L151
>
> It looks to me like the new stuff only puts a 32-bit "xid" in there. Would there be a way to include the epoch as well? (And yes, I realize it might require a more detailed explanation which I'm happy to put together.)

It'd be good to see some more detail here, indeed. Especially if you could look at what pgoutput provides, and whether that's sufficient.

I'm not entirely sure how much we want to make pgoutput configurable, in contrast to adding something that's intended to be very configurable at the price of some performance and bandwidth...

Regards,

Andres
2017-09-22 19:28 GMT-03:00 Gregory Brail <gregbrail@google.com>:
> We have been working on a project that makes extensive use of logical replication for use inside Apigee (which is a very small part of Google):
>
> https://github.com/apigee-labs/transicator
>
> In order to do this, we had to write our own logical replication plugin because the supplied "test_decoding" plugin from the "contrib" directory was hard for us to work with.

test_decoding is a proof of concept to illustrate the potential of logical decoding. It is not intended for production. I developed wal2json [1] for general use. It outputs changes in JSON. It was one of the first logical decoding plugins.

> 1) It doesn't include all the fields that we need for Transicator (most importantly we need the LSN and the 64-bit transaction ID).

wal2json includes both.

> 2) It outputs a text format that is hard to parse.

There are a lot of JSON parsers.

> Would the community support the development of another plugin that is distributed as part of "contrib" that addresses these issues?

There was a prior discussion, and it was suggested that we have a ready-for-production plugin in core (besides pgoutput). It was suggested [2] that I submit wal2json for 11. I'm in the process of cleaning up the code and hope to submit it to CF2.

[1] https://github.com/eulerto/wal2json
[2] https://www.postgresql.org/message-id/CAHE3wggh6ucSCB%2BhnSK04xEQx75f3DTz0wPSjRMJFjum6pRrPQ%40mail.gmail.com

--
Euler Taveira                   Timbira - http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
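For reference, wal2json renders each transaction as a JSON document; a single-row insert looks roughly like this (illustrative only — the table name and values are made up, and options such as include-xids/include-lsn add further fields):

    {
      "change": [
        {
          "kind": "insert",
          "schema": "public",
          "table": "accounts",
          "columnnames": ["id", "name"],
          "columntypes": ["integer", "text"],
          "columnvalues": [1, "alice"]
        }
      ]
    }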
On 23/09/17 18:42, Euler Taveira wrote:
> test_decoding is a proof of concept to illustrate the potential of logical decoding. It is not intended for production.

However, AFAIK, AWS's DMS uses it for production purposes (see http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_PostgreSQL.html).

> There was a prior discussion, and it was suggested that we have a ready-for-production plugin in core (besides pgoutput). It was suggested that I submit wal2json for 11. I'm in the process of cleaning up the code and hope to submit it to CF2.

I would be happy to see another logical decoding plugin in core starting with 11. However, this also poses a bit of a challenge for middleware implementors: you need to support one plugin for 9.4-9.5 (test_decoding), another for 10 (pgoutput), and maybe another for 11 onwards. The idea of asking users to install a binary plugin is very unsexy, so these are the options available.

However, having said that, and while JSON is a great output format for interoperability, if there's a discussion on which plugin to include next, I'd also favor one that has a more compact representation format (or that supports several formats, not only JSON).

Regards,

Álvaro

--
Alvaro Hernandez
-----------
OnGres
2017-09-23 14:01 GMT-03:00 Alvaro Hernandez <aht@ongres.com>:
> However, AFAIK, AWS's DMS uses it for production purposes (see http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_PostgreSQL.html).

That seems like a bad idea. AFAICS test_decoding was not designed to be a ready-for-production plugin. It is just a proof of concept for logical decoding.

> I would be happy to see another logical decoding plugin in core starting with 11. However, this also poses a bit of a challenge for middleware implementors: you need to support one plugin for 9.4-9.5 (test_decoding), another for 10 (pgoutput), and maybe another for 11 onwards. The idea of asking users to install a binary plugin is very unsexy, so these are the options available.

wal2json works for 9.4+ (besides the WAL messages I committed a month ago). Since that ship has already sailed, we can arrange some packages for 9.4-10 (an external project) and ask vendors to support the backward-compatible plugin. The middleware implementor will have to support this new plugin format. JSON being a widespread format, it is easier to refactor the code to parse JSON.

> However, having said that, and while JSON is a great output format for interoperability, if there's a discussion on which plugin to include next, I'd also favor one that has a more compact representation format (or that supports several formats, not only JSON).

We could certainly extend pgoutput to support more than one format (like pglogical did, AFAIR); however, we wouldn't reuse code (different formats) and would end up with a fat plugin. (I don't foresee a plugin using different formats in the same connection. It is difficult to coordinate a change like that having only one-way communication.)

--
Euler Taveira                   Timbira - http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
On 24/09/17 02:41, Euler Taveira wrote:
> That seems like a bad idea. AFAICS test_decoding was not designed to be a ready-for-production plugin. It is just a proof of concept for logical decoding.

Yes, this is what I have heard and read. However, if DMS uses it for what I'd call production use, I assume it is actually production quality. I bet they do enough testing, and don't ship software to potentially millions of customers if it doesn't work well. So... first, I'd consider this a sign of robustness. Second... my hat's off to the plugin code ;)

> wal2json works for 9.4+ (besides the WAL messages I committed a month ago). Since that ship has already sailed, we can arrange some packages for 9.4-10 (an external project) and ask vendors to support the backward-compatible plugin. The middleware implementor will have to support this new plugin format. JSON being a widespread format, it is easier to refactor the code to parse JSON.

I agree it's far better to parse JSON than the test_decoding output. But asking any potential user to install a dynamic library, from a third-party website, which needs to be compiled for many potential OSes/archs, or which may even be impossible to install in a managed environment... is not a great experience. Unless PostgreSQL would backport a plugin and ship it in newer releases, if test_decoding is good enough, I'd rather stick to it.

> We could certainly extend pgoutput to support more than one format (like pglogical did, AFAIR); however, we wouldn't reuse code (different formats) and would end up with a fat plugin.

I think pgoutput is what it is. Maybe instead of growing it, my +1 would be to add a new plugin that, rather than being JSON-only, would also support other formats, like an efficient binary serialization.

Álvaro

--
Alvaro Hernandez
-----------
OnGres
On 23 September 2017 at 06:28, Gregory Brail <gregbrail@google.com> wrote:
Would the community support the development of another plugin that is distributed as part of "contrib" that addresses these issues?
Petr Jelinek and I tried just that with pglogical. Our submission was knocked back with the complaint that there was no in-core user of the code, and it couldn't be evaluated usefully without an in-core consumer/receiver.
It's possible we'd make more progress if we tried again now, since we could probably write a test suite using the TAP test framework and a small src/test/modules consumer. But now we'd probably instead get blocked with the complaint that the output plugin used for logical replication should be sufficient for any reasonable need. I anticipate that we'd have some disagreements about what a reasonable need is, but ... *shrug*.
I personally think we _should_ have such a thing, and that it should be separate from the logical replication plugin, to allow us to evolve that without worrying about out-of-core dependencies etc.
There's some common functionality that needs factoring out into the logical decoding framework, like some sort of relation metadata cache, some concept of "replication sets" or a set of tables to include/exclude, etc. Doing that is non-trivial work, but it's unlikely that two plugins with similar and overlapping implementations of such things would be accepted; in that case I'd be firmly in the "no" camp too.
Code in Pg has a cost, and we do have to justify that cost when we drop things in contrib/. It's not a free slush pile. So a solid argument does need to be made for why having this module living in github/whatever isn't good enough.
I'd be happy to submit a patch, or GitHub repo, or whatever works best as an example. (Also, although Transicator uses protobuf, I'm happy to have it output a simple binary format as well.)
PostgreSQL tends to be very, very conservative about dependencies and favours not-invented-here rather heavily. Optional dependencies are accepted sometimes when they can be neatly isolated to one portion of the codebase and/or abstracted away, so it's not impossible you'd get acceptance for something like protocol buffers. But there's pretty much zero chance you'll get it as a hard dependency; you'll need a simple text and binary protocol too.
At which point the question will arise, why aren't these 3 separate output plugins? The text one, the binary one for in-core and the protobuf one to be maintained out of core.
That's a pretty sensible question. The answer is that they'll all need to share quite a bit of common infrastructure. But if that's infrastructure all plugins need, shouldn't it be pushed "up" into the logical decoding layer's supporting framework? Patches welcome for the next major release cycle.
Thus, that's where I think you should actually start. Extract (and where necessary generalize) key parts of your code that should be provided by postgres itself, not implemented by each plugin. And submit it so all plugins can share it and yours can be simpler. Eventually to the point where output plugins are often simple format wrappers.
You might want to look at
* pglogical's output plugin; and
* bottled-water
for ideas about things that would benefit from shared infrastructure, and ways to generalize it. I will be very happy to help there as time permits.
As a side note, doing this would also help making logical decoding a useful feature for customers of Amazon and Google's built-in Postgres hosting options.
Colour me totally unconvinced there. Either, or both, can simply bless out-of-tree plugins as it is; after all, they can and do patch the core server freely too.
It'd *help* encourage them both to pick the same plugin, but that's about it. And only if the plugin could satisfy their various constraints about no true superuser access, etc.
I guess I'm a bit frustrated, because *I tried this*, and where was anyone from Google or Amazon then? But now there's a new home-invented plugin that we should adopt, ignoring any of the existing ones. Why?
No README?
Why did this need to be invented, rather than using an existing plugin?
I don't mind, I mean, it's great that you're using the plugin infrastructure and using postgres. I'm just curious what bottled-water, pglogical, etc lacked, what made you go your own way?
On 24 September 2017 at 07:41, Euler Taveira <euler@timbira.com.br> wrote:
It is difficult to
coordinate a change like that having only one-way communication).
I really think we need to fix that at some point, such that:
* Downstream connections can send CopyData messages *up* the COPY BOTH protocol, where they're passed to a hook on the output plugin; and
* Output plugins can hook the walsender's event loop (latch set, etc) and send their own messages without being driven by a logical decoding event.
I wanted to do that some time ago but ran into some issues and time constraints. Because of the need to support older versions I'm now committed to an approach using direct libpq connections and function calls instead, but it seems like a real shame to do that when the replication protocol connection is *right there*...
On 24 September 2017 at 15:15, Craig Ringer <craig@2ndquadrant.com> wrote:
> Petr Jelinek and I tried just that with pglogical. Our submission was knocked back with the complaint that there was no in-core user of the code, and it couldn't be evaluated usefully without an in-core consumer/receiver.
>
> [...]
>
> I personally think we _should_ have such a thing, and that it should be separate from the logical replication plugin, to allow us to evolve that without worrying about out-of-core dependencies etc.

We plan to submit the next evolution of the code in 2018, in time for the early cycle of PG12.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 23/09/17 19:01, Alvaro Hernandez wrote:
> On 23/09/17 18:42, Euler Taveira wrote:
>> There was a prior discussion, and it was suggested that we have a ready-for-production plugin in core (besides pgoutput). It was suggested that I submit wal2json for 11. I'm in the process of cleaning up the code and hope to submit it to CF2.

Thanks, I'll be happy to review that.

> I would be happy to see another logical decoding plugin in core starting with 11. However, this also poses a bit of a challenge for middleware implementors: you need to support one plugin for 9.4-9.5 (test_decoding), another for 10 (pgoutput), and maybe another for 11 onwards. The idea of asking users to install a binary plugin is very unsexy, so these are the options available.

Well, test_decoding is not meant for production use anyway, so there is no need for middleware to support it. pgoutput is primarily used for internal replication purposes, which is why we need something with more interoperability in mind in the first place. The new plugin should still support publications etc., though, IMHO.

> However, having said that, and while JSON is a great output format for interoperability, if there's a discussion on which plugin to include next, I'd also favor one that has a more compact representation format (or that supports several formats, not only JSON).

JSON is indeed great for interoperability; if you want a more compact format, use either pgoutput, or write something of your own, or do the conversion to something else in your consumer. I don't think Postgres needs to provide 100 different formats out of the box when there is an API. The JSON output does not have to be extremely chatty either, btw.

--
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 25/09/17 19:39, Petr Jelinek wrote:
> Well, test_decoding is not meant for production use anyway, so there is no need for middleware to support it. pgoutput is primarily used for internal replication purposes, which is why we need something with more interoperability in mind in the first place. The new plugin should still support publications etc., though, IMHO.
>
> JSON is indeed great for interoperability; if you want a more compact format, use either pgoutput, or write something of your own, or do the conversion to something else in your consumer. I don't think Postgres needs to provide 100 different formats out of the box when there is an API. The JSON output does not have to be extremely chatty either, btw.

In my opinion, logical decoding plugins that don't come with core are close to worthless (don't get me wrong):

- They are very unlikely to be installed in managed environments (an area growing significantly).
- As with anything that is not in core, they raise concerns for users.
- Distribution and testing are non-trivial: many OS/arch combinations.

Given the above, I believe having a general-purpose output plugin in core is critical to the use of logical decoding. For 9.4-9.6 there is test_decoding, and given that AWS uses it in production, that's kind of fine. For 10 there is at least pgoutput, which could be used (even though it was meant for replication). But if a new plugin is to be developed for 11+, one really general-purpose one, I'd say JSON is not a good choice if it is the only output it supports. JSON is too verbose, and replication, if anything, needs performance (it is both network heavy, and serialization/deserialization is quite expensive). Why not, if one and only one plugin were developed for 11+, general purpose, do something that is indeed more general, i.e., that supports high-performance scenarios too?

Álvaro

--
Alvaro Hernandez
-----------
OnGres
On 09/25/2017 12:48 PM, Alvaro Hernandez wrote:
> Given the above, I believe having a general-purpose output plugin in core is critical to the use of logical decoding. [...] Why not, if one and only one plugin were developed for 11+, general purpose, do something that is indeed more general, i.e., that supports high-performance scenarios too?

A general-purpose, lower-bandwidth plugin might be one supporting Protocol Buffers. The downside is that, unlike JSON, it's not self-contained: you need the message definitions to interpret the stream, AIUI.

cheers

andrew

--
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
I'm encouraged that pgoutput exists and I'm sorry that I missed it before. I think it's fine as a binary-only format. If someone can write a client for the Postgres wire protocol as documented in Chapter 52 of the docs, then they should have no trouble consuming the output from pgoutput.
However, I can't find any docs for the output format of pgoutput, which is going to make it less likely for people to be able to consume it. Is anyone working on docs? I know that it's a painful process.
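To give a flavour of what such docs would cover: pgoutput's messages are simple tagged binary structures. For example, the Begin message, as described in the PostgreSQL 10 protocol documentation, is Byte1('B') followed by an Int64 final LSN, an Int64 commit timestamp (microseconds since 2000-01-01), and an Int32 xid. A consumer-side parsing sketch (the struct and function names here are made up for illustration, not taken from any real client library):

    #include <stdint.h>
    #include <stddef.h>

    typedef struct PgoutputBegin
    {
        uint64_t final_lsn;   /* final LSN of the transaction */
        int64_t  commit_ts;   /* microseconds since 2000-01-01 */
        uint32_t xid;
    } PgoutputBegin;

    /* Read big-endian integers out of the message payload. */
    static uint64_t
    read_u64(const unsigned char *p)
    {
        uint64_t v = 0;
        int      i;

        for (i = 0; i < 8; i++)
            v = (v << 8) | p[i];
        return v;
    }

    static uint32_t
    read_u32(const unsigned char *p)
    {
        return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16) |
               ((uint32_t) p[2] << 8) | (uint32_t) p[3];
    }

    /* Returns 0 on success, -1 if the buffer does not hold a Begin message. */
    static int
    parse_begin(const unsigned char *buf, size_t len, PgoutputBegin *out)
    {
        if (len < 21 || buf[0] != 'B')
            return -1;
        out->final_lsn = read_u64(buf + 1);
        out->commit_ts = (int64_t) read_u64(buf + 9);
        out->xid = read_u32(buf + 17);
        return 0;
    }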
I also think that a JSON-format (or configurable format) plugin would make this part of PG much more usable and I'd encourage the community to come up with one.
Finally, since there were some "why didn't you just" questions in the email thread, let me write a little bit about what we were trying to do.
We have a set of data that represents the configuration of some of our customers' systems. (This is for Apigee Edge, which is a software product that represents a small part of Google Cloud, and which was developed long before we joined Google.) We'd like to efficiently and reliably push configuration changes down to our customers' systems, mostly to make it possible for them to run parts of our software stack in their own data centers, with limited or even unreliable network connectivity to the rest of our services. Data replication is a great fit for this problem.
However, we want the downstream software components (the ones that our customers run in their own data centers) to know when various things change, we want those changes delivered in a consistent order, and we want to be able to reliably receive them by having each consumer keep track of where they currently are in the replication scheme. Logical replication is a great fit for this because it enables us to build a list of all the changes to this management data in a consistent order. Once we have that list, it's fairly simple to persist it somewhere and let clients consume it in various ways. (In our case, via an HTTP API that supports long polling. Having all the clients consume a Kafka stream was not an option that we wanted to consider.)
The difference between what we're trying to do and most solutions that use logical replication is that we will have thousands or tens of thousands of clients pulling a list of changes that originated in a single Postgres database. That means that we need to index our own copy of the replication output so that clients can efficiently get changes only to "their" data. Furthermore, it means that we can't do things like create a unique replication slot for each client. Instead, we have a smaller number of servers that replicate from the master, and then those in turn give out lists of changes to other clients.
On Mon, Sep 25, 2017 at 9:48 AM, Alvaro Hernandez <aht@ongres.com> wrote:
> In my opinion, logical decoding plugins that don't come with core are close to worthless (don't get me wrong) [...] Given the above, I believe having a general-purpose output plugin in core is critical to the use of logical decoding.
On 09/25/2017 09:59 AM, Gregory Brail wrote:
> However, I can't find any docs for the output format of pgoutput, which is going to make it less likely for people to be able to consume it. Is anyone working on docs? I know that it's a painful process.
>
> I also think that a JSON-format (or configurable format) plugin would make this part of PG much more usable and I'd encourage the community to come up with one.

https://github.com/ildus/decoder_json
https://github.com/posix4e/jsoncdc
https://github.com/leptonix/decoding-json
https://github.com/Aloomaio/psql-json-decoder

Thanks,

JD

--
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc
PostgreSQL Centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://pgconf.us
***** Unless otherwise stated, opinions are my own. *****
On 25/09/17 19:56, Andrew Dunstan wrote:
> A general-purpose, lower-bandwidth plugin might be one supporting Protocol Buffers. The downside is that, unlike JSON, it's not self-contained: you need the message definitions to interpret the stream, AIUI.

Sure. But that's just a matter of documenting them, or even better, providing the .proto files, which are language-independent. There are also many other efficient serialization formats to explore, some self-contained, some not.

Álvaro

--
Alvaro Hernandez
-----------
OnGres
Yes. I'm advocating something "built-in" to Postgres. Any or all of those are likely a great starting point.
As for protobuf, I'm a big advocate -- it is easy to use, fast, extensible, runs on lots of platforms, and produces very compact output. However, it introduces a few dependencies to the build, and that may make it too difficult for wide adoption within Postgres.
On Mon, Sep 25, 2017 at 10:07 AM, Joshua D. Drake <jd@commandprompt.com> wrote:
https://github.com/ildus/decoder_json
https://github.com/posix4e/jsoncdc
https://github.com/leptonix/decoding-json
https://github.com/Aloomaio/psql-json-decoder
On 2017-09-24 13:36:56 +0300, Alvaro Hernandez wrote:
> However, if DMS uses it for what I'd call production use, I assume it is actually production quality. I bet they do enough testing, and don't ship software to potentially millions of customers if it doesn't work well. So... first, I'd consider this a sign of robustness.

You've been in software for how long? ... ;)  There are quite mixed experiences with DMS.

FWIW, I don't think there's a huge problem w/ using test_decoding - the output isn't pretty but it's parseable. It's too verbose due to repeating column & type names (which also slows things down), but...

Greetings,

Andres Freund
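For context, test_decoding's text output for a simple insert looks roughly like this (illustrative — the table name and values are made up), which shows the per-column repetition of names and types mentioned above:

    BEGIN 705
    table public.accounts: INSERT: id[integer]:1 name[text]:'alice'
    COMMIT 705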
Hi,

On 2017-09-25 12:56:00 -0400, Andrew Dunstan wrote:
> A general-purpose, lower-bandwidth plugin might be one supporting Protocol Buffers. The downside is that, unlike JSON, it's not self-contained: you need the message definitions to interpret the stream, AIUI.

I think that makes it a non-starter for many purposes where you care about bandwidth.

Greetings,

Andres Freund
On 25/09/17 18:48, Alvaro Hernandez wrote:
> In my opinion, logical decoding plugins that don't come with core are close to worthless (don't get me wrong):

I respectfully disagree.

> Given the above, I believe having a general-purpose output plugin in core is critical to the use of logical decoding. [...] But if a new plugin is to be developed for 11+, one really general-purpose one, I'd say JSON is not a good choice if it is the only output it supports. JSON is too verbose, and replication, if anything, needs performance (it is both network heavy, and serialization/deserialization is quite expensive). Why not, if one and only one plugin were developed for 11+, general purpose, do something that is indeed more general, i.e., that supports high-performance scenarios too?

IMO, for general-purpose use, adoption and ease of parsing are the more important criteria, which is why JSON is a good option. Again, if you need something more tuned for performance and you are fine with some hardship in terms of parsing, what's stopping you from using pgoutput?

--
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 09/25/2017 10:15 AM, Gregory Brail wrote:
> Yes. I'm advocating something "built-in" to Postgres. Any or all of those are likely a great starting point.

I don't see a benefit to having this "in postgres". The whole reason we have built out a mature and extensible product is so that not everything needs to be in postgres. We instead have a huge landscape of extensions created by our community that we don't have to manage in core.

Thanks,

JD

--
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc
PostgreSQL Centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://pgconf.us
***** Unless otherwise stated, opinions are my own. *****
Alvaro Hernandez <aht@ongres.com> writes:
> In my opinion, logical decoding plugins that don't come with core are close to worthless (don't get me wrong):
> - They are very unlikely to be installed in managed environments (an area growing significantly).
> - As with anything that is not in core, they raise concerns for users.
> - Distribution and testing are non-trivial: many OS/arch combinations.

The problem with this type of argument is that it leads directly to the conclusion that every feature users want must be in core. We can't accept that conclusion, because we simply do not have the manpower or infrastructure to maintain a kitchen-sink, Oracle-sized code base. I think we're already more or less at the limit of the feature set we can deal with :-(. The entire point of the output-plugin design was to allow useful replication stuff to be done outside of core; we need to make use of that. (If that means we need better docs, then yeah, we'd better get that part done.)

            regards, tom lane
On 09/25/2017 10:19 AM, Petr Jelinek wrote:
> On 25/09/17 18:48, Alvaro Hernandez wrote:
>> In my opinion, logical decoding plugins that don't come with core are close to worthless (don't get me wrong):
>
> I respectfully disagree.

As do I.

>> - They are very unlikely to be installed in managed environments (an area growing significantly).

Whether or not they are included in a managed environment is generally based on two things:

1. Safety (why RDS doesn't allow certain C extensions)
2. Community/Popularity (exactly why RDS has PostGIS)
   A. Demand, with a prerequisite of #1

>> - As with anything that is not in core, they raise concerns for users.

I find this a rather failing argument in today's market. If they are willing to migrate to Postgres, they are more than likely willing to use other open source software, especially when combined with an expert telling them to.

>> - Distribution and testing are non-trivial: many OS/arch combinations.

Yes, it is. Why would we want to increase that burden on this community?

Thanks,

JD

--
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc
PostgreSQL Centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://pgconf.us
***** Unless otherwise stated, opinions are my own. *****
On 25/09/17 19:26, Tom Lane wrote:
> The problem with this type of argument is that it leads directly to the conclusion that every feature users want must be in core. We can't accept that conclusion, because we simply do not have the manpower or infrastructure to maintain a kitchen-sink, Oracle-sized code base. I think we're already more or less at the limit of the feature set we can deal with :-(. The entire point of the output-plugin design was to allow useful replication stuff to be done outside of core; we need to make use of that. (If that means we need better docs, then yeah, we'd better get that part done.)

There are already about 3 million output plugins out there, so I think we did a reasonable job there. The fact that the vast majority of them are various JSON ones gives a reasonable hint that we should have that one in core, though.

--
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 09/25/2017 10:32 AM, Petr Jelinek wrote:
> There are already about 3 million output plugins out there, so I think we did a reasonable job there. The fact that the vast majority of them are various JSON ones gives a reasonable hint that we should have that one in core, though.

And I am sure that 2ndQuadrant would be happy to add it to their version of Postgres and maintain it themselves.

https://www.2ndquadrant.com/en/resources/2ndqpostgres/

Thanks,

JD

--
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc
PostgreSQL Centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://pgconf.us
***** Unless otherwise stated, opinions are my own. *****
On 2017-09-25 10:38:52 -0700, Joshua D. Drake wrote:
> And I am sure that 2ndQuadrant would be happy to add it to their version of Postgres and maintain it themselves.
>
> https://www.2ndquadrant.com/en/resources/2ndqpostgres/

This doesn't seem like a good way to argue.
On 2017-09-25 19:32:29 +0200, Petr Jelinek wrote:
> On 25/09/17 19:26, Tom Lane wrote:
>> The problem with this type of argument is that it leads directly to the conclusion that every feature users want must be in core. We can't accept that conclusion, because we simply do not have the manpower or infrastructure to maintain a kitchen-sink, Oracle-sized code base. [...] The entire point of the output-plugin design was to allow useful replication stuff to be done outside of core; we need to make use of that.

I obviously agree that not every possible output plugin should be put into core. A number of them have significant external dependencies and/or are specialized for a certain environment.

However, I don't think that should mean that there's no possible output plugin that could/should be integrated into core. And I think Petr's right here:

> There are already about 3 million output plugins out there, so I think we did a reasonable job there. The fact that the vast majority of them are various JSON ones gives a reasonable hint that we should have that one in core, though.

Greetings,

Andres Freund
On 09/25/2017 10:43 AM, Andres Freund wrote:
> On 2017-09-25 10:38:52 -0700, Joshua D. Drake wrote:
>> And I am sure that 2ndQuadrant would be happy to add it to their version of Postgres and maintain it themselves.
>>
>> https://www.2ndquadrant.com/en/resources/2ndqpostgres/
>
> This doesn't seem like a good way to argue.

Sorry, that wasn't supposed to be negative. My point was that 2ndQuadrant has a distribution of Postgres that they support. If 2ndQuadrant wants the feature, they could add it to their own without burdening the wider community further. It provides 2ndQuadrant what they are arguing for, and it benefits 2ndQuadrant as it increases the visibility and opportunity of their distribution for wider use.

This is essentially what BigSQL and EDB are doing quite successfully. They add what the core .org community won't or doesn't (for whatever reason), and that makes their distribution attractive to their users.

Thanks,

JD

--
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc
PostgreSQL Centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://pgconf.us
***** Unless otherwise stated, opinions are my own. *****
Andres Freund <andres@anarazel.de> writes: >> On 25/09/17 19:26, Tom Lane wrote: >>> The problem with this type of argument is that it leads directly to the >>> conclusion that every feature users want must be in core. > ... I don't think that should mean that there's no possible output > plugin that could/should be integrated into core. Yeah, my point is just that the argument needs to be about why a *particular* plugin is valuable enough to justify adding it to the core developers' maintenance workload. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On 2017-09-25 13:50:29 -0400, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > >> On 25/09/17 19:26, Tom Lane wrote: > >>> The problem with this type of argument is that it leads directly to the > >>> conclusion that every feature users want must be in core. > > > ... I don't think that should mean that there's no possible output > > plugin that could/should be integrated into core. > > Yeah, my point is just that the argument needs to be about why a > *particular* plugin is valuable enough to justify adding it to the > core developers' maintenance workload. +1 Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Andres, all, * Andres Freund (andres@anarazel.de) wrote: > On 2017-09-25 19:32:29 +0200, Petr Jelinek wrote: > > On 25/09/17 19:26, Tom Lane wrote: > > > Alvaro Hernandez <aht@ongres.com> writes: > > >> In my opinion, logical decoding plugins that don't come with core > > >> are close to worthless (don't get me wrong): > > > > > >> - They very unlikely will be installed in managed environments (an area > > >> growing significantly). > > >> - As anything that is not in core, raises concerns by users. > > >> - Distribution and testing are non-trivial: many OS/archs combinations. > > > > > > The problem with this type of argument is that it leads directly to the > > > conclusion that every feature users want must be in core. We can't > > > accept that conclusion, because we simply do not have the manpower or > > > infrastructure to maintain a kitchen-sink, Oracle-sized code base. > > > I think we're already more or less at the limit of the feature set we can > > > deal with :-(. The entire point of the output-plugin design was to allow > > > useful replication stuff to be done outside of core; we need to make use > > > of that. (If that means we need better docs, then yeah, we'd better get > > > that part done.) > > I obvoiusly agree that not every possible output plugin should be put > into core. A number of them have significant external dependencies > and/or are specialized for a certain environment. > > However I don't think that should mean that there's no possible output > plugin that could/should be integrated into core. And I think Petr's > right here: > > > There is already about 3 million output plugins out there so I think we > > did reasonable job there. The fact that vast majority of that are > > various json ones gives reasonable hint that we should have that one in > > core though. I tend to agree with Andres & Petr here as well. No, we certainly don't want to add features that aren't going to be used much to core or which stretch the resources we have beyond what we're able to handle, but having a good, solid, output plugin that works well and can be built upon should be included into core. PG is often deployed in complex ecosystems where we need to work with other systems and this is an important part of that. Also, to some extent, I'm hopeful that being both open to new features, when they make sense, and providing more ways for other systems to work with PG, will lead to more contributions and hopefully regular contributors who can help us maintain the code base as it continues to grow. Thanks! Stephen
On 25/09/17 20:18, Andres Freund wrote: > On 2017-09-24 13:36:56 +0300, Alvaro Hernandez wrote: >> However, if DMS uses it for what I'd call production use, I assume it is >> actually production quality. I bet they do enough testing, and don't ship >> software to potentially millions of customers if it doesn't work well. So... >> first, I'd consider this a a sign of robustness. > You've been in software for how long? ... ;) There's quite mixed > experiences with DMS. Actually long enough to understand that if someone "big" calls it production quality, we should not be pickier and assume it is --whether it is or not. People will accept it as such, and that's good enough. ;) > > FWIW, I don't think there's a huge problem w/ using test_decoding - the > output isn't pretty but it's parseable. It's too verbose due to > repeating column & type names (which also slows down), but... Everything is parseable. I don't have a big problem with that. Stability is another issue: as long as it supports high volume operations and doesn't break, it's acceptable enough. Álvaro -- Alvaro Hernandez ----------- OnGres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
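For readers who have not used it, here is a minimal sketch of what test_decoding emits when consumed through the SQL interface to logical decoding; it shows both the verbosity mentioned above (column names and types repeated per column) and why the format is still parseable. The slot, table and values are made up for illustration, and the server is assumed to run with wal_level = logical and a role allowed to create replication slots.

  -- Create a slot that decodes WAL through the test_decoding plugin.
  SELECT * FROM pg_create_logical_replication_slot('demo_slot', 'test_decoding');

  -- Produce a change to decode (hypothetical table).
  CREATE TABLE demo (id int PRIMARY KEY, payload text);
  INSERT INTO demo VALUES (1, 'hello');

  -- Peek at the decoded stream without consuming it; each change is one text row.
  SELECT data FROM pg_logical_slot_peek_changes('demo_slot', NULL, NULL);
  -- Typical output, with column name and type repeated for every column:
  --   BEGIN 755
  --   table public.demo: INSERT: id[integer]:1 payload[text]:'hello'
  --   COMMIT 755

  -- Drop the slot when done so it does not retain WAL.
  SELECT pg_drop_replication_slot('demo_slot');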
On 25/09/17 20:31, Joshua D. Drake wrote: > On 09/25/2017 10:19 AM, Petr Jelinek wrote: >> On 25/09/17 18:48, Alvaro Hernandez wrote: >>> > >>> In my opinion, logical decoding plugins that don't come with core >>> are close to worthless (don't get me wrong): >>> >> >> I respectfully disagree. > > As do I. But Petr, without saying why, it does not help much to the discussion ;) > >> >>> - They very unlikely will be installed in managed environments (an area >>> growing significantly). > > Whether or not they are included in a managed environment is generally > based on two things: > > 1. Safety (why RDS doesn't allow certain C extensions) > 2. Community/Popularity (Exactly why RDS has PostGIS) > A. Demand with a prerequisite of #1 This is very clear. Now tell me: how many output plugins do you see included in RDS? And in GCP's PostgreSQL? Azure Postgres? Heroku? I'm looking at this from the practical perspective: if you want to write a middleware for PostgreSQL that relies on logical decoding, you definitely want to run on these platforms, or you are out of the game. If we want PostgreSQL to integrate more easily in today's very heterogeneous environments, this is key. And relying on output plugins that are not included or acceptable in many environments is not, IMHO, a viable nor sensible option. I'd rather stick to test_decoding or pgoutput, no question. > >>> - As anything that is not in core, raises concerns by users. > > I find this a rather failing argument in today's market. If they are > willing to migrate to Postgres, they are more than likely willing to > use other open source software. Especially when combined with an > expert telling them to. > > >>> - Distribution and testing are non-trivial: many OS/archs combinations. >>> > > Yes, it is. Why would we want to increase that burden to this community? That's a different story, and one I cannot argue against. If easing PostgreSQL integration with other tools is not a priority, or is something the core group cannot add on top of everything already on their shoulders, then with all due respect, PostgreSQL users will do without, someway, somehow. But IMHO this should be a really high priority, and saying that this would turn PostgreSQL into an Oracle code base is going too far ;) Álvaro -- Alvaro Hernandez ----------- OnGres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On 09/25/2017 11:31 AM, Alvaro Hernandez wrote: > >> Whether or not they are included in a managed environment is generally >> based on two things: >> >> 1. Safety (why RDS doesn't allow certain C extensions) >> 2. Community/Popularity (Exactly why RDS has PostGIS) >> A. Demand with a prerequisite of #1 > > This is very clear. Now tell me: how many output plugins do you see > included in RDS. And in GCP's PostgreSQL? Azure Postgres? Heroku? From RDS: Logical Replication for PostgreSQL on Amazon RDS Beginning with PostgreSQL version 9.4, PostgreSQL supports the streaming of WAL changes using logical replication slots. Amazon RDS supports logical replication for a PostgreSQL DB instance version 9.4.9 and higher and 9.5.4 and higher. Using logical replication, you can set up logical replication slots on your instance and stream database changes through these slots to a client like pg_recvlogical. Logical slots are created at the database level and support replication connections to a single database. PostgreSQL logical replication on Amazon RDS is enabled by a new parameter, a new replication connection type, and a new security role. The client for the replication can be any client that is capable of establishing a replication connection to a database on a PostgreSQL DB instance. The most common clients for PostgreSQL logical replication are AWS Database Migration Service or a custom-managed host on an AWS EC2 instance. The logical replication slot knows nothing about the receiver of the stream; there is no requirement that the target be a replica database. Note that if you set up a logical replication slot and do not read from the slot, data can be written to your DB instance's storage and you can quickly fill up the storage on your instance. """ I don't see why others wouldn't be available either. In fact, I am not sure why you couldn't use the JSON ones now. (Although I have not tested it). JD -- Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc PostgreSQL Centered full stack support, consulting and development. Advocate: @amplifypostgres || Learn: https://pgconf.us ***** Unless otherwise stated, opinions are my own. ***** -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On 2017-09-25 21:31:11 +0300, Alvaro Hernandez wrote: > > > > - Distribution and testing are non-trivial: many OS/archs combinations. > > > > > > > > Yes, it is. Why would we want to increase that burden to this community? > > > That's a different story, and one I cannot argue against. If easying > postgresql integration with other tools is not something of a priority or > something the core group cannot add to all the stuff on their shoulders, all > my due respect. PostgreSQL users will do without, someway, somehow. But IMHO > this should be a really high priority, and saying that this would turn > PostgreSQL into an Oracle code base is going too far ;) Well, we can certainly use more help doing maintenance-y stuff, which will in turn allow to get more stuff into core in a good state medium+ term... ;) Less jokingly: I'm doubtful that the "not a priority" take is really fair - there's a lot of priorities, and they compete for scant resources. Which means people have to argue convincingly if they want to add to that burden - just actually asking the question whether it's a good use of resources doesn't mean it's not. Just that it's not yet clear. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Sep 25, 2017 at 11:37 AM, Joshua D. Drake <jd@commandprompt.com> wrote:
On 09/25/2017 11:31 AM, Alvaro Hernandez wrote:
Whether or not they are included in a managed environment is generally based on two things:
1. Safety (why RDS doesn't allow certain C extensions)
2. Community/Popularity (Exactly why RDS has PostGIS)
A. Demand with a prerequisite of #1
This is very clear. Now tell me: how many output plugins do you see included in RDS. And in GCP's PostgreSQL? Azure Postgres? Heroku?
From RDS:
Logical Replication for PostgreSQL on Amazon RDS
Beginning with PostgreSQL version 9.4, PostgreSQL supports the streaming of WAL changes using logical replication slots. Amazon RDS supports logical replication for a PostgreSQL DB instance version 9.4.9 and higher and 9.5.4 and higher. Using logical replication, you can set up logical replication slots on your instance and stream database changes through these slots to a client like pg_recvlogical. Logical slots are created at the database level and support replication connections to a single database.
PostgreSQL logical replication on Amazon RDS is enabled by a new parameter, a new replication connection type, and a new security role. The client for the replication can be any client that is capable of establishing a replication connection to a database on a PostgreSQL DB instance.
The most common clients for PostgreSQL logical replication are AWS Database Migration Service or a custom-managed host on an AWS EC2 instance. The logical replication slot knows nothing about the receiver of the stream; there is no requirement that the target be a replica database. Note that if you set up a logical replication slot and do not read from the slot, data can be written to your DB instance's storage and you can quickly fill up the storage on your instance.
"""
I don't see why others wouldn't be available either. In fact, I am not sure why you couldn't use the JSON ones now. (Although I have not tested it).
JD
Also to add, Amazon RDS for PostgreSQL does support non-core plugins. The wal2json output plugin for logical decoding is supported for versions 9.6.3+ and 9.5.7+ (link).
Regards,
Jignesh
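To make the comparison concrete, here is a rough sketch of what the wal2json plugin mentioned above produces for the same kind of change, assuming the extension is installed on the server; the slot name, table and the 'pretty-print' option value are illustrative.

  -- Create a slot that uses the wal2json output plugin (must be installed on the server).
  SELECT * FROM pg_create_logical_replication_slot('json_slot', 'wal2json');

  -- Consume pending changes as JSON documents.
  SELECT data FROM pg_logical_slot_get_changes('json_slot', NULL, NULL, 'pretty-print', '1');
  -- Typical (abridged) output for an INSERT:
  --   {
  --     "change": [
  --       { "kind": "insert", "schema": "public", "table": "demo",
  --         "columnnames": ["id", "payload"],
  --         "columntypes": ["integer", "text"],
  --         "columnvalues": [1, "hello"] }
  --     ]
  --   }

  SELECT pg_drop_replication_slot('json_slot');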
On Mon, Sep 25, 2017 at 8:20 PM, Alvaro Hernandez <aht@ongres.com> wrote:
On 25/09/17 20:18, Andres Freund wrote:
On 2017-09-24 13:36:56 +0300, Alvaro Hernandez wrote:
However, if DMS uses it for what I'd call production use, I assume it is actually production quality. I bet they do enough testing, and don't ship software to potentially millions of customers if it doesn't work well. So... first, I'd consider this a sign of robustness.
You've been in software for how long? ... ;) There's quite mixed experiences with DMS.
Actually long enough to understand that if someone "big" calls it production quality, we should not be pickier and assume it is --whether it is or not. People will accept it as such, and that's good enough.
Historically the fact that we have been pickier than many of the "someone big":s is exactly how we ended up with the codebase and relative stability we have today.
Just because someone is big doesn't mean they know what's right. In fact, more often than not the opposite turns out to be true.
On Sep 25, 2017 1:39 PM, "Joshua D. Drake" <jd@commandprompt.com> wrote:
On 09/25/2017 10:32 AM, Petr Jelinek wrote:
On 25/09/17 19:26, Tom Lane wrote:
Alvaro Hernandez <aht@ongres.com> writes:
There is already about 3 million output plugins out there so I think we
did reasonable job there. The fact that vast majority of that are
various json ones gives reasonable hint that we should have that one in
core though.
And I am sure that 2ndQuadrant would be happy to add it to their version of Postgres and maintain it themselves.
That's not the case to be thinking about... Of course, 2Q can support some extensions for their customers that use their binaries, that is not a contraindication for having more in core.
I'd rather think based on questions like... "Will that be supportable in an ordinary Amazon instance?" Or "... In a Heroku instance?"
Those (Amazon and Heroku) are places that Slony won't run because it needs a bit too much in the way of database superuser capabilities.
It's a very useful exercise to know which bits of this are easy to resolve versus difficult versus irreconcilable.
Of course, we can't force everything into core, but given what IS in core, it can look mighty dumb if...
a) we have neat and crucial features in core that are (validly!) trumpeted in release notes, but
b) those features aren't usable without substantial out-of-core extensions that many users cannot use.
Perhaps it's valid for logical replication to be considered out-of-scope for generic Amazon/Heroku instances, but I'd prefer that to be the result of a reasoned intent.
On 25/09/17 19:48, Joshua D. Drake wrote: > On 09/25/2017 10:43 AM, Andres Freund wrote: >> On 2017-09-25 10:38:52 -0700, Joshua D. Drake wrote: >>> On 09/25/2017 10:32 AM, Petr Jelinek wrote: >>>> On 25/09/17 19:26, Tom Lane wrote: >>>>> Alvaro Hernandez <aht@ongres.com> writes: >>> >>>> >>>> There is already about 3 million output plugins out there so I think we >>>> did reasonable job there. The fact that vast majority of that are >>>> various json ones gives reasonable hint that we should have that one in >>>> core though. >>> >>> And I am sure that 2ndQuadrant would be happy to add it to their >>> version of >>> Postgres and maintain it themselves. >>> >>> https://www.2ndquadrant.com/en/resources/2ndqpostgres/ >> >> This doesn't seem like a good way to argue. >> > > Sorry, that wasn't supposed to be negative. My point was that > 2ndQuadrant has a distribution of Postgres that they support. If > 2ndQuadrant wants the feature, they could add it to their own without > burdening the wider community further. It provides 2ndQuadrant what they > are arguing for, benefits 2ndQuadrant as it increases the visibility and > opportunity of their distribution for wider use. > I am not sure I follow, we do have pglogical extension which works just fine for this, thanks. You might have noticed that it's not what I was suggesting we use for core. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On 26 September 2017 at 01:53, Andres Freund <andres@anarazel.de> wrote:
On 2017-09-25 13:50:29 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> >> On 25/09/17 19:26, Tom Lane wrote:
> >>> The problem with this type of argument is that it leads directly to the
> >>> conclusion that every feature users want must be in core.
>
> > ... I don't think that should mean that there's no possible output
> > plugin that could/should be integrated into core.
>
> Yeah, my point is just that the argument needs to be about why a
> *particular* plugin is valuable enough to justify adding it to the
> core developers' maintenance workload.
+1
Yep, and that goes for plugins like pglogical too.
I agree we need a json plugin, it's pretty clear that's a widespread need.
But I don't buy the whole argument about "hosted postgres" issues. The hosted solutions suppliers can simply use 3rd party plugins, like some do PostGIS already. Trying to push things into core is offloading work onto us to make their lives easier and I don't much care for that.
Personally I'd be more friendly toward Amazon / Google / etc wanting us to include things for their convenience if they actually usefully contributed to development and maintenance of Pg.
Craig Ringer
On 26 September 2017 at 01:48, Joshua D. Drake <jd@commandprompt.com> wrote:
On 09/25/2017 10:43 AM, Andres Freund wrote:
On 2017-09-25 10:38:52 -0700, Joshua D. Drake wrote:
On 09/25/2017 10:32 AM, Petr Jelinek wrote:
On 25/09/17 19:26, Tom Lane wrote:
Alvaro Hernandez <aht@ongres.com> writes:
There is already about 3 million output plugins out there so I think we
did reasonable job there. The fact that vast majority of that are
various json ones gives reasonable hint that we should have that one in
core though.
And I am sure that 2ndQuadrant would be happy to add it to their version of
Postgres and maintain it themselves.
https://www.2ndquadrant.com/en/resources/2ndqpostgres/
This doesn't seem like a good way to argue.
Sorry, that wasn't supposed to be negative. My point was that 2ndQuadrant has a distribution of Postgres that they support.
For what it's worth, it's mostly things that didn't make it into core for a release, things that got knocked back, backports, etc.
I for one don't want to carry a big delta from core.
On 25/09/17 22:13, Magnus Hagander wrote:
On Mon, Sep 25, 2017 at 8:20 PM, Alvaro Hernandez <aht@ongres.com> wrote:
On 25/09/17 20:18, Andres Freund wrote:
On 2017-09-24 13:36:56 +0300, Alvaro Hernandez wrote:
However, if DMS uses it for what I'd call production use, I assume it is actually production quality. I bet they do enough testing, and don't ship software to potentially millions of customers if it doesn't work well. So... first, I'd consider this a sign of robustness.
You've been in software for how long? ... ;) There's quite mixed experiences with DMS.
Actually long enough to understand that if someone "big" calls it production quality, we should not be pickier and assume it is --whether it is or not. People will accept it as such, and that's good enough.
Historically the fact that we have been pickier than many of the "someone big":s is exactly how we ended up with the codebase and relative stability we have today.
Just because someone is big doesn't mean they know what's right. In fact, more often than not the opposite turns out to be true.
Note that I'm not here supporting test_decoding. I'm just saying it is all that is available in-core for 9.4-9.6, and it seems someone with potentially a lot of users has tested it and is using it in their own solution. Ask me if I would like an in-core, well-tested, performant plugin with an easy-to-parse and efficient format for 9.4-9.6? My answer would be an immediate 'yes'. But since this is not going to happen, lucky us that test_decoding is good enough, because otherwise there would not be a good solution on this front.
Álvaro
-- Alvaro Hernandez ----------- OnGres
On 25/09/17 22:08, Jignesh Shah wrote:
On Mon, Sep 25, 2017 at 11:37 AM, Joshua D. Drake <jd@commandprompt.com> wrote:
On 09/25/2017 11:31 AM, Alvaro Hernandez wrote:
Whether or not they are included in a managed environment is generally based on two things:
1. Safety (why RDS doesn't allow certain C extensions)
2. Community/Popularity (Exactly why RDS has PostGIS)
A. Demand with a prerequisite of #1
This is very clear. Now tell me: how many output plugins do you see included in RDS. And in GCP's PostgreSQL? Azure Postgres? Heroku?
From RDS:
Logical Replication for PostgreSQL on Amazon RDS
Beginning with PostgreSQL version 9.4, PostgreSQL supports the streaming of WAL changes using logical replication slots. Amazon RDS supports logical replication for a PostgreSQL DB instance version 9.4.9 and higher and 9.5.4 and higher. Using logical replication, you can set up logical replication slots on your instance and stream database changes through these slots to a client like pg_recvlogical. Logical slots are created at the database level and support replication connections to a single database.
PostgreSQL logical replication on Amazon RDS is enabled by a new parameter, a new replication connection type, and a new security role. The client for the replication can be any client that is capable of establishing a replication connection to a database on a PostgreSQL DB instance.
The most common clients for PostgreSQL logical replication are AWS Database Migration Service or a custom-managed host on an AWS EC2 instance. The logical replication slot knows nothing about the receiver of the stream; there is no requirement that the target be a replica database. Note that if you set up a logical replication slot and do not read from the slot, data can be written to your DB instance's storage and you can quickly fill up the storage on your instance.
"""
I don't see why others wouldn't be available either. In fact, I am not sure why you couldn't use the JSON ones now. (Although I have not tested it).
JD
Also to add, Amazon RDS for PostgreSQL does support non-core plugins. The wal2json output plugin for logical decoding is supported for versions 9.6.3+ and 9.5.7+ (link).
I think that's awesome. Now... back to my original question: what is the *list* of output plugins supported by managed PostgreSQL solutions? So far it looks like wal2json for 9.5-9.6 on RDS, and nothing else (the list may just be incomplete, but even in the best case it unfortunately won't be a long one...).
Álvaro
-- Alvaro Hernandez ----------- OnGres
On 25/09/17 21:38, Andres Freund wrote: > On 2017-09-25 21:31:11 +0300, Alvaro Hernandez wrote: >>>>> - Distribution and testing are non-trivial: many OS/archs combinations. >>>>> >>> Yes, it is. Why would we want to increase that burden to this community? >> >> That's a different story, and one I cannot argue against. If easying >> postgresql integration with other tools is not something of a priority or >> something the core group cannot add to all the stuff on their shoulders, all >> my due respect. PostgreSQL users will do without, someway, somehow. But IMHO >> this should be a really high priority, and saying that this would turn >> PostgreSQL into an Oracle code base is going too far ;) > Well, we can certainly use more help doing maintenance-y stuff, which > will in turn allow to get more stuff into core in a good state medium+ > term... ;) > > Less jokingly: I'm doubtful that the "not a priority" take is really > fair - there's a lot of priorities, and they compete for scant > resources. Which means people have to argue convincingly if they want to > add to that burden - just actually asking the question whether it's a > good use of resources doesn't mean it's not. Just that it's not yet > clear. > > OK, let me try to do that. I believe data integration is a priority. The world is no longer an isolated database where many apps talk to it. Nowadays heterogeneous architectures are almost the norm. CDC is often the best solution for many of the data integration tasks like data warehousing, data consolidation, stream processing, etc. From this perspective, it would be key to have a general tool, or a good starting point, for CDC or even higher-level functionality tools. Think of PostgreSQL's Golden Gate. Now, developing any software like this faces two significant challenges, which turn into deterrents to developing such software: - If you want to develop your own output plugin, then your market is reduced as you have to exclude all the managed solutions (until, and only if, you would convince them to include your plugin... highly unlikely, very difficult). And probably another % of enterprise environments which will hesitate to link your own plugin to the running production database. Last but not least, you need to compile and test (and testing this is far from easy) on multiple OSes/architectures. - If you stick to in-core plugins, then you need to support at least three different output formats if you want to support 9.4+: test_decoding (and pray it works!), pgoutput, and the "new" in-core plugin that was proposed at the beginning of this thread, if that would see the light. Both are strong enough arguments to make building this kind of software far less interesting. Actually, I don't know of any software like this, and this may already be a consequence of what I'm saying. And I truly believe PostgreSQL should offer this, as part of its ecosystem (not necessarily in core). On the other hand, in-core may help encourage building these solutions. If there were an in-core, uniform, flexible output plugin, ideally backported to 9.4 (I know, I know....), included in every PostgreSQL... it would really open the door to many integration applications. Output plugins are great for your own use, for controlled environments, for a specific set of users. But only in-core plugins help for general-purpose solutions.
Álvaro -- Alvaro Hernandez ----------- OnGres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On 26 September 2017 at 14:08, Alvaro Hernandez <aht@ongres.com> wrote:
OK, let me try to do that. I believe data integration is a priority.
Definitely agree so far.
- If you want to develop your own output plugin, then your market is reduced as you have to exclude all the managed solutions (until, and only if, you would convince them to include your plugin... highly unlikely, very difficult). And probably another % of enterprise environments which will hesitate to link your own plugin to the running production database. Last but not least, you need to compile and test (and testing this is far from easy) on multiple OSes/architectures.
Right. You probably don't want one output plugin per application.
This doesn't mean it has to be in-core, just flexible and share-able by many, and adopted by cloud vendors. Like say PostGIS.
- If you stick to in-core plugins, then you need to support at least three different output formats if you want to support 9.4+: test_decoding (and pray it works!), pgoutput, and the "new" in-core plugin that was proposed at the beginning of this thread, if that would see the light.
The only practical way will IMO be to have whatever new plugin it also have an out-of-core version maintained for older Pg versions, where it can be installed.
But only in-core plugins help for general-purpose solutions.
I still don't agree there. If there's enough need/interest/adoption you can get cloud vendors on board, they'll feel the customer pressure. It's not our job to create that pressure and do their work for them.
I see nothing wrong with a plugin starting off out of core and being adopted+adapted later, assuming it's done well.
That said, I'm all in favour of a generic json output plugin that shares infrastructure with logical replication, so people who are on inflexible environments have a fallback option. I just don't care to write it.
On 26/09/17 10:03, Craig Ringer wrote:
On 26 September 2017 at 14:08, Alvaro Hernandez <aht@ongres.com> wrote:
OK, let me try to do that. I believe data integration is a priority.
Definitely agree so far.
- If you want to develop your own output plugin, then your market is reduced as you have to exclude all the managed solutions (until, and only if, you would convince them to include your plugin... highly unlikely, very difficult). And probably another % of enterprise environments which will hesitate to link your own plugin to the running production database. Last but not least, you need to compile and test (and testing this is far from easy) on multiple OSes/architectures.
Right. You probably don't want one output plugin per application.
This doesn't mean it has to be in-core, just flexible and share-able by many, and adopted by cloud vendors. Like say PostGIS.
That would be nice. But this is chicken-and-egg: an out-of-core plugin probably won't be used by many if applications like the ones I was mentioning do not exist. And developing such an application is so much less interesting if a significant part of your market is excluded from your app.
However, it could work the other way around: a sufficiently good in-core base plugin could foster applications and an ecosystem, which, once adopted by users, could push much more easily for other, more advanced out-of-core plugins; those would be more easily accepted under that pressure, as the tools would already have significant traction. But I don't see it happening the other way around.
- If you stick to in-core plugins, then you need to support at least three different output formats if you want to support 9.4+: test_decoding (and pray it works!), pgoutput, and the "new" in-core plugin that was proposed at the beginning of this thread, if that would see the light.
The only practical way will IMO be to have whatever new plugin it also have an out-of-core version maintained for older Pg versions, where it can be installed.
But only in-core plugins help for general-purpose solutions.
I still don't agree there. If there's enough need/interest/adoption you can get cloud vendors on board, they'll feel the customer pressure. It's not our job to create that pressure and do their work for them.
Don't want to get into a loop, but as I said before it's chicken-and-egg. But nobody is asking core to do their work. As much as I love it, I think logical decoding is a bit half-baked until there is a single, quality, in-core plugin, as it discourages its usage, because of the reasons I stated.
I see nothing wrong with a plugin starting off out of core and being adopted+adapted later, assuming it's done well.
That said, I'm all in favour of a generic json output plugin that shares infrastructure with logical replication, so people who are on inflexible environments have a fallback option. I just don't care to write it.
That's better than nothing. But as interoperable as json may be, people still need to talk the (binary) replication protocol to use it. So once you talk the binary protocol, why not talk binary also for the output plugin and have a much more efficient output? Again, nothing against json, but if a new plugin were included in-core, I'd say json + binary also. Or just document pgoutput, as it could be good enough.
Cheers,
Álvaro
-- Alvaro Hernandez ----------- OnGres
On 26 September 2017 at 15:26, Alvaro Hernandez <aht@ongres.com> wrote:
That's better than nothing. But as much as interoperable json may be, people still need to talk the (binary) replication protocol to use it.
No, they don't.
They can use the SQL interface to logical decoding.
We could enhance that with a few small changes to make it a lot more useful too. Most importantly, allow a long-polling model, where you can wait if there's nothing to consume rather than getting an immediate empty result-set.
I expect the walsender protocol to be dealt with by client drivers, not user applications, much like you never really deal with the regular libpq protocol in your app. PgJDBC and psycopg2 already support it. It'd be nice if libpq offered some helper interfaces for C apps, and I'd help review any such patch.
Nothing against binary output, but as you noted, we can likely use pgoutput for that to some extent at least. I think the pressing need is json, going by the zillion plugins out there for it.
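As a minimal sketch of the SQL interface referred to above (assuming a slot named 'demo_slot' already exists): the functions return immediately even when there is nothing new, so today a client has to poll on its own schedule, which is exactly what a long-polling mode would improve.

  -- Consume up to 100 pending changes from an existing slot; returns an
  -- empty result set right away if nothing is pending, so callers re-poll.
  SELECT lsn, xid, data
    FROM pg_logical_slot_get_changes('demo_slot', NULL, 100);

  -- The peek variant inspects the same changes without consuming them.
  SELECT lsn, xid, data
    FROM pg_logical_slot_peek_changes('demo_slot', NULL, 100);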
On Tue, Sep 26, 2017 at 7:42 AM, Alvaro Hernandez <aht@ongres.com> wrote:
On 25/09/17 22:13, Magnus Hagander wrote:
On Mon, Sep 25, 2017 at 8:20 PM, Alvaro Hernandez <aht@ongres.com> wrote:
On 25/09/17 20:18, Andres Freund wrote:
On 2017-09-24 13:36:56 +0300, Alvaro Hernandez wrote:
However, if DMS uses it for what I'd call production use, I assume it is actually production quality. I bet they do enough testing, and don't ship software to potentially millions of customers if it doesn't work well. So... first, I'd consider this a sign of robustness.
You've been in software for how long? ... ;) There's quite mixed experiences with DMS.
Actually long enough to understand that if someone "big" calls it production quality, we should not be pickier and assume it is --whether it is or not. People will accept it as such, and that's good enough.
Historically the fact that we have been pickier than many of the "someone big":s is exactly how we ended up with the codebase and relative stability we have today.
Just because someone is big doesn't mean they know what's right. In fact, more often than not the opposite turns out to be true.
Note that I'm not here supporting test_decoding. I'm just saying it is all that is available in-core for 9.4-9.6, and it seems someone with potentially a lot of users has tested it and is using it in their own solution. Ask me if I would like an in-core, well-tested, performant plugin with an easy-to-parse and efficient format for 9.4-9.6? My answer would be an immediate 'yes'. But since this is not going to happen, lucky us that test_decoding is good enough, because otherwise there would not be a good solution on this front.
I am not saying we shouldn't have that. I am saying that the argument "if someone big calls it production quality, we should not be pickier and assume it is" is incorrect.
And yes, I have used test_decoding in production multiple times. And yes, there are good reasons why it's called *test* decoding, and should only be used in production in fairly simple cases :)
On 26/09/17 09:26, Alvaro Hernandez wrote: > On 26/09/17 10:03, Craig Ringer wrote: >> On 26 September 2017 at 14:08, Alvaro Hernandez <aht@ongres.com >> <mailto:aht@ongres.com>> wrote: >> - If you stick to in-core plugins, then you need to support at >> least three different output formats if you want to support 9.4+: >> test_decoding (and pray it works!), pgoutput, and the "new" >> in-core plugin that was proposed at the beginning of this thread, >> if that would see the light. >> >> >> The only practical way will IMO be to have whatever new plugin it also >> have an out-of-core version maintained for older Pg versions, where it >> can be installed. >> >> >> But only in-core plugins help for general-purpose solutions. >> >> >> I still don't agree there. If there's enough need/interest/adoption >> you can get cloud vendors on board, they'll feel the customer >> pressure. It's not our job to create that pressure and do their work >> for them. > > Don't want to get into a loop, but as I said before it's > chicken-and-egg. But nobody is asking core to do their work. As much as > I love it, I think logical decoding is a bit half-baked until there is a > single, quality, in-core plugin, as it discourages its usage, because of > the reasons I stated. > Well, in that case it's all good as PG10 has that. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On 26/09/17 12:57, Petr Jelinek wrote: > On 26/09/17 09:26, Alvaro Hernandez wrote: >> On 26/09/17 10:03, Craig Ringer wrote: >>> On 26 September 2017 at 14:08, Alvaro Hernandez <aht@ongres.com >>> <mailto:aht@ongres.com>> wrote: >>> - If you stick to in-core plugins, then you need to support at >>> least three different output formats if you want to support 9.4+: >>> test_decoding (and pray it works!), pgoutput, and the "new" >>> in-core plugin that was proposed at the beginning of this thread, >>> if that would see the light. >>> >>> >>> The only practical way will IMO be to have whatever new plugin it also >>> have an out-of-core version maintained for older Pg versions, where it >>> can be installed. >>> >>> >>> But only in-core plugins help for general-purpose solutions. >>> >>> >>> I still don't agree there. If there's enough need/interest/adoption >>> you can get cloud vendors on board, they'll feel the customer >>> pressure. It's not our job to create that pressure and do their work >>> for them. >> Don't want to get into a loop, but as I said before it's >> chicken-and-egg. But nobody is asking core to do their work. As much as >> I love it, I think logical decoding is a bit half-baked until there is a >> single, quality, in-core plugin, as it discourages its usage, because of >> the reasons I stated. >> > Well, in that case it's all good as PG10 has that. > Even though it's not fully documented, I agree this could fulfill this gap for 10+ (I assume this plugin will be maintained onwards, at least to support logical replication). But what about earlier versions? Any chance it could be backported down to 9.4? If that would be acceptable, I could probably help/do that... Álvaro -- Alvaro Hernandez ----------- OnGres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On 26/09/17 10:55, Craig Ringer wrote:
On 26 September 2017 at 15:26, Alvaro Hernandez <aht@ongres.com> wrote:
That's better than nothing. But as much as interoperable json may be, people still need to talk the (binary) replication protocol to use it.
No, they don't.
They can use the SQL interface to logical decoding.
AFAIK the SQL interface was also designed for testing, not for production use...
We could enhance that with a few small changes to make it a lot more useful too. Most importantly, allow a long-polling model, where you can wait if there's nothing to consume rather than getting an immediate empty result-set.
Oh, that's a completely different perspective! Do we have any support for long-polling style queries? (Not that I know of.) It's indeed a cool thing: SQL queries that return "live" results as soon as they happen. I don't see this restricted to logical decoding. Actually, this is what PipelineDB, among other things, seems to do. +1 for that, but I believe it's a different story.
I expect the walsender protocol to be dealt with by client drivers, not user applications, much like you never really deal with the regular libpq protocol in your app. PgJDBC and psycopg2 already support it. It'd be nice if libpq offered some helper interfaces for C apps, and I'd help review any such patch.
Fair enough.
Álvaro
-- Alvaro Hernandez ----------- OnGres
On Tue, Sep 26, 2017 at 2:16 PM, Alvaro Hernandez <aht@ongres.com> wrote:
On 26/09/17 12:57, Petr Jelinek wrote:
On 26/09/17 09:26, Alvaro Hernandez wrote:
On 26/09/17 10:03, Craig Ringer wrote:
On 26 September 2017 at 14:08, Alvaro Hernandez <aht@ongres.com> wrote:
- If you stick to in-core plugins, then you need to support at least three different output formats if you want to support 9.4+: test_decoding (and pray it works!), pgoutput, and the "new" in-core plugin that was proposed at the beginning of this thread, if that would see the light.
The only practical way will IMO be to have whatever new plugin it also have an out-of-core version maintained for older Pg versions, where it can be installed.
But only in-core plugins help for general-purpose solutions.
I still don't agree there. If there's enough need/interest/adoption you can get cloud vendors on board, they'll feel the customer pressure. It's not our job to create that pressure and do their work for them.
Don't want to get into a loop, but as I said before it's chicken-and-egg. But nobody is asking core to do their work. As much as I love it, I think logical decoding is a bit half-baked until there is a single, quality, in-core plugin, as it discourages its usage, because of the reasons I stated.
Well, in that case it's all good as PG10 has that.
Even though it's not fully documented, I agree this could fulfill this gap for 10+ (I assume this plugin will be maintained onwards, at least to support logical replication).
But what about earlier versions? Any chance it could be backported down to 9.4? If that would be acceptable, I could probably help/do that...
The likelihood is zero if you mean backported into core of earlier versions.
If you mean backported as a standalone extension that could be installed on a previous version, probably. I'm not sure if it relies on any internals not present before that would make it harder, but it would probably at least be possible.
2017-09-26 2:46 GMT-03:00 Alvaro Hernandez <aht@ongres.com>: > I think that's awesome. Now... back to my original question: what is the > *list* of output plugins supported by managed PostgreSQL solutions? So far > it looks like wal2json for 9.5-9.6 on RDS, and nothing else (it may just be > not complete, but in the best case this list won't be unfortunately > long...). > I can't tell about other plugin authors, but wal2json, being one of the first plugins available, is pretty popular. I've heard that it is used in replication and audit solutions. It doesn't support all of the logical decoding features (for example, origin filter) but that is in my roadmap. -- Euler Taveira Timbira - http://www.timbira.com.br/ PostgreSQL: Consulting, Development, 24x7 Support and Training -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On 26 September 2017 at 22:14, Magnus Hagander <magnus@hagander.net> wrote:
On Tue, Sep 26, 2017 at 2:16 PM, Alvaro Hernandez <aht@ongres.com> wrote:
But what about earlier versions? Any chance it could be backported down to 9.4? If that would be acceptable, I could probably help/do that...
The likelihood is zero if you mean backported into core of earlier versions.
Right. We don't add features to back branches.
If you mean backported as a standalone extension that could be installed on a previous version, probably. I'm not sure if it relies on any internals not present before that would make it harder, but it would probably at least be possible.
All the pub/sub stuff is new and hooked into syscache etc. So you'd be doing a bunch of emulation/shims using user catalogs. Not impossible, but probably irritating and verbose. And you'd have none of the DDL required to manage it, so you'd need SQL-function equivalents.
I suspect you'd be better off tweaking pglogical to speak the same protocol as pg10, since the pgoutput protocol is an evolution of pglogical's protocol. Then using pglogical on older versions.
On 26/09/17 17:50, Craig Ringer wrote:
On 26 September 2017 at 22:14, Magnus Hagander <magnus@hagander.net> wrote:
On Tue, Sep 26, 2017 at 2:16 PM, Alvaro Hernandez <aht@ongres.com> wrote:
But what about earlier versions? Any chance it could be backported down to 9.4? If that would be acceptable, I could probably help/do that...
The likelihood is zero if you mean backported into core of earlier versions.
Right. We don't add features to back branches.
Yeah, I know the policy. But asking is free ;) and in my opinion this would be a very good reason to have an exception, if there would be a clear desire to have a single, unified, production quality output plugin across all PostgreSQL versions. At least I tried ;)
If you mean backported as a standalone extension that could be installed on a previous version, probably. I'm not sure if it relies on any internals not present before that would make it harder, but it would probably at least be possible.
All the pub/sub stuff is new and hooked into syscache etc. So you'd be doing a bunch of emulation/shims using user catalogs. Not impossible, but probably irritating and verbose. And you'd have none of the DDL required to manage it, so you'd need SQL-function equivalents.
I suspect you'd be better off tweaking pglogical to speak the same protocol as pg10, since the pgoutput protocol is an evolution of pglogical's protocol. Then using pglogical on older versions.
Given all this, if I were doing an app based on logical decoding, I think I would stick to test_decoding for <10....
Thanks,
Álvaro
-- Alvaro Hernandez ----------- OnGres
It seems test_decoding.c could be easily changed to support JSON by using the built-in PostgreSQL functions (json.c composite_to_json) to convert a Datum into JSON. Its use of OidOutputFunctionCall could be modified to emit arrays and composite types as JSON. This might be enough to enable downstream frameworks to parse (without having to code to the terse and positional composite structure format).
It could be a minimal change to have in core using the built in JSON support with no additional libraries. I have not made changes to this code but it seems like it should work.
Thank you,
Henry
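To illustrate the shape of output being described here: the built-in JSON support (composite_to_json is what backs row_to_json) already renders a whole row, including arrays and composites, as in the sketch below; the table and data are made up. A modified test_decoding would emit something of this shape per change instead of its positional name[type]:value list.

  -- Hypothetical table with an array column, to show nested JSON output.
  CREATE TABLE widgets (id int, tags text[], payload text);
  INSERT INTO widgets VALUES (1, ARRAY['a','b'], 'hello');

  -- row_to_json (composite_to_json under the hood) renders the whole row:
  SELECT row_to_json(w) FROM widgets AS w;
  --   {"id":1,"tags":["a","b"],"payload":"hello"}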
On Tue, Sep 26, 2017 at 9:37 AM Alvaro Hernandez <aht@ongres.com> wrote:
On 26/09/17 17:50, Craig Ringer wrote:
On 26 September 2017 at 22:14, Magnus Hagander <magnus@hagander.net> wrote:
On Tue, Sep 26, 2017 at 2:16 PM, Alvaro Hernandez <aht@ongres.com> wrote:
But what about earlier versions? Any chance it could be backported down to 9.4? If that would be acceptable, I could probably help/do that...
The likelihood is zero if you mean backported into core of earlier versions.
Right. We don't add features to back branches.
Yeah, I know the policy. But asking is free ;) and in my opinion this would be a very good reason to have an exception, if there would be a clear desire to have a single, unified, production quality output plugin across all PostgreSQL versions. At least I tried ;)
If you mean backported as a standalone extension that could be installed on a previous version, probably. I'm not sure if it relies on any internals not present before that would make it harder, but it would probably at least be possible.
All the pub/sub stuff is new and hooked into syscache etc. So you'd be doing a bunch of emulation/shims using user catalogs. Not impossible, but probably irritating and verbose. And you'd have none of the DDL required to manage it, so you'd need SQL-function equivalents.
I suspect you'd be better off tweaking pglogical to speak the same protocol as pg10, since the pgoutput protocol is an evolution of pglogical's protocol. Then using pglogical on older versions.
Given all this, if I would be doing an app based on logical decoding, I think I will stick to test_decoding for <10....
Thanks,
Álvaro-- Alvaro Hernandez ----------- OnGres
Reviving this thread.
On Tue, 26 Sep 2017 at 13:57, Henry <henrymanmail@gmail.com> wrote:
It seems test_decoding.c could be easily changed to support JSON by using the built in PostgreSQL functions (json.c composite_to_json) to convert a Datum into SQL. It's use of OidOutputFunctionCall could be modified to emit arrays and composite types as JSON. This might be enough to enable downstream frameworks to parse (without having to code to the terse and positional composite structure format).
It could be a minimal change to have in core using the built in JSON support with no additional libraries. I have not made changes to this code but it seems like it should work.
Thank you,
Henry
I believe there is a valid reason for providing a reasonably feature-complete plugin in core. Specifically, in environments such as cloud providers, where the user does not control what is installed on the server, it would be useful to have a decent output plugin.
Having had a cursory look at pgoutput, I see no reason why pgoutput could not be used as a general purpose output plugin.
One thing that would be nice is to remove the requirement for a publication, as creating a publication on all tables requires a superuser.
I'm also curious why pgoutput does not send attributes in binary. This seems like a rather small change that should provide some significant performance benefits.
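For completeness, a sketch of what driving pgoutput outside of a CREATE SUBSCRIPTION setup looks like on PostgreSQL 10; the publication and slot names are made up, the FOR ALL TABLES step is the one that currently needs superuser, and since pgoutput declares binary output the binary variant of the SQL interface appears to be required.

  -- pgoutput decides what to stream based on a publication;
  -- FOR ALL TABLES is the step that currently requires superuser.
  CREATE PUBLICATION all_tables_pub FOR ALL TABLES;

  SELECT * FROM pg_create_logical_replication_slot('pgout_slot', 'pgoutput');

  -- pgoutput emits the binary logical replication protocol, so use the
  -- binary-changes function and pass the plugin's required options.
  SELECT lsn, xid, data
    FROM pg_logical_slot_get_binary_changes('pgout_slot', NULL, NULL,
                                            'proto_version', '1',
                                            'publication_names', 'all_tables_pub');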