Thread: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ compatibility
Hallo postgresql and replication hackers,

This mail is an additional RFC which proposes a simple way to extend the new logical replication feature so it can cover most usages of skytools/pgq/londiste.

While the current work for BDR/LCR (bi-directional replication/logical replication) using WAL is theoretically enough to cover the _replication_ offered by Londiste, it falls short in one important way - there is currently no support for pure queueing, that is, for "streams" of data which do not need to be stored in the source database.

Fortunately there is a simple solution - do not store it in the source database :)

The only thing needed for adding this is to have a table type which

a) generates an INSERT record in WAL

and

b) does not actually store the data in a local file

If implemented in userspace it would be a VIEW (or table) with a before/instead trigger which logs the inserted data and then cancels the insert.

I'm sure this thing could be implemented, but I leave the tech discussion to those who are currently deep in WAL generation/reconstruction.

If we implement logged only tables / queues we would not only enable a more performant pgQ replacement for implementing full Londiste / skytools functionality, but would also become a very strong player to be used as a persistent basis for message queueing solutions like ActiveMQ, StorMQ, any Advanced Message Queuing Protocol (AMQP) implementation and so on.

Comments?

Hannu Krosing
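The userspace emulation Hannu mentions can be sketched in a few lines of SQL. This is a hedged sketch only: it assumes pgQ is installed and a queue named 'my_queue' has already been created with pgq.create_queue(); the table and function names are invented for illustration.

```sql
-- A "log only" table emulated in userspace: a BEFORE trigger captures the
-- row as a pgQ event and then cancels the local insert by returning NULL.
CREATE TABLE my_queue_capture (payload text);

CREATE FUNCTION my_queue_log_fn() RETURNS trigger AS $$
BEGIN
    -- record the would-be row as an event in pgQ ...
    PERFORM pgq.insert_event('my_queue', 'I', NEW.payload);
    -- ... and suppress the actual insert: nothing is stored locally
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER my_queue_log BEFORE INSERT ON my_queue_capture
    FOR EACH ROW EXECUTE PROCEDURE my_queue_log_fn();
```

With this in place, `INSERT INTO my_queue_capture VALUES ('hello')` queues an event but leaves the table empty - the behaviour the proposed log-only table would provide natively, minus the trigger overhead.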
On 16 October 2012 09:56, Hannu Krosing <hannu@2ndquadrant.com> wrote:

> This mail is an additional RFC which proposes a simple way to extend the
> new logical replication feature so it can cover most usages of
> skytools/pgq/londiste
>
> [...]
>
> If we implement logged only tables / queues we would not only enable a more
> performant pgQ replacement for implementing full Londiste / skytools
> functionality but would also become a very strong player to be used as
> persistent basis for message queueing solutions like ActiveMQ, StorMQ, any
> Advanced Message Queuing Protocol (AMQP) and so on.

Hmm, I was assuming that we'd be able to do that by just writing extra WAL directly. But now you've made me think about it, that would be very ugly.

Doing it this way, as you suggest, would allow us to write WAL records for queuing/replication to specific queue ids. It also allows us to have privileges assigned.

So this looks like a good idea and might even be possible for 9.3.
I've got a feeling we may want the word QUEUE again in the future, so I think we should call this a MESSAGE QUEUE.

CREATE MESSAGE QUEUE foo;
DROP MESSAGE QUEUE foo;

GRANT INSERT ON MESSAGE QUEUE foo TO ...;
REVOKE INSERT ON MESSAGE QUEUE foo FROM ...;

Rules wouldn't work. DELETE and UPDATE wouldn't work, nor would SELECT.

Things for next release: triggers, SELECT seeing a stream of changes, CHECK clauses to constrain what can be written.

One question: would we require the INSERT statement to parse against a tupledesc, or would it be just a single blob of TEXT, or can we send any payload? I'd suggest just a single blob of TEXT, since that can be XML or JSON etc. easily enough.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 10/16/2012 11:18 AM, Simon Riggs wrote:
> On 16 October 2012 09:56, Hannu Krosing <hannu@2ndquadrant.com> wrote:
>> [original RFC snipped]
>
> Hmm, I was assuming that we'd be able to do that by just writing extra
> WAL directly. But now you've made me think about it, that would be
> very ugly.
>
> Doing it this way, as you suggest, would allow us to write WAL records
> for queuing/replication to specific queue ids. It also allows us to
> have privileges assigned.
> So this looks like a good idea and might
> even be possible for 9.3.
>
> I've got a feeling we may want the word QUEUE again in the future, so
> I think we should call this a MESSAGE QUEUE.
>
> CREATE MESSAGE QUEUE foo;
> DROP MESSAGE QUEUE foo;

I would like this to be very similar to a table, so it would be

CREATE MESSAGE QUEUE(fieldname type, ...) foo;

perhaps even allowing defaults and constraints. Again, this depends on how complex the implementation would be.

For the receiving side it would look like a table with only inserts, and in this case there could even be a possibility to use it as a remote log table.

> GRANT INSERT ON MESSAGE QUEUE foo TO ...;
> REVOKE INSERT ON MESSAGE QUEUE foo FROM ...;
>
> Rules wouldn't work. DELETE and UPDATE wouldn't work, nor would SELECT.
>
> Things for next release: Triggers, SELECT sees a stream of changes,
> CHECK clauses to constrain what can be written.
>
> One question: would we require the INSERT statement to parse against a
> tupledesc, or would it be just a single blob of TEXT or can we send
> any payload? I'd suggest just a single blob of TEXT, since that can be
> XML or JSON etc easily enough.
On 16 October 2012 10:29, Hannu Krosing <hannu@2ndquadrant.com> wrote:

> I would like this to be very similar to a table, so it would be
>
> CREATE MESSAGE QUEUE(fieldname type, ...) foo;
>
> perhaps even allowing defaults and constraints. Again, this
> depends on how complex the implementation would be.

Presumably just CHECK constraints, not UNIQUE or FKs. Indexes would not be allowed.

> for the receiving side it would look like a table with only inserts,
> and in this case there could even be a possibility to use it as
> a remote log table.

The queue data would be available via the API, so it can look like anything. It would be good to identify this with a new rmgr id.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 10/16/2012 11:29 AM, Hannu Krosing wrote:
> On 10/16/2012 11:18 AM, Simon Riggs wrote:
>> On 16 October 2012 09:56, Hannu Krosing <hannu@2ndquadrant.com> wrote:
>>> [original RFC snipped]
>>
>> Hmm, I was assuming that we'd be able to do that by just writing extra
>> WAL directly. But now you've made me think about it, that would be
>> very ugly.
>> Doing it this way, as you suggest, would allow us to write WAL records
>> for queuing/replication to specific queue ids. It also allows us to
>> have privileges assigned. So this looks like a good idea and might
>> even be possible for 9.3.
>>
>> I've got a feeling we may want the word QUEUE again in the future, so
>> I think we should call this a MESSAGE QUEUE.
>>
>> CREATE MESSAGE QUEUE foo;
>> DROP MESSAGE QUEUE foo;
> I would like this to be very similar to a table, so it would be
>
> CREATE MESSAGE QUEUE(fieldname type, ...) foo;
>
> [...]

To clarify - this is intended to be a mirror image of an UNLOGGED table. That is, as much as possible a full table, except that no data gets written locally, which means that

a) indexes do not make any sense

b) exclusion and unique constraints don't make any sense

c) SELECT, UPDATE and DELETE always see an empty table

All of these should probably throw an error, analogous to how VIEWs currently work.

It could also be described as a write-only table, except that it is possible to materialise it as a real table on the receiving side.
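To make the intended semantics concrete, here is a hypothetical session sketch (the LOG ONLY syntax and the exact error behaviour are invented for illustration - nothing here is committed grammar):

```sql
-- Hypothetical syntax: a table that exists only as WAL records
CREATE LOG ONLY TABLE audit_stream (ts timestamptz, payload text);

INSERT INTO audit_stream VALUES (now(), 'hello');  -- OK: writes a WAL record only

SELECT * FROM audit_stream;        -- would raise an error, like writing to a view
CREATE INDEX ON audit_stream (ts); -- would raise an error: there is no stored data
```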
Hannu,

Can you explain in more detail how this would be used on the receiving side? I'm unable to picture it from your description.

I'm also a bit reluctant to call this a "message queue", since it lacks the features required for it to be used as an application-level queue. "REPLICATION MESSAGE", maybe?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On 16 October 2012 23:03, Josh Berkus <josh@agliodbs.com> wrote:

> Can you explain in more detail how this would be used on the receiving
> side? I'm unable to picture it from your description.

This will allow implementation of pgq in core, as discussed many times at cluster hackers meetings.

> I'm also a bit reluctant to call this a "message queue", since it lacks
> the features required for it to be used as an application-level queue.

It's the input end of an application-level queue. In this design the queue is like a table, so we need SQL grammar to support this new type of object. "Replication message" doesn't describe this, since it has little if anything to do with replication, and if anything it's a message type, not a message.

You're right that Hannu needs to specify the rest of the design and outline the API. The storage of the queue is "in WAL", which raises questions about how the API will guarantee we read just once from the queue and what happens when the queue overflows. The simple answer would be that we put everything in a table somewhere else, but that needs more careful specification to show we have both ends of the queue and a working design.

Do we need a new object at all? Can we not just define a record type, then define messages using that type? At the moment I think the named-object approach works better, but we should consider that.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 10/17/2012 12:03 AM, Josh Berkus wrote:
> Hannu,
>
> Can you explain in more detail how this would be used on the receiving
> side? I'm unable to picture it from your description.

It would be used similarly to how the event tables in pgQ (from skytools) are used - as a source of "events" to be replayed on the subscriber side.

(For discussion's sake let's just call this a LOGGED ONLY TABLE, as opposed to the UNLOGGED TABLE we already have.)

The simplest usage would be implementing "remote log tables", that is, tables where you do an INSERT on the master side, but it "inserts" only a logical WAL record and nothing else.

On the subscriber side your replay process reads this WAL record as an "insert event" and, if the table is declared as an ordinary table on the subscriber, it performs an insert there.

This would make it trivial to implement a persistent remote log table with the minimal required amount of writing on the master side. We could even implement a log table which also captures log entries from aborted transactions, by treating ROLLBACK as COMMIT for this table.

But the subscriber side could also do other things instead of (or in addition to) filling a log table. For example, it could create a partitioned table instead of the plain table defined on the provider side. There is support, and several example replay agents, in the skytools package which do this based on pgQ.

Or you could do computations/materialised views based on "events" from the table. Or you could use the "insert events"/WAL records as a base for some other remote processing, like sending out e-mails. There is also support for these kinds of things in skytools.

> I'm also a bit reluctant to call this a "message queue", since it lacks
> the features required for it to be used as an application-level queue.
> "REPLICATION MESSAGE", maybe?

Initially I'd just stick with LOG ONLY TABLE or QUEUE based on what it does, not on how it could be used.
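The subscriber-side replay loop Hannu describes can be mocked up in a few lines. This is a toy sketch only: the event format and the `replay` helper are invented, and a real consumer would read the logical decoding stream and confirm the LSN it has applied rather than use in-memory lists.

```python
# Toy sketch of a subscriber-side replay process: each decoded WAL record
# for the log-only table becomes an "insert event" applied to a real table.
def replay(events, table, last_lsn):
    """Apply insert events newer than last_lsn; return the new position."""
    for lsn, row in events:
        if lsn <= last_lsn:
            continue          # already applied - skipped on restart
        table.append(row)     # stands in for INSERT on the subscriber side
        last_lsn = lsn        # remember progress so replay is restartable
    return last_lsn

events = [(1, {"id": 1}), (2, {"id": 2}), (3, {"id": 3})]
subscriber_table = []
pos = replay(events, subscriber_table, 0)
# replaying the same stream again is a no-op thanks to the saved position
pos = replay(events, subscriber_table, pos)
```

The position tracking is the whole trick: the master never stores or cleans up anything, and the subscriber decides how far it has read.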
LOGGED ONLY TABLE is a very technical description of the realisation - I'd prefer it to work as much like a table as possible, similar to how VIEW currently works - for all usages that make sense, you can simply substitute it for a TABLE.

QUEUE emphasizes the aspect of a logged only table that it accepts "records" in a certain order, persists these and then guarantees that they can be read out in exactly the same order - all this being guaranteed by existing WAL mechanisms.

It is not meant to be a full implementation of an application level queuing system though, but just the capture, persisting and distribution parts.

Using this as an "application level queue" needs a set of interface functions to extract the events and also to keep track of the processed events. As there is no general consensus on what these should be (like whether processing the same event twice is allowed) this part is left for specific queue consumer implementations.

--------------------
Hannu Krosing
On Wed, Oct 17, 2012 at 11:26 AM, Hannu Krosing <hannu@2ndquadrant.com> wrote:

> The simplest usage would be implementing "remote log tables" that is
> tables, where you do INSERT on the master side, but it "inserts" only
> a logical WAL record and nothing else.
>
> On subscriber side your replay process reads this WAL record as an
> "insert event" and if the table is declared as an ordinary table on
> subscriber, it performs an insert there.

What kinds of applications would need that?

--
greg
Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ compatibility
From: Christopher Browne
Well, replication is arguably a relevant case.

For Slony, the origin/master node never cares about logged changes - that data is only processed on replicas. Now, that's certainly a little weaselly - the log data (sl_log_*) has got to get read to get to the replica.

This suggests, nonetheless, a curiously different table structure than is usual, and I could see this offering interesting possibilities.

The log tables are only useful to read in transaction order, which is pretty well the order data gets written to WAL, so perhaps we could have savings by only writing data to WAL...

It occurs to me that this notion might exist as a special sort of table, interesting for pgq as well as Slony, which consists of:

- table data is stored only in WAL
- an index supports quick access to this data, residing in WAL
- TOASTing perhaps unneeded?
- index might want to be on additional attributes
- the triggers-on-log-tables thing Slony 2.2 does means we want these tables to support triggers
- if data is only held in WAL, we need to hold the WAL until (mumble, later, when known to be replicated)
- might want to mix local updates with updates imported from remote nodes

I think it's a misnomer to think this is about having the data not locally accessible. Rather, it has a pretty curious access and storage pattern.

And a slick pgq queue would likely make a good Slony log, too.
> It is not meant to be a full implementation of an application level queuing
> system though, but just the capture, persisting and distribution parts.
>
> Using this as an "application level queue" needs a set of interface
> functions to extract the events and also to keep track of the processed
> events. As there is no general consensus on what these should be (like
> whether processing the same event twice is allowed) this part is left for
> specific queue consumer implementations.

Well, but AFAICT, you've already prohibited features through your design which are essential to application-level queues, and are implemented by, for example, pgQ:

1. Your design only allows the queue to be read on replicas, not on the node where the item was inserted.

2. If you can't UPDATE or DELETE queue items -- or LOCK them -- how on earth would a client know which items they have executed and which they haven't?

3. Double-down on #2 in a multithreaded environment.

For an application-level queue, the base functionality is:

ADD ITEM
READ NEXT (#) ITEM(S)
LOCK ITEM
DELETE ITEM

More sophisticated and useful queues also allow:

READ NEXT UNLOCKED ITEM
LOCK NEXT UNLOCKED ITEM
UPDATE ITEM
READ NEXT (#) UNSEEN ITEM(S)

The design you describe seems to prohibit pretty much all of the above operations after READ NEXT. This makes it completely useless as an application-level queue.

And, for that matter, if your new queue only accepts INSERTs, why not just improve LISTEN/NOTIFY so that it's readable on replicas? What does this design buy you that that doesn't?

Quite possibly you have plans which answer all of the above, but they aren't at all clear in your RFC.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
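For comparison, the base operation set Josh lists maps onto a tiny in-memory queue like this (purely illustrative - the class and method names are invented, and this says nothing about what any server-side API would look like):

```python
import threading

class AppQueue:
    """Toy application-level queue with the base operations from the list
    above: add, lock-next-unlocked, delete. Purely illustrative."""
    def __init__(self):
        self.items = {}           # item id -> payload
        self.locked = set()       # ids currently claimed by a worker
        self.next_id = 0
        self.mutex = threading.Lock()

    def add(self, payload):
        with self.mutex:
            self.next_id += 1
            self.items[self.next_id] = payload
            return self.next_id

    def lock_next_unlocked(self):
        """Claim the oldest unclaimed item, or None if nothing is free."""
        with self.mutex:
            for item_id in sorted(self.items):
                if item_id not in self.locked:
                    self.locked.add(item_id)
                    return item_id, self.items[item_id]
            return None

    def delete(self, item_id):
        """Remove a processed item; this is the step a WAL-only store
        cannot do in place, which is the crux of the objection."""
        with self.mutex:
            self.items.pop(item_id, None)
            self.locked.discard(item_id)
```

Note that `lock_next_unlocked` and `delete` are exactly the operations that have no direct equivalent if the items live only in WAL.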
From: Christopher Browne
On Wed, Oct 17, 2012 at 4:25 PM, Josh Berkus <josh@agliodbs.com> wrote:

>> It is not meant to be a full implementation of an application level queuing
>> system though, but just the capture, persisting and distribution parts.
>>
>> Using this as an "application level queue" needs a set of interface
>> functions to extract the events and also to keep track of the processed
>> events. As there is no general consensus on what these should be (like
>> whether processing the same event twice is allowed) this part is left for
>> specific queue consumer implementations.
>
> Well, but AFAICT, you've already prohibited features through your design
> which are essential to application-level queues, and are implemented by,
> for example, pgQ.
>
> 1. your design only allows the queue to be read on replicas, not on the
> node where the item was inserted.

I commented separately on this; I'm pretty sure there needs to be a way to read the queue on a replica, yes, indeed.

> 2. if you can't UPDATE or DELETE queue items -- or LOCK them -- how on
> earth would a client know which items they have executed and which they
> haven't?

If the items are actually stored in WAL, then it seems well and truly impossible to do any of those three things directly. What could be done, instead, would be to add "successor" items to indicate that they have been dealt with - in effect, back-references. You don't get to UPDATE or DELETE; instead, you do something like:

INSERT INTO queue (reference_to_xid, reference_to_id_in_xid, action)
VALUES (old_xid_1, old_id_within_xid_1, 'COMPLETED'),
       (old_xid_2, old_id_within_xid_2, 'CANCELLED');

In a distributed context, it's possible that multiple nodes could be reading from the same queue, so that while "process at least once" is no trouble, "process at most once" is just plain troublesome.

--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"
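The "successor item" pattern Chris describes can be simulated to show how completion works without UPDATE or DELETE (a toy sketch; the record shapes and the `pending` helper are invented for illustration):

```python
# Toy append-only log: nothing is ever updated in place. Completion is a
# new "successor" row that back-references the original item's id.
def pending(log):
    """Items with no later COMPLETED/CANCELLED marker referencing them."""
    done = {entry["ref"] for entry in log if entry.get("action")}
    return [e for e in log if "ref" not in e and e["id"] not in done]

log = [
    {"id": 1, "payload": "a"},
    {"id": 2, "payload": "b"},
    {"ref": 1, "action": "COMPLETED"},   # back-reference, not an UPDATE
]
```

Consumers reconstruct the queue state by folding markers over the log - the same way the INSERT-with-back-reference example above avoids mutating WAL-resident items.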
On 17 October 2012 21:25, Josh Berkus <josh@agliodbs.com> wrote:

> Well, but AFAICT, you've already prohibited features through your design
> which are essential to application-level queues, and are implemented by,
> for example, pgQ.
>
> 1. your design only allows the queue to be read on replicas, not on the
> node where the item was inserted.
>
> 2. if you can't UPDATE or DELETE queue items -- or LOCK them -- how on
> earth would a client know which items they have executed and which they
> haven't?
>
> 3. Double-down on #2 in a multithreaded environment.

It's hard to work out how to reply to this because it's just so off base. I don't agree with the restrictions you think you see at all, saying it politely rather than giving a one-word answer.

The problem here is that you phrase these things with too much certainty, seeing only barriers. The "how on earth?" vibe is not appropriate at all. It's perfectly fine to ask for answers to those difficult questions, but don't presume that there are no answers, or that you know with certainty they are even hard ones. By phrasing things in such a closed way the only way forwards is through you, which does not help.

All we're discussing is moving a successful piece of software into core, which has been discussed for years at the international technical meetings we've both been present at. I think an open viewpoint on the feasibility of that would be reasonable, especially when it comes from one of the original designers.
I apologise for making a personal comment, but this does affect the technical discussion. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 17 October 2012 11:26, Hannu Krosing <hannu@2ndquadrant.com> wrote:

> LOGGED ONLY TABLE is a very technical description of the realisation - I'd
> prefer it to work as much like a table as possible, similar to how VIEW
> currently works - for all usages that make sense, you can simply
> substitute it for a TABLE.
>
> QUEUE emphasizes the aspect of a logged only table that it accepts
> "records" in a certain order, persists these and then guarantees
> that they can be read out in exactly the same order - all this being
> guaranteed by existing WAL mechanisms.
>
> It is not meant to be a full implementation of an application level queuing
> system though, but just the capture, persisting and distribution parts.
>
> Using this as an "application level queue" needs a set of interface
> functions to extract the events and also to keep track of the processed
> events. As there is no general consensus on what these should be (like
> whether processing the same event twice is allowed) this part is left for
> specific queue consumer implementations.

The two halves of the queue are the TAIL/entry point and the HEAD/exit point. As you point out these could be on different servers, wherever the logical changes flow to, but could also be on the same server. When the head and tail are on the same server, the MESSAGE QUEUE syntax seems appropriate, but I agree that calling it that when it's just a head or just a tail seems slightly misleading.

I guess the question is whether we provide a full implementation or just the first half. We do, I think, want a full queue implementation in core. We also want to allow other queue implementations to interface with Postgres, so we probably want to allow "first half" only as well. Meaning we want both head and tail separately in core code. The question is whether we require both head and tail in core before we allow commit, to which I would say I think adding the tail first is OK, and adding the head later when we know exactly the design.
Having said that, the LOGGING ONLY syntax makes me shiver. Better name?

I should also add that this is a switchable sync/asynchronous transactional queue, whereas LISTEN/NOTIFY is a synchronous transactional queue.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Simon,

> It's hard to work out how to reply to this because it's just so off
> base. I don't agree with the restrictions you think you see at all,
> saying it politely rather than giving a one-word answer.

You have inside knowledge of Hannu's design. I am merely going from his description *on this list*, because that's all I have to go on.

He requested comments, so here I am, commenting. I'm *hoping* that it's merely the description which is poor and not the conception of the feature. *As Hannu described the feature* it sounds useless and obscure, and miles away from powering any kind of general queueing mechanism. Or anything we discussed at the clustering meetings.

And, again, if you didn't want comments, you shouldn't have posted an RFC.

> All we're discussing is moving a successful piece of software into
> core, which has been discussed for years at the international
> technical meetings we've both been present at. I think an open
> viewpoint on the feasibility of that would be reasonable, especially
> when it comes from one of the original designers.

When I ask you for technical clarification or bring up potential problems with a 2Q feature, you consistently treat it as a personal attack and are emotionally defensive instead of answering my technical questions. This, in turn, frustrates the heck out of me (and others) because we can't get the technical questions answered. I don't want you to justify yourself, I want a clear technical spec.

I'm asking these questions because I'm excited about ReplicationII, and I want it to be the best feature it can possibly be. Or, as we tell many new contributors, "We wouldn't bring up potential problems and ask lots of questions if we weren't interested in the feature."
Now, on to the technical questions:

>> QUEUE emphasizes the aspect of a logged only table that it accepts
>> "records" in a certain order, persists these and then guarantees
>> that they can be read out in exactly the same order - all this being
>> guaranteed by existing WAL mechanisms.
>>
>> It is not meant to be a full implementation of an application level queuing
>> system though, but just the capture, persisting and distribution parts.
>>
>> Using this as an "application level queue" needs a set of interface
>> functions to extract the events and also to keep track of the processed
>> events. As there is no general consensus on what these should be (like
>> whether processing the same event twice is allowed) this part is left for
>> specific queue consumer implementations.

While implementations vary, I think you'll find that the set of operations required for a full-featured application queue is remarkably similar across projects. Personally, I've worked with celery, Redis, AMQ, and RabbitMQ, as well as a custom solution on top of pgQ. The design, as you've described it, makes several of these requirements unreasonably convoluted to implement.

It sounds to me like the needs of internal queueing and application queueing may be hopelessly divergent. That was always possible, and maybe the answer is to forget about application queueing and focus on making this mechanism work for replication and for matviews, the two features we *know* we want it for - which don't need the application queueing features I described, AFAIK.

> The two halves of the queue are the TAIL/entry point and the HEAD/exit
> point. As you point out these could be on different servers,
> wherever the logical changes flow to, but could also be on the same
> server. When the head and tail are on the same server, the MESSAGE
> QUEUE syntax seems appropriate, but I agree that calling it that when
> it's just a head or just a tail seems slightly misleading.
Yeah, that's why I was asking for clarification; the way Hannu described it, it sounded like it *couldn't* be read on the insert node, but only on a replica.

> We do, I think, want a full queue implementation in core. We also want
> to allow other queue implementations to interface with Postgres, so we
> probably want to allow "first half" only as well. Meaning we want both
> head and tail separately in core code. The question is whether we
> require both head and tail in core before we allow commit, to which I
> would say I think adding the tail first is OK, and adding the head
> later when we know exactly the design.

I'm just pointing out that some of the requirements of the design for the replication queue may conflict with a design for a full-featured application queue.

I don't quite follow you on what you mean by "head" vs. "tail". Explain?

> Having said that, the LOGGING ONLY syntax makes me shiver. Better name?

I suck at names. Sorry.

> I should also add that this is a switchable sync/asynchronous
> transactional queue, whereas LISTEN/NOTIFY is a synchronous
> transactional queue.

Thanks for explaining.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On Thu, Oct 18, 2012 at 2:33 PM, Josh Berkus <josh@agliodbs.com> wrote:
>> I should also add that this is a switchable sync/asynchronous
>> transactional queue, whereas LISTEN/NOTIFY is a synchronous
>> transactional queue.
>
> Thanks for explaining.

New here, I missed half the conversation, but since it's been brought up and (to me wrongfully) dismissed, I'd like to propose:

NOTIFY [ALL|ONE] [REMOTE|LOCAL|CLUSTER|DOWNSTREAM] ASYNCHRONOUSLY

and, for good measure:

LISTEN [REMOTE|LOCAL|CLUSTER|UPSTREAM]

That ought to work out fine as SQL constructs go, implementation aside. That's not enough for matviews, but it is IMO a good starting point. All you need after that are triggers for notifying automatically upon insert, and some mechanism to attach triggers to a channel for the receiving side.

Since channels are limited to short strings, maybe a different kind of object (but with similar manipulation syntax) ought to be created. The CREATE QUEUE command, in fact, could be creating such a channel. The channel itself won't be WAL-only, just the messages going through it. This (I think) would solve locking issues.
On 10/18/2012 07:33 PM, Josh Berkus wrote: > Simon, > > >> It's hard to work out how to reply to this because its just so off >> base. I don't agree with the restrictions you think you see at all, >> saying it politely rather than giving a one word answer. > You have inside knowledge of Hannu's design. Actually Simon has currently no more knowledge of this specific design than you do - I posted this on this list as soon as I had figured it out as a possible solution to a specific problem of supporting full pgQ/Londiste functionality in WAL based logical replication with minimal overhead. (Well, actually I let it settle for a few weeks, but I did not discuss this off-list before). Simon may have a better grasp of it thanks to having done work on the BDR/Logical Replication design and thus having better or at least more recent understanding of issues involved in Logical Replication. When mapping londiste/Slony message capture to Logical WAL, the WAL already _is_ the event queue for replication. NOT LOGGED tables make it also usable for non-replication things using the same mechanisms. (The equivalent in a trigger-based system would be a log trigger which captures the insert event and then cancels the insert.) > I am merely going from his > description *on this list*, because that's all I have to go on. > > He requested comments, so here I am, commenting. I'm *hoping* that it's > merely the description which is poor and not the conception of the > feature. *As Hannu described the feature* it sounds useless and > obscure, and miles away from powering any kind of general queueing > mechanism. 
If we describe a queue as something you put stuff in at one end and get it out in the same or some other specific order at the other end, then WAL _is_ a queue when you use it for replication (if you just write to it, then it is "Log"; if you write and read, it is "Queue"). That is, the WAL already is a form of persistent and ordered (that is how WAL works) stream of messages ("WAL records") that are generated on the "master" and replayed on one or more consumers (called "slaves" in case of simple replication). All it takes to make this scenario work is keeping track of the LSN, or simply log position, on the slave side. What you seem to be wanting is support for cooperative consumers, that is multiple consumers on the same queue working together and sharing the work to process the incoming events. This can be easily achieved using a single ordered event stream and extra bookkeeping structures on the consumer side (look at the cooperative consumer samples in skytools). What I suggested was an optimisation for the case where you know that you will never need the data on the master side and are only interested in it on the slave side. By writing rows/events/messages only to log (or stream or queue), you avoid the need to later clean it up on the master by either DELETE or TRUNCATE or rotating tables. For both physical and logical streaming the WAL _is_ the queue of events that were recorded on master and need to be replayed on the slave. Thanks to introducing logical replication, it now makes sense to have actions recorded _only_ in this queue and this is what the whole RFC was about. I recommend that you introduce yourself a bit to skytools/pgQ to get a better feel of the things I am talking about. Londiste is just one application built on a general event logging, transport and transform/replay (that is what I'd call queueing :) ) system, pgQ. pgQ does have its roots in Slony and earlier replication systems, but it is by no means _only_ a replication system. 
The LOG ONLY tables are _not_ needed for pure replication (like Slony) but they make replication + queueing type solutions like skytools/pgQ much more efficient as they do away with the need to maintain the queued data on the master side where it will never be needed (just to repeat this once more). > Or anything we discussed at the clustering meetings. > > And, again, if you didn't want comments, you shouldn't have posted an RFC. I did want comments and as far as I know I do not see you as hostile :) I do understand that what you mean by QUEUE (and especially as a MESSAGE QUEUE) is different from what I described. You seem to want specifically an implementation of cooperative consumers for a generic queue. The answer is yes, it is possible to build this on WAL, or on the table-based event logs/queues of londiste/slony. It just takes a little extra management on the receiving side to do the record locking and distribution between cooperating consumers. >> All we're discussing is moving a successful piece of software into >> core, which has been discussed for years at the international >> technical meetings we've both been present at. I think an open >> viewpoint on the feasibility of that would be reasonable, especially >> when it comes from one of the original designers. > When I ask you for technical clarification or bring up potential > problems with a 2Q feature, you consistently treat it as a personal > attack and are emotionally defensive instead of answering my technical > questions. This, in turn, frustrates the heck out of me (and others) > because we can't get the technical questions answered. I don't want you > to justify yourself, I want a clear technical spec. Currently the "clear tech spec" is just this: * works as a table on INSERTs, up to inserting a logical WAL record describing the insert, but no data is inserted locally. 
with all things that follow from the local table having no data - unique constraints don't make sense - indexes make no sense - updates and deletes hit no data - etc. > > I'm asking these questions because I'm excited about ReplicationII, and > I want it to be the best feature it can possibly be. > > Or, as we tell many new contributors, "We wouldn't bring up potential > problems and ask lots of questions if we weren't interested in the feature." > > Now, on to the technical questions: > >>> QUEUE emphasizes the aspect of a logged only table that it accepts >>> "records" in a certain order, persists these and then guarantees >>> that they can be read out in exactly the same order - all this being >>> guaranteed by existing WAL mechanisms. >>> >>> It is not meant to be a full implementation of an application level queuing >>> system though, but just the capture, persisting and distribution parts. >>> >>> Using this as an "application level queue" needs a set of interface >>> functions to extract the events and also to keep track of the processed >>> events. As there is no general consensus what these should be (like if >>> processing the same event twice is allowed) this part is left for specific >>> queue consumer implementations. > While implementations vary, I think you'll find that the set of > operations required for a full-featured application queue are remarkably > similar across projects. Personally, I've worked with celery, Redis, > AMQ, and RabbitMQ, as well as a custom solution on top of pgQ. The > design, as you've described it, makes several of these requirements > unreasonably convoluted to implement. As Simon explained, the initial RFC was just about not keeping the data in a local table if we know it will never be accessed (at least not for anything except vacuum and delete/truncate). This is something that made no sense for physical replication. > It sounds to me like the needs of internal queueing and application > queueing may be hopelessly divergent. 
> That was always possible, and > maybe the answer is to forget about application queueing and focus on > making this mechanism work for replication and for matviews, the two > features we *know* we want it for. Which don't need the application > queueing features I described AFAIK. > >> The two halves of the queue are the TAIL/entry point and the HEAD/exit >> point. As you point out these could be on the different servers, >> wherever the logical changes flow to, but could also be on the same >> server. When the head and tail are on the same server, the MESSAGE >> QUEUE syntax seems appropriate, but I agree that calling it that when >> its just a head or just a tail seems slightly misleading. > Yeah, that's why I was asking for clarification; the way Hannu described > it, it sounded like it *couldn't* be read on the insert node, but only > on a replica. Well, the reading is done the same way any WAL reading is done - you subscribe to the stream and from that point on get the records in LSN order. It is very hard for me to tell for sure if the walsender->walreceiver combo "reads the events" on the master or slave side. > >> We do, I think, want a full queue implementation in core. We also want >> to allow other queue implementations to interface with Postgres, so we >> probably want to allow "first half" only as well. Meaning we want both >> head and tail separately in core code. The question is whether we >> require both head and tail in core before we allow commit, to which I >> would say I think adding the tail first is OK, and adding the head >> later when we know exactly the design. > I'm just pointing out that some of the requirements of the design for > the replication queue may conflict with a design for a full-featured > application queue. > > I don't quite follow you on what you mean by "head" vs. "tail". Explain? 
HEAD is the queue producer, where the events go in (any insert on master). TAIL (to avoid another word) is where they come out (walreader -> walreceiver moving the events to slave). Think of an analogy with a snake feeding on berries used by an ant colony to get the nutrients in the berries to its nest :) And there is no processing inside the snake - the work of distributing said nutrients once they have arrived at the nest has to be organised by the cooperative colony of ants on that end; the snake just guarantees that the berries arrive in the same order they get in. I guess this organisation of work after the events are delivered is what you were after when asking about "an application level queue". >> Having said that, the LOGGING ONLY syntax makes me shiver. Better name? I guess WRITE ONLY tables would get us more publicity but would not be entirely correct, as the data is readable from the log. Hannu
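The cooperative-consumer pattern Hannu keeps referring to - a single ordered stream plus bookkeeping that lives entirely on the consumer side - can be sketched roughly like this. This is a toy in-memory model with invented names, not skytools' actual API:

```python
# Minimal model of cooperative consumers sharing one ordered event
# stream: the stream itself (standing in for the WAL) is append-only,
# while all locking and progress tracking lives on the consumer side.

class EventStream:
    """Append-only ordered log; list positions play the role of LSNs."""

    def __init__(self):
        self.events = []

    def append(self, payload):
        self.events.append(payload)
        return len(self.events) - 1      # "LSN" of the new event

    def read_from(self, lsn):
        return list(enumerate(self.events))[lsn:]


class ConsumerGroup:
    """Consumer-side bookkeeping: which events are claimed or done.
    The stream itself is never modified by consumers."""

    def __init__(self, stream):
        self.stream = stream
        self.claimed = {}                # lsn -> consumer name ("locked")
        self.done = set()                # processed lsns

    def claim_next(self, consumer):
        for lsn, payload in self.stream.read_from(0):
            if lsn not in self.claimed and lsn not in self.done:
                self.claimed[lsn] = consumer
                return lsn, payload
        return None                      # nothing unclaimed right now

    def ack(self, lsn):
        """Mark an event as processed (the queue-side 'DELETE')."""
        self.claimed.pop(lsn, None)
        self.done.add(lsn)
```

Two consumers calling claim_next interleave without ever touching the stream, which is the property the RFC relies on: the producer side only ever appends.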
On 10/18/2012 08:36 PM, Claudio Freire wrote: > The CREATE QUEUE command, in fact, could be creating > such a channel. The channel itself won't be WAL-only, just > the messages going through it. This (I think) would solve locking issues. Hmm. Maybe we should think of implementing this as REMOTE TABLE, that is a table which gets no real data stored locally but all inserts go through WAL and are replayed as real inserts on the slave side. Then if you want matviews or partitioned tables, you just attach triggers to the table on the slave side to do them. This would be tangential to their use as pure queues, which would happen at the level of plugins to logical replication. -------------- Hannu
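Whatever the eventual name - QUEUE, REMOTE TABLE, or LOG ONLY - the proposed semantics boil down to "INSERT emits a WAL record but the local heap stays empty". A toy model of just that behaviour (classes invented purely for illustration):

```python
# Toy model of a "log only" table: INSERT emits a record into the
# log stream, nothing is ever stored locally, so every read is empty.

class Wal:
    """Stand-in for the WAL: an ordered list of emitted records."""

    def __init__(self):
        self.records = []

    def emit(self, record):
        self.records.append(record)


class LogOnlyTable:
    def __init__(self, name, wal):
        self.name = name
        self.wal = wal

    def insert(self, row):
        # The row exists only as a logical WAL record, to be replayed
        # (as a real insert, a trigger call, etc.) on the consumer side.
        self.wal.emit(("INSERT", self.name, row))

    def select_all(self):
        return []                        # no local data, by definition
```

This also makes concrete why unique constraints, indexes, and UPDATE/DELETE are moot on the master side: there is simply nothing local for them to act on.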
From: Christopher Browne
On Thu, Oct 18, 2012 at 2:56 PM, Hannu Krosing <hannu@2ndquadrant.com> wrote: > * works as table on INSERTS up to inserting logical WAL record describing > the > insert but no data is inserted locally. > > with all things that follow from the local table having no data > - unique constraints don't make sense > - indexes make no sense > - updates and deletes hit no data > - etc. . . Yep, I think I was understanding those aspects. I think I disagree that "indexes make no sense." I think that it would be meaningful to have an index type for this, one that is a pointer at WAL records, to enable efficiently jumping to the right WAL log to start accessing a data stream, given an XID. That's a fundamentally different sort of index than we have today (much the way that hash indexes, GiST indexes, and BTrees differ from one another). I'm having a hard time thinking about what happens if you have cascaded replication, and want to carry records downstream. In that case, the XIDs from the original system aren't miscible with the XIDs in a message queue on a downstream database, and I'm not sure what we'd want to do. Keep the original XIDs in a side attribute, maybe? It seems weird, at any rate. Or perhaps data from foreign sources has got to go into a separate queue/'sorta-table', and thereby have two XIDs, the "source system XID" and the "when we loaded it in locally XID." -- When confronted by a difficult problem, solve it by reducing it to the question, "How would the Lone Ranger handle this?"
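The "index into WAL" Browne sketches could, in the simplest reading, be a sorted map from XID to the log position where that transaction's records begin. A rough illustration - the structure is invented here and looks nothing like a real access method:

```python
import bisect

class XidLsnIndex:
    """Toy 'index into the WAL': sorted (xid, lsn) pairs, answering
    'where does transaction X start in the stream?'. Assumes xids are
    recorded in increasing order, so binary search applies."""

    def __init__(self):
        self._xids = []
        self._lsns = []

    def record(self, xid, start_lsn):
        self._xids.append(xid)
        self._lsns.append(start_lsn)

    def start_lsn(self, xid):
        i = bisect.bisect_left(self._xids, xid)
        if i == len(self._xids) or self._xids[i] != xid:
            raise KeyError(xid)          # no records for that xid
        return self._lsns[i]
```

The cascading problem Browne raises shows up immediately in this model: a downstream node would need a second such index keyed by its own local XIDs, or a side attribute carrying the origin's XID.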
On Thu, Oct 18, 2012 at 10:03 PM, Hannu Krosing <hannu@2ndquadrant.com> wrote: > Hmm. Maybe we should think of implementing this as REMOTE TABLE, that > is a table which gets no real data stored locally but all insert got through > WAL > and are replayed as real inserts on slave side. FWIW, MySQL calls this exact concept the "black hole" storage engine. Regards, Ants Aasma -- Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de
On 10/19/2012 04:26 AM, Ants Aasma wrote: > On Thu, Oct 18, 2012 at 10:03 PM, Hannu Krosing <hannu@2ndquadrant.com> wrote: >> Hmm. Maybe we should think of implementing this as REMOTE TABLE, that >> is a table which gets no real data stored locally but all insert got through >> WAL >> and are replayed as real inserts on slave side. > FWIW, MySQL calls this exact concept the "black hole" storage engine. In this case calling this WRITE ONLY TABLE does not seem so strange anymore :) Or even PERSISTENT WRITE ONLY TABLE to make the paradox more explicit. > > Regards, > Ants Aasma
On 10/18/2012 09:18 PM, Christopher Browne wrote: > On Thu, Oct 18, 2012 at 2:56 PM, Hannu Krosing <hannu@2ndquadrant.com> wrote: >> * works as table on INSERTS up to inserting logical WAL record describing >> the >> insert but no data is inserted locally. >> >> with all things that follow from the local table having no data >> - unique constraints don't make sense >> - indexes make no sense >> - updates and deletes hit no data >> - etc. . . > Yep, I think I was understanding those aspects. > > I think I disagree that "indexes make no sense." > > I think that it would be meaningful to have an index type for this, > one that is a pointer at WAL records, to enable efficiently jumping to > the right WAL log to start accessing a data stream, given an XID. > That's a fundamentally different sort of index than we have today > (much the way that hash indexes, GiST indexes, and BTrees differ from > one another). > > I'm having a hard time thinking about what happens if you have > cascaded replication, and want to carry records downstream. I'd try to keep it as similar as possible to how the "real" tables behave in this multi-master (or "bidirectional" as the original logical wal case was named) scenario. I assume that the current thinking is that the replicated changes will carry original (node id, transaction id) info which is used to determine when to stop replicating in case there is more than one node in the replication ring. In case any changes to the resulting table are performed due to conflict resolution, this "original (node id, transaction id)" gets replaced (or added?) by the info from the node that did the latest changes so that the original origin node gets a chance to examine the changes too. This has to be pondered carefully so that the conflict resolution chain will end at some point. (I guess that the whole logrep design is something that should be discussed in Prague. 
Simon and Andres are doing a presentation on it there and in case this ignites more discussion it may be something warranting a separate discussion session among all interested parties) Hannu > In that > case, the XIDs from the original system aren't miscible with the XIDs > in a message queue on a downstream database, and I'm not sure what > we'd want to do. Keep the original XIDs in a side attribute, maybe? > It seems weird, at any rate. Or perhaps data from foreign sources has > got to go into a separate queue/'sorta-table', and thereby have two > XIDs, the "source system XID" and the "when we loaded it in locally > XID."
On 18 October 2012 18:33, Josh Berkus <josh@agliodbs.com> wrote: >> All we're discussing is moving a successful piece of software into >> core, which has been discussed for years at the international >> technical meetings we've both been present at. I think an open >> viewpoint on the feasibility of that would be reasonable, especially >> when it comes from one of the original designers. > > When I ask you for technical clarification or bring up potential > problems with a 2Q feature, you consistently treat it as a personal > attack and are emotionally defensive instead of answering my technical > questions. This, in turn, frustrates the heck out of me (and others) > because we can't get the technical questions answered. I don't want you > to justify yourself, I want a clear technical spec. Well, this isn't "a 2Q feature"; perhaps that is part of the problem, but I couldn't say. I didn't know this was coming at all, nor is that a problem for me. Since we've talked about that general feature enough at meetings we've all been present at (and indeed, you chaired), I recognised it as that and treated it positively in that light. (I think even that Hannu may not have been present, just Marko). You made claims that were completely unfounded and yet also strangely negative. I picked you up on it because you'll kill discussion of the feature if I don't speak out, not because the speaker works with me. I'm not otherwise involved in the feature. So your assumption of off-list collusion is wrong, as is your claim of any emotional aspect to this from me. I don't think you can turn this back onto me. If a design is not clear, ask for clarification. Don't tell the world in general that the design is bad or flawed until you actually know it is. I hear "there is a problem with that patch" discussed too often. Unfounded negativity is as certain a killer as any real technical flaw, so we must be careful to avoid it. 
That comment goes to everybody, for any patch, but in this case to you because this is the second thread this week I've seen it. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
----- Quoting Hannu Krosing (hannu@krosing.net), 19.10.2012 at 14:17 ----- > On 10/19/2012 04:26 AM, Ants Aasma wrote: >> On Thu, Oct 18, 2012 at 10:03 PM, Hannu Krosing <hannu@2ndquadrant.com> wrote: >>> Hmm. Maybe we should think of implementing this as REMOTE TABLE, that >>> is a table which gets no real data stored locally but all insert got through >>> WAL >>> and are replayed as real inserts on slave side. >> FWIW, MySQL calls this exact concept the "black hole" storage engine. > In this case calling this WRITE ONLY TABLE does not seem so strange > anymore :) > Or even PERSISTENT WRITE ONLY TABLE to make the paradox more explicit. Oracle calls this "Streams", and they build application queues ("Advanced Queuing") and a replication solution ("Advanced Replication") on them. Why not call the feature "STREAM TABLE"? Best regards -- Luben Karavelov
> If we describe a queue as something you put stuff in at one end and > get it out in same or some other specific order at the other end, then > WAL _is_ a queue when you use it for replication (if you just write to it, > then it is "Log", if you write and read, it is "Queue") For that matter, WAL is a queue you use for recovery. But, for that matter, BerkeleyDB is a database just as PostgreSQL is a database. That doesn't mean you can use BerkeleyDB and PostgreSQL for all the same tasks. > All it takes to make this scenario work is keeping track of LSN or simply > log position on the slave side. > > What you seem to be wanting is support for a cooperative consumers, > that is multiple consumers on the same queue working together and > sharing the work to process the incoming event . > > This can be easily achieved using a single ordered event stream and > extra bookkeeping structures on the consumer side (look at cooperative > consumer samples in skytools). What I'm saying is, we'll get nowhere promoting an application queue which is permanently inferior to existing, popular open source software. My advice: forget about the application queue aspects of this. Focus on making it work for replication and matviews, which are already hard use cases to optimize. If someone can turn this feature into the base for a distributed queueing system later, then great. But let's not complicate this feature by worrying about a use case it may never fulfill. > Thanks to introducing logical replication, it now makes sense to have > actions recorded _only_ in this queue and this is what the whole RFC was > about. Yes, I agree. I'm just pointing out that the needs of a replication queue and of an application queue are divergent. > Currently the "clear tech spec" is just this: > > * works as table on INSERTS up to inserting logical WAL record > describing the > insert but no data is inserted locally. Yeah, I think where you confused a bunch of people here is the definition of "locally". 
Let me see if I understand this: * a Writer would INSERT data into the LOG ONLY TABLE (L.O.T.), which write would be synched to WAL but there would be no in-memory or on-disk version of the table updated. * Readers could subscribe to the LSN for the L.O.T. and would receive a stream of INSERTs, which they could handle as they wished. Is my understanding correct? If it is, I have more questions! > with all things that follow from the local table having no data > - unique constraints don't make sense > - indexes make no sense > - updates and deletes hit no data > - etc. . . Right. > As Simon explained, the initial RFC was just about not keeping the > data in local table if we know it will never be accessed Ah, so to answer Simon's question: no, this RFC makes no sense without a description of expected Reader activity. > (at least not > for anything except vacuum and delete/truncate) If the table is not being represented as a table in the catalog or on disk, why would it ever need to be vacuumed? > It is very hard for me to tell for sure if walsender->walreceiver combo > "reads the events" on master or slave side Well, presumably the only way a Reader on the master could get the queue would be for the master to subscribe to its own LSN. No? > HEAD is the queue producer, where the events go in (any insert on master) > > TAIL (to avoid another word) is where they come out > (walreader -> walreceiver moving the events to slave) BTW, I suggest using "Writer" and "Reader" for the queue roles, not "Head" and "Tail", which terms are rather unclear. > Think of an analogy with a snake feeding on berries used by > an ant colony to get the nutrients in the berries to its nest :) That's a very ... unique analogy. ;-) >>> Having said that, the LOGGING ONLY syntax makes me shiver. Better name? >> > I guess WRITE ONLY tables would get us more publicity but would not be > entirely correct, as the data is readable from the log . 
I like LOG ONLY TABLES, actually; it's the mirror of UNLOGGED TABLEs. Or REPLICATION MESSAGE TABLE. Now, since I've pointed out what use case this mechanism does not apply to (replacing a generic application queue), let me point out some ones which it *does* apply to, and handily: * Updating matviews on a replica * Updating a cache (assuming an autonomous LSN reader) * Remote security logging (especially if combined with command triggers) -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
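The Reader role as Josh summarizes it - subscribe at an LSN, consume the INSERT stream, remember only how far you got - is essentially a resumable cursor over the log. A minimal sketch of his "updating a cache" use case, with hypothetical names:

```python
class CacheUpdater:
    """A Reader that applies an ordered stream of insert events to a
    local cache; the only durable state it needs is its restart LSN."""

    def __init__(self):
        self.cache = {}
        self.last_lsn = -1               # restart position

    def apply(self, events):
        """events: iterable of (lsn, key, value) tuples in LSN order."""
        for lsn, key, value in events:
            if lsn <= self.last_lsn:
                continue                 # already applied; replay is safe
            self.cache[key] = value
            self.last_lsn = lsn
```

The LSN check is what makes a crash-and-replay cycle harmless: re-delivering an already-seen prefix of the stream changes nothing, which is exactly the property LISTEN/NOTIFY lacks.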
On 10/19/12 1:26 PM, Josh Berkus wrote: > What I'm saying is, we'll get nowhere promoting an application queue > which is permanently inferior to existing, popular open source software. > My advice: Forget about the application queue aspects of this. Focus > on making it work for replication and matviews, which are already hard > use cases to optimize. > > If someone can turn this feature into the base for a distributed > queueing system later, then great. But let's not complicate this > feature by worrying about a use case it may never fulfill. And as someone else mentioned... we should call this a stream and not a queue, since this would be lacking in many queue features. It certainly sounds like a useful framework to have. -- Jim C. Nasby, Database Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net
On Wed, Oct 17, 2012 at 7:48 PM, Christopher Browne <cbbrowne@gmail.com> wrote: > Well, replication is arguably a relevant case. > > For Slony, the origin/master node never cares about logged changes - that > data is only processed on replicas. Now, that's certainly a little weaselly > - the log data (sl_log_*) has got to get read to get to the replica. Well, this is a clever way for Slony to use existing infrastructure to get data into the WAL. But wouldn't it be more logical for an in-core system to just annotate the existing records with enough information to replay them logically? Instead of synthesizing inserts into an imaginary table containing data that can be extracted to retrieve info about some other record, just add the info needed to the relevant record. The minimum needed for DML afaict is that DELETE and UPDATE records need the primary key of the record being deleted and updated. It might make sense to include the whole tupledesc or at least key parts of it like the attlen and atttyp array so that replay can be more robust. But the logical place for this data, it seems to me, is *in* the update or insert record that already exists. Otherwise managing logical standbys will require a whole duplicate set of infrastructure to keep track of what has and hasn't been replayed. For instance what if an update record is covered by a checkpoint but the logical record falls after the checkpoint and the system crashes before writing it out? -- greg
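Greg's minimum for logical DML - UPDATE and DELETE records carrying the old row's primary key - can be made concrete by contrasting a physical record, which identifies the tuple by its on-disk position, with a logical one that carries the key. The record layouts below are invented solely to make the point, not actual WAL formats:

```python
# A physical WAL record identifies the tuple by its position on disk
# (block/offset, i.e. a ctid), which means nothing to a logical
# consumer on another node; the logical form must instead carry
# enough data - the primary key - to find the row by value.

def physical_update(block, offset, new_tuple):
    return {"kind": "UPDATE", "ctid": (block, offset), "new": new_tuple}

def logical_update(table, pkey, new_tuple):
    return {"kind": "UPDATE", "table": table, "pkey": pkey, "new": new_tuple}
```

A standby replaying the logical form can issue the equivalent of "UPDATE table SET ... WHERE pkey = ..." with no knowledge of the origin's physical layout, which is the robustness Greg is arguing for.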
On 10/23/2012 01:31 AM, Greg Stark wrote: > On Wed, Oct 17, 2012 at 7:48 PM, Christopher Browne <cbbrowne@gmail.com> wrote: >> Well, replication is arguably a relevant case. >> >> For Slony, the origin/master node never cares about logged changes - that >> data is only processed on replicas. Now, that's certainly a little weaselly >> - the log data (sl_log_*) has got to get read to get to the replica. > Well this is a clever way for Slony to use existing infrastructure to > get data into the WAL. But wouldn't it be more logical for an in-core > system to just annotate the existing records with enough information > to replay them logically? The QUEUE / LOG ONLY TABLES / WRITE ONLY TABLES :) proposal was _not_ for use in standard replication - it is already covered by what is being done - but for cases where the data is needed _only_ on the slave/replay side. One typical case is sending e-mail on some database actions, like sending a greeting or confirmation mail when creating a new user. On a busy system you often want to offload the things that can be done asynchronously to other hosts. My RFC was for a proposal to skip writing the unneeded info in local tables and put it _only_ in WAL. > Instead of synthesizing inserts into an > imaginary table containing data that can be extracted to retrieve info > about some other record, just add the info needed to the relevant > record. This is more or less how the current system is being designed, only the "add enough relevant info" part is offloaded to logical version of WALSender > The minimum needed for DML afaict is DELETE and UPDATE records need > the primary key of the record being deleted and updated. It might make > sense to include the whole tupledesc or at least key parts of it like > the attlen and atttyp array so that replay can be more robust. But the > logical place for this data, it seems to me, is *in* the update or > insert record that already exists. 
Otherwise managing logical > standbys will require a whole duplicate set of infrastructure to keep > track of what has and hasn't been replayed. For instance what if an > update record is covered by a checkpoint but the logical record falls > after the checkpoint and the system crashes before writing it out? > This complexity (which is really a lot more than you briefly described here) is the reason the construction of the "update records" from WAL records was moved back to the master side. In the original design it was hoped that it could be done all on the slave by keeping an own time-synced copy of the system catalog. Currently it seems to play out reasonably well, but I'd not completely rule out some new complexities arising which would force the creation of (more of the) full logical DML records as part of WAL. The downside would be performance, which for the current case is mostly unaffected on the write side, but would be affected a lot more if the WAL volume had to increase significantly to accommodate all needed info for LogRep --------------- Hannu
[ hadn't been following this thread, sorry ] Hannu Krosing <hannu@2ndQuadrant.com> writes: > My RFC was for a proposal to skip writing the unneeded info in local > tables and put it _only_ in WAL. This concept seems fundamentally broken. What will happen if the master crashes immediately after emitting the WAL record? It will replay it locally, that's what, and thus you have uncertainty about whether the master will contain the data or not. regards, tom lane
On Wed, Oct 17, 2012 at 4:25 PM, Josh Berkus <josh@agliodbs.com> wrote: >> It is not meant to be a full implementation of application level queuing >> system though but just the capture, persisting and distribution parts >> >> Using this as an "application level queue" needs a set of interface >> functions to extract the events and also to keep track of the processed >> events. As there is no general consensus what these should be (like if >> processing same event twice is allowed) this part is left for specific >> queue consumer implementations. > > Well, but AFAICT, you've already prohibited features through your design > which are essential to application-level queues, and are implemented by, > for example, pgQ. > > 1. your design only allows the queue to be read on replicas, not on the > node where the item was inserted. > > 2. if you can't UPDATE or DELETE queue items -- or LOCK them -- how on > earth would a client know which items they have executed and which they > haven't? > > 3. Double-down on #2 in a multithreaded environment. > > For an application-level queue, the base functionality is: > > ADD ITEM > READ NEXT (#) ITEM(S) > LOCK ITEM > DELETE ITEM > > More sophisticated and useful queues also allow: > > READ NEXT UNLOCKED ITEM > LOCK NEXT UNLOCKED ITEM > UPDATE ITEM > READ NEXT (#) UNSEEN ITEM(S) > > The design you describe seems to prohibit pretty much all of the above > operations after READ NEXT. This makes it completely useless as a > application-level queue. > > And, for that matter, if your new queue only accepts INSERTs, why not > just improve LISTEN/NOTIFY so that it's readable on replicas? What does > this design buy you that that doesn't? I've read the whole thread, but I still don't see that anyone's said it better than this, and I agree with these comments. (I don't find them ad hominem, either.) It's also worth noting that in order to be useful, this feature intrinsically requires the logical replication stuff to be committed first. 
It's not entirely clear that there's enough time to get logical replication committed for 9.3, and the chances of getting any follow-on features committed as well seem remote. Besides the shortness of the time, I think all experience has shown that it's best not to rush into the design of follow-on features before we've got the basic feature well nailed down. This certainly can't be said of logical replication at this point. Andres seems to be making good progress and I'm grateful for his work on it, but I think there's a lot left to do before that one is in the bag (as I think Andres would agree).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 10/23/2012 04:13 PM, Tom Lane wrote:
> [ hadn't been following this thread, sorry ]
>
> Hannu Krosing <hannu@2ndQuadrant.com> writes:
>> My RFC was for a proposal to skip writing the unneeded info in local
>> tables and put it _only_ in WAL.
> This concept seems fundamentally broken. What will happen if the master
> crashes immediately after emitting the WAL record? It will replay it
> locally, that's what, and thus you have uncertainty about whether the
> master will contain the data or not.

I agree that emitting a record indistinguishable from the current insert record would probably be a bad idea, as it would require WAL replay to examine the table description to find that the corresponding table does not accept local data.

It surely would be better to use a special record type, so that crash recovery on the master knows not to replay it.

The syntax and mechanics of what would essentially be a simple QUEUEing feature being declared and defined in a similar way to a table were chosen for two reasons:

* familiarity - easy to adopt
* most structure can be shared with tables & views - easy to implement

--------------------
Hannu

> regards, tom lane
On 10/23/2012 06:47 PM, Robert Haas wrote:
> On Wed, Oct 17, 2012 at 4:25 PM, Josh Berkus <josh@agliodbs.com> wrote:
...
>> 3. Double-down on #2 in a multithreaded environment.
>>
>> For an application-level queue, the base functionality is:
>>
>> ADD ITEM
>> READ NEXT (#) ITEM(S)
>> LOCK ITEM
>> DELETE ITEM
>>
>> More sophisticated and useful queues also allow:
>>
>> READ NEXT UNLOCKED ITEM
>> LOCK NEXT UNLOCKED ITEM
>> UPDATE ITEM
>> READ NEXT (#) UNSEEN ITEM(S)
>>
>> The design you describe seems to prohibit pretty much all of the above
>> operations after READ NEXT. This makes it completely useless as an
>> application-level queue.

By the above logic MVCC "prohibits" UPDATEs and DELETEs on table data ;)

WAL-only tables/queues "prohibit" none of what you claim above; you just implement it in a (loosely) MVCC way, by keeping track of which events have been processed.

>> And, for that matter, if your new queue only accepts INSERTs, why not
>> just improve LISTEN/NOTIFY so that it's readable on replicas? What does
>> this design buy you that that doesn't?

I get the ability to easily keep track of which events have already been acted on and which have not. And you really can't fall back on processing LISTEN/NOTIFY - they come when they come.

For a WAL-based event stream you only need to track the LSN, and in the case of multiple cooperative consumers (which I think Josh meant by "multithreaded" above) a small structure to keep track of locking and event consumption, while the WAL part takes care of consistency, order and durability.

> I've read the whole thread, but I still don't see that anyone's said
> it better than this, and I agree with these comments. (I don't find
> them ad hominem, either.)
>
> It's also worth noting that in order to be useful, this feature
> intrinsically requires the logical replication stuff to be committed
> first.

I agree that this feature - at least if implemented as proposed - does need some underlying features from the logical replication work.
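The "small structure" for cooperative consumers could be as simple as one row per consumer. A sketch (table and column names invented; the LSN is stored as text here, since the sketch predates a dedicated LSN column type):

```sql
-- Each consumer persists only its position in the WAL event stream;
-- the stream itself is never updated or deleted from.
CREATE TABLE queue_consumer (
    consumer_name text PRIMARY KEY,
    confirmed_lsn text NOT NULL DEFAULT '0/0'  -- last WAL position known processed
);

-- Registering a consumer
INSERT INTO queue_consumer VALUES ('worker-1', '0/0');

-- "Acknowledging" a batch is just advancing the cursor: events at or
-- before confirmed_lsn are considered processed, events after it are not.
UPDATE queue_consumer
   SET confirmed_lsn = '0/16B2D80'   -- position returned by the WAL reader
 WHERE consumer_name = 'worker-1';
```

This is the (loosely) MVCC approach described above: instead of locking or deleting queue items, each consumer moves a single durable cursor forward.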
On the other hand, it does not really _need_ full logical replication integrated - just a special WAL record type and an easy way to write your own WAL reader (something like pg_basebackup could work well as a sample).

Without WAL-based logical replication I can already do the same thing in a bit more expensive way, by having a BEFORE trigger which logs the insert into a Slony/Londiste-style event table and then cancels it on the main table.

> It's not entirely clear that there's enough time to get
> logical replication committed for 9.3, and the chances of getting any
> follow-on features committed as well seem remote. Besides
> the shortness of the time, I think all experience has shown that it's
> best not to rush into the design of follow-on features before we've
> got the basic feature well nailed down. This certainly can't be said
> of logical replication at this point. Andres seems to be making good
> progress and I'm grateful for his work on it, but I think there's a
> lot left to do before that one is in the bag (as I think Andres would
> agree).
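The trigger-based fallback mentioned above can be sketched in PL/pgSQL roughly as follows (a sketch only; the table, event-table and function names are invented):

```sql
-- Londiste-style capture: log the row into an event table, then
-- suppress the insert into the main table by returning NULL from
-- a BEFORE trigger.
CREATE TABLE my_queue (payload text);

CREATE TABLE my_queue_events (
    ev_id   bigserial PRIMARY KEY,
    ev_time timestamptz NOT NULL DEFAULT now(),
    payload text
);

CREATE FUNCTION my_queue_capture() RETURNS trigger AS $$
BEGIN
    INSERT INTO my_queue_events (payload) VALUES (NEW.payload);
    RETURN NULL;   -- cancels the insert, so my_queue itself stays empty
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER capture_before_insert
    BEFORE INSERT ON my_queue
    FOR EACH ROW EXECUTE PROCEDURE my_queue_capture();
```

Each INSERT INTO my_queue then costs a heap write into my_queue_events plus its index maintenance - the "bit more expensive" part relative to emitting only a WAL record.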
> WAL-only tables/queues "prohibit" none of what you claim above; you just
> implement it in a (loosely) MVCC way, by keeping track of which events
> have been processed.

Well, per our discussion here in person, I'm not convinced that this buys us anything in the "let's replace AMQ" case. However, as I pointed out in my last email, this feature doesn't need to replace AMQ to be useful. Let's focus on the original use case of supplying a queue which Londiste and Slony can use, which is sufficient motivation to push the feature if the Slony and Londiste folks think it's good enough (and it seems that they do).

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
> Well, per our discussion here in person, I'm not convinced that this
> buys us anything in the "let's replace AMQ" case. However, as I pointed
> out in my last email, this feature doesn't need to replace AMQ to be
> useful. Let's focus on the original use case of supplying a queue which
> Londiste and Slony can use, which is a sufficient motivation to push the
> feature if the Slony and Londiste folks think it's good enough (and it
> seems that they do).

BTW, I talked to Marko Kreen about this feature at the boat party, and he thought it would work for pgQ.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com