Re: PG-MQ? - Mailing list pgsql-hackers
From | Jeroen T. Vermeulen |
---|---|
Subject | Re: PG-MQ? |
Date | |
Msg-id | 7108.125.24.217.75.1182325557.squirrel@webmail.xs4all.nl |
In response to | PG-MQ? (Chris Browne <cbbrowne@acm.org>) |
Responses |
Re: PG-MQ?
|
List | pgsql-hackers |
On Wed, June 20, 2007 04:45, Chris Browne wrote:
> I'm seeing some applications where it appears that there would be
> value in introducing asynchronous messaging, ala "message queueing."
> <http://en.wikipedia.org/wiki/Message_queue>
>
> The "granddaddy" of message queuing systems is IBM's MQ-Series, and I
> don't see particular value in replicating its functionality.

I'm quite interested in this. Maybe I'm thinking of something too complex, but I do think there are some "oh it'll need to do that too" pitfalls that are best considered up front.

The big thing about MQ is that it participates as a resource manager in two-phase commits (and optionally a transaction manager as well). That means that you get atomic processing steps: application takes message off a queue, processes it, commits its changes to the database, replies to message. The queue manager then does a second-phase commit for all of those steps, and that's when the reply really goes out. If the application fails, none of this will have happened so you get ACID over the complete cycle. That's something we should have free software for.

Perhaps the time is right for something new. A lot of the complexity inside MQ comes from data representation issues like encodings and fixed-length strings, as I recall, and things have changed since MQ was designed.

I agree it could be useful (and probably not hard either) to have a transactional messaging system inside the database. It saves you from having to do two-phase commits. But it does tie everything to postgres to some extent, and you lose the interesting features (atomicity and assured, single delivery) as soon as anything in the chain does anything persistent that does not participate in the postgres transaction.

Perhaps what we really need is more mature components, with a unified control layer on top. That's how a lot of successful free software grows. See below.

> On the other side, the "big names" these days are:
>
> a) The Java Messaging Service, which seems to implement *way* more
>    options than I'm even vaguely interested in having (notably, lots
>    that involve data stores or lack thereof that I do not care to use);

Far as I know, JMS is an API, not a product. You'd still slot some messaging middleware underneath, such as MQ. That is why MQSeries was renamed: it fits into the WebSphere suite as the implementing engine underneath the JMS API. From what I understand MQ is one of the "best-of-breed" products that JMS was designed around. (Sun's term, bit hypey for my taste).

In one way, Java is easy: the last thing you want to get into is yet another marshaling standard. There are plenty of "standards" to choose from already, each married to one particular communications mechanism: RPC, EDI, CORBA, D-Bus, XMLRPC, what have you. Even postgres has its own. I'd say the most successful mechanism is TCP itself, because it isolates itself from content representation so effectively.

It's hard not to get into marshaling: someone has to do it, and it's often a drag to do it in the application, but the way things stand now *any* choice limits the usefulness of what you're building. That's something I'd like to see change.

Personally I'd love to see marshaling or low-level data representation isolated into a mature component that speaks multiple programming languages on the one hand and multiple data representation formats on the other. Something the implementers of some of these messaging standards would want to use to compose their messages, isolating their format definitions into plugins. Something that would make application writers stop composing messages in finicky ad-hoc code that fails with unexpected locales or has trouble with different line breaks.

If we had a component like that, combining it with existing transactional variants of TCP and [S]HTTP might even be enough to build a usable messaging system.
I haven't looked at them enough to know. Of course we'd need implementations of those protocols; see http://ttcplinux.sourceforge.net/ and http://www.csn.ul.ie/~heathclf/fyp/ for example.

Another box of important tools, and I have no idea where we stand with this one, is transaction management. We have 2-phase commit in postgres now. But do we have interoperability with existing transaction managers? Is there a decent free, portable, everything-agnostic transaction manager? With those, the sphere of reliability of a database-driven messaging package could extend much further.

A free XA-capable filesystem would be great too, but I guess I'm daydreaming.

> There tend to be varying semantics out there:
>
> - Some queues may represent "subscriptions" where a whole bunch of
>   listeners want to get all the messages;

The two simplest models that offer something more than TCP/UDP are 1:n reliable publish-subscribe without persistence, and 1:1 request-reply with persistent storage. D-Bus does them both; IIRC MQ does 1:1 and has add-ons on top for publish-subscribe.

I could imagine variations such as persistent publish-subscribe, where you can come back once in a while and see if your subscriptions caught anything since your last visit. But such things probably get more complex and less useful as you add more ideas.

On top of that goes communication model: symmetric or asymmetric, synchronous or asynchronous. Do you end up with a "remote procedure call" model like RPC, D-Bus, CORBA? Or do you stick with a pure message/event view of communication? Personally I think it's good not to intrude into the application's event loop too much, but others seem to feel the central event loop should not be part of application code.
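
The persistent publish-subscribe variation mentioned above (come back once in a while and collect what your subscription caught) can be sketched as a message log plus one read position per subscriber. This is only an illustration under assumptions: the Topic class and its method names are made up, and an in-memory list stands in for whatever durable storage a real system would use.

```python
class Topic:
    """Hypothetical topic that keeps messages until subscribers fetch them."""

    def __init__(self):
        self.log = []      # every published message, in order
        self.cursors = {}  # subscriber name -> index of next unread message

    def subscribe(self, name):
        # A new subscriber only sees messages published from now on.
        self.cursors.setdefault(name, len(self.log))

    def publish(self, message):
        self.log.append(message)

    def fetch(self, name):
        """Return everything published since this subscriber's last visit."""
        start = self.cursors[name]
        self.cursors[name] = len(self.log)
        return self.log[start:]
```

A subscriber that stays away for a while simply gets a longer list on its next fetch. The sketch never trims the log, which is one of the places where such a design starts to get more complex.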

> - Sometimes you have the semantics where:
>    - messages need to be delivered at least once
>    - messages need to be delivered no more than once
>    - messages need to be delivered exactly once

IMHO, if you're not doing "exactly once," or something very close to it, you might as well stay with ad-hoc code.

You can ensure single delivery by having the sender re-send when in doubt, and keeping track of duplications in the recipient.

> Is there any existing work out there on this? Or should I maybe be
> looking at prototyping something?

I've looked around a bit (not much) and not found anything very generally useful. I think it's an exciting area that probably needs work, so prototyping might be a good idea. If nothing else, I hope I've given you some examples of what you don't want to get yourself into. :-)

Jeroen
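
The single-delivery approach described above (the sender re-sends when in doubt, the recipient keeps track of duplicates) can be sketched as follows. All names here are hypothetical, and the LossyLink class is an invented stand-in for a network that loses acknowledgements; it is a toy model, not a protocol implementation.

```python
import uuid

class Receiver:
    def __init__(self):
        self.seen = set()    # ids of messages already processed
        self.processed = []  # payloads handled, in order

    def deliver(self, msg_id, payload):
        """Process each message id at most once; always acknowledge."""
        if msg_id not in self.seen:
            self.seen.add(msg_id)
            self.processed.append(payload)
        return True  # acknowledge duplicates too, so the sender can stop

class LossyLink:
    """Hypothetical channel that loses the first `drop_acks` acknowledgements."""

    def __init__(self, receiver, drop_acks):
        self.receiver = receiver
        self.drop_acks = drop_acks

    def send(self, msg_id, payload):
        ack = self.receiver.deliver(msg_id, payload)
        if self.drop_acks > 0:
            self.drop_acks -= 1
            return False  # ack lost: the sender cannot tell what happened
        return ack

def send_reliably(link, payload, max_attempts=5):
    """Re-send under the same id until acknowledged; return attempts used."""
    msg_id = str(uuid.uuid4())  # one id, reused on every retry
    for attempt in range(1, max_attempts + 1):
        if link.send(msg_id, payload):
            return attempt
    raise RuntimeError("gave up after %d attempts" % max_attempts)
```

For this to survive a crash, the recipient's set of seen ids would have to be stored durably, ideally in the same transaction as the processing itself. The sketch keeps it in memory only.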