shared memory message queues - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | shared memory message queues |
Date | |
Msg-id | CA+TgmobUe28JR3zRUDH7s0jkCcdxsw6dP4sLw57x9NnMf01wgg@mail.gmail.com |
List | pgsql-hackers |
Right now, it's pretty hard to write code that does anything useful with dynamic shared memory. Sure, you can allocate a dynamic shared memory segment, and that's nice, but you won't get any help at all figuring out what to store in it, or how to use it to communicate effectively, which is not so nice. And some of the services we offer around the main shared memory segment are conspicuously missing for dynamic shared memory. The attached patches attempt to rectify some of these problems. If you're not the patient type who wants to read the whole email, patch #3 is the cool part.

Patch #1, on-dsm-detach-v1.patch, adds the concept of on_dsm_detach hooks. These are basically like on_shmem_exit hooks, except that detaching from a dsm can happen at any time, not just at backend exit. But they're needed for the same reasons: when we detach from the main shared memory segment, we need to make sure that we've released all relevant locks, returned our PGPROC to the pool, etc. Dynamic shared memory segments require the same sorts of cleanup when they contain similarly complex data structures. The part of this patch which I suppose will elicit some controversy is that I've had to rearrange on_shmem_exit a bit. It turns out that during shmem_exit, we do "user-level" cleanup, like aborting the transaction, first. We expect that will probably release all of our shared-memory resources. Then, just to make doubly sure, we do "low-level cleanup", where individual modules return session-lifetime resources and verify that no lwlocks, etc. have been leaked. The on_dsm_detach callbacks properly happen in the middle, after we've tried to abort the transaction but before the main shared memory segment is finally shut down. I'm not sure that the solution I've adopted here is optimal; see within for details.

Patch #2, shm-toc-v1.patch, provides a facility for sizing a dynamic shared memory segment before creation, and for dividing it up into chunks after it's been created. It therefore serves a function quite similar to RequestAddinShmemSpace, except of course that there is only one main shared memory segment, created at postmaster startup time, whereas new dynamic shared memory segments can come into existence on the fly; and it serves even more conspicuously the function of ShmemIndex, which enables backends to locate particular data structures within the shared memory segment. It is, however, quite a bit simpler than the ShmemIndex mechanism: we don't need the same level of extensibility here that we do for the main shared memory segment, because a new extension need not piggyback on an existing dynamic shared memory segment, but can create a whole segment of its own.

Patch #3, shm-mq-v1.patch, is the heart of this series. It creates an infrastructure for sending and receiving messages of arbitrary length using ring buffers stored in shared memory (presumably dynamic shared memory, but hypothetically the main shared memory segment could be used). Queues are single-reader and single-writer; they use process latches to implement waiting for the queue to fill (in the case of the reader) or drain (in the case of the writer). A non-blocking mode is also available for situations where other options might lead to deadlock.
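To make the division of labor among patches #1-#3 concrete, here is a rough sketch (not taken from the patches themselves) of how the setting-up backend might size a segment with the shm_toc facility, create a message queue inside it, register an on_dsm_detach callback, and send a message. The function names and signatures follow the interfaces described above and may not match the v1 patches exactly; the magic number, TOC key, queue size, and cleanup callback are arbitrary examples.

```
/* Hypothetical sketch: DEMO_* names and demo_cleanup are examples only. */
#include "postgres.h"

#include "storage/dsm.h"
#include "storage/proc.h"
#include "storage/shm_mq.h"
#include "storage/shm_toc.h"

#define DEMO_MAGIC      0x79fb2447UL    /* arbitrary: identifies our segment layout */
#define DEMO_KEY_MQ     0               /* TOC key under which the queue is stored */
#define DEMO_QUEUE_SIZE 16384           /* ring buffer size, in bytes */

/* on_dsm_detach hook (patch #1): runs whenever we detach, not just at backend exit. */
static void
demo_cleanup(dsm_segment *seg, Datum arg)
{
	elog(DEBUG1, "detaching from demo segment");
}

static void
demo_setup_and_send(void)
{
	shm_toc_estimator estimator;
	Size		segsize;
	dsm_segment *seg;
	shm_toc    *toc;
	shm_mq	   *mq;
	shm_mq_handle *mqh;

	/* Size the segment before creating it (patch #2). */
	shm_toc_initialize_estimator(&estimator);
	shm_toc_estimate_chunk(&estimator, DEMO_QUEUE_SIZE);
	shm_toc_estimate_keys(&estimator, 1);
	segsize = shm_toc_estimate(&estimator);

	/* Create the segment, carve it up, and record the queue in the TOC. */
	seg = dsm_create(segsize);
	on_dsm_detach(seg, demo_cleanup, (Datum) 0);
	toc = shm_toc_create(DEMO_MAGIC, dsm_segment_address(seg), segsize);
	mq = shm_mq_create(shm_toc_allocate(toc, DEMO_QUEUE_SIZE), DEMO_QUEUE_SIZE);
	shm_toc_insert(toc, DEMO_KEY_MQ, mq);

	/*
	 * Declare ourselves the (single) writer and attach to get a
	 * backend-private handle.  The third argument could instead be the
	 * reader's BackgroundWorkerHandle, so that waits notice a worker
	 * that dies or never starts.
	 */
	shm_mq_set_sender(mq, MyProc);
	mqh = shm_mq_attach(mq, seg, NULL);

	/*
	 * Blocking send: sleeps on our process latch until the reader has
	 * attached and drained enough space.  Passing nowait = true instead
	 * returns SHM_MQ_WOULD_BLOCK rather than sleeping.
	 */
	if (shm_mq_send(mqh, 6, "hello", false) != SHM_MQ_SUCCESS)
		elog(LOG, "reader detached before the message could be delivered");

	/* dsm_segment_handle(seg) is what a cooperating process needs to attach. */
	dsm_detach(seg);
}
```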
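On the other side of the queue, a cooperating process (for instance, a dynamic background worker that has been handed the segment's dsm_handle) would attach, find the queue through the TOC, and read messages. Again, this is only a sketch under the same assumptions as above; DEMO_MAGIC and DEMO_KEY_MQ are the example values from the previous snippet.

```
/* Hypothetical counterpart to the sketch above: the reading process. */
static void
demo_attach_and_receive(dsm_handle handle)
{
	dsm_segment *seg;
	shm_toc    *toc;
	shm_mq	   *mq;
	shm_mq_handle *mqh;
	shm_mq_result res;
	Size		nbytes;
	void	   *data;

	/* Map the segment created by the sender and locate the queue. */
	seg = dsm_attach(handle);
	toc = shm_toc_attach(DEMO_MAGIC, dsm_segment_address(seg));
	mq = shm_toc_lookup(toc, DEMO_KEY_MQ);

	/* Declare ourselves the (single) reader and attach. */
	shm_mq_set_receiver(mq, MyProc);
	mqh = shm_mq_attach(mq, seg, NULL);

	/*
	 * Blocking receive: sleeps on our latch until a complete message is
	 * available, then hands back a pointer to queue-managed memory.
	 * SHM_MQ_DETACHED means the sender has gone away and no more data
	 * will ever arrive, which is how an orderly shutdown is detected.
	 */
	res = shm_mq_receive(mqh, &nbytes, &data, false);
	if (res == SHM_MQ_SUCCESS)
		elog(LOG, "received %lu bytes: %s",
			 (unsigned long) nbytes, (char *) data);

	dsm_detach(seg);
}
```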
Even without this patch, backends can write messages to a dynamic shared memory segment and wait for some other backend to read them, but unless you know exactly how much data you want to send before you create the shared memory segment, and unless you don't mind storing all of it for the lifetime of the segment, you'll quickly run into non-trivial problems around memory reuse and synchronization. So this is an effort to create a higher-level infrastructure where one process can simply declare that it wishes to send a series of messages to a particular queue and another process can declare that it wishes to read them out of that queue, and so it happens.

As far as parallelism is concerned, I anticipate that this code will be useful for at least two purposes: (1) propagating errors that occur inside a worker process back to the user backend that initiated the parallel operation; and (2) streaming tuples from a worker performing one part of the query (a scan or join, say) back to the user backend or another worker performing a different part of the same query. I suspect that this code will find applications outside parallelism as well.

Patch #4, test-shm-mq-v1.patch, is a demonstration of how to use the various background worker and dynamic shared memory facilities introduced over the course of the 9.4 release cycle, and the facilities introduced by patches #1-#3 of this series, to actually do something interesting. Specifically, it sets up a ring of processes connected by shared message queues and relays a user-specified message around the ring repeatedly, then checks that it has the same message at the end. This is obviously just a demonstration, but I find it pretty cool, because the code here demonstrates that, with all of these facilities in place, setting up a bunch of workers and having them talk to each other can be done using what is really a pretty modest amount of code. Importantly, this patch shows how to make the start-up and shut-down sequences reliable, so that you don't end up with the user backend hanging forever waiting for a worker that has already died or will never start, or a worker backend waiting for a user backend that has already aborted. Review of this logic is particularly appreciated, as it's proven to be pretty complex: I think the solutions I've worked out here are generally good, but there may still be holes to plug.

My hope is that people will take this test code and use it as a basis for real applications. Including this patch in our distribution will also serve as a useful regression test of dynamic background workers and dynamic shared memory, which has so far been lacking.

Particular thanks are due to Noah Misch for serving as my constant sounding board during the development of this patch series.

Thanks,

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company