Portals and nested transactions - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Portals and nested transactions |
Date | |
Msg-id | 18506.1089752226@sss.pgh.pa.us |
Responses | Re: Portals and nested transactions |
List | pgsql-hackers |
I've been thinking about what to do with cursors in subtransactions. The problem really includes both cursors (created with DECLARE CURSOR) and portals (created with the V3-protocol Bind message) since they are the same kind of animal internally, namely a Portal.

In previous discussion I think everyone agreed that we would like the following properties:

1. A Portal created within a successful (committed) subtransaction remains open and usable by the parent transaction, as well as by subsequent child subtransactions.

2. If a subtransaction uses (fetches from) a pre-existing Portal, the Portal state change persists after subxact commit.

What was not totally settled was what to do on subtransaction abort:

Q1: Should Portals successfully created within the failed subxact be closed? Or should they remain open?

Q2: If the subxact changed the state of a pre-existing Portal, should that state change roll back? In particular, can a Close Portal operation roll back?

Taking a "transactional" view means answering "yes" to both questions (so that all portal state returns to what it was at subxact entry). But there was also support for a "nontransactional" view in which both questions are answered "no". The discussion sort of trailed off there because we had no ideas how to implement either.

I will now sketch some implementation ideas about how to do the nontransactional way. We could support the transactional behavior as well, but not very efficiently (at least not in the first cut).

An important limitation that I think we must make is that any error occurring while a specific Portal is executing "kills" that Portal; you cannot do anything further with it except close it, even if the Portal would otherwise have survived the subtransaction abort caused by the error. The reason for this is that we can't be sure we have consistent internal state for the Portal when an error occurred at a random point.
(Example: a btree index scan could have released lock on one buffer and gotten an error while trying to read the next page of the index; it's not certain that the scan data structures accurately reflect this intermediate state.) Later on we might be able to relax this restriction, but it will take a lot of care to decide which errors are "safe". So for the moment, an error during a FETCH (or protocol-level Execute) leaves that Portal in a state where any subsequent fetch or execute draws "ERROR: portal execution cannot be continued".

How to do it non-transactionally
--------------------------------

The key insight I had while thinking about this is that subtransactions are the wrong units for managing ownership of resources used by queries (buffers, locks, etc). When portals can outlive subtransactions, those resources really need to be thought of as belonging to the portals, not the subtransactions.

However I think we *also* need to allow subtransactions to own resources --- at least locks. We usually want to hold table locks until main transaction end, and it would be bad to have to keep Portals around just to remember some locks. It would be better to reassign ownership of the locks to the current transaction when a Portal is closed.

What I think we ought to do to support this is to invent the concept of "ResourceOwner" objects, which will be very much like MemoryContexts except that they represent held buffer pins, table locks, and anything else that we decide needs to be managed in this fashion (rtree index scans are one example). In particular we'll let ResourceOwners have child ResourceOwner objects so that there can be forests of the things, just as with MemoryContexts. There would be a CurrentResourceOwner global variable analogous to CurrentMemoryContext, which would for instance tell PinBuffer which ResourceOwner to affix ownership of the pin to.
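
As a rough illustration of the ResourceOwner idea just described, here is a minimal sketch in C. Everything in it is an assumption made for illustration: the struct layout, the fixed-size pin array, and the helper names owner_init, pin_buffer, owner_release and demo_owner_tree are hypothetical and are not taken from the PostgreSQL source. It shows only the shape of the proposal: owners form a tree, a global says which owner is current, and releasing an owner also releases its children.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PINS 8              /* fixed capacity keeps the sketch short */

typedef struct ResourceOwner
{
    const char *name;
    struct ResourceOwner *parent;
    struct ResourceOwner *first_child;
    struct ResourceOwner *next_sibling;
    int         buffers[MAX_PINS];  /* buffer numbers pinned for this owner */
    int         nbuffers;
} ResourceOwner;

/* Analogous to CurrentMemoryContext: where newly acquired pins are recorded. */
static ResourceOwner *CurrentResourceOwner = NULL;

/* Initialize an owner and link it under its parent, if it has one. */
static void
owner_init(ResourceOwner *owner, ResourceOwner *parent, const char *name)
{
    memset(owner, 0, sizeof(*owner));
    owner->name = name;
    owner->parent = parent;
    if (parent != NULL)
    {
        owner->next_sibling = parent->first_child;
        parent->first_child = owner;
    }
}

/* Hypothetical stand-in for PinBuffer: charge the pin to the current owner. */
static int
pin_buffer(int buffer)
{
    ResourceOwner *owner = CurrentResourceOwner;

    if (owner == NULL || owner->nbuffers >= MAX_PINS)
        return 0;               /* no owner, or no room to record the pin */
    owner->buffers[owner->nbuffers++] = buffer;
    return 1;
}

/* Release an owner's pins and, recursively, its children's; return the count. */
static int
owner_release(ResourceOwner *owner)
{
    int         released = owner->nbuffers;
    ResourceOwner *child;

    for (child = owner->first_child; child != NULL; child = child->next_sibling)
        released += owner_release(child);
    owner->nbuffers = 0;
    return released;
}

/* Build a two-level tree, pin under each owner, then release from the top. */
int
demo_owner_tree(void)
{
    ResourceOwner xact, subxact;
    int         released;

    owner_init(&xact, NULL, "transaction");
    owner_init(&subxact, &xact, "subtransaction");

    CurrentResourceOwner = &xact;
    pin_buffer(7);
    CurrentResourceOwner = &subxact;
    pin_buffer(11);
    pin_buffer(12);
    assert(xact.nbuffers == 1 && subxact.nbuffers == 2);

    released = owner_release(&xact);
    CurrentResourceOwner = NULL;
    return released;
}
```

Calling demo_owner_tree() releases all three pins in one pass from the top owner. A real implementation would also have to track locks and other resource kinds, which the sketch leaves out.
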
(I am half tempted to unify ResourceOwners and MemoryContexts completely, but that's probably overkill, since we have many short-lived MemoryContexts that would never be appropriate owners of query-level resources. In particular I think CurrentResourceOwner would usually be different from CurrentMemoryContext.)

Depending on the resource in question, we could let ResourceOwners point to owned objects (for instance, I'm thinking of storing an array of Buffer numbers in a ResourceOwner to represent buffer pins) or vice versa (for instance, the best way to keep track of rtree indexscan ownership is probably to store a pointer to the ResourceOwner object in the rtree indexscan struct).

This infrastructure shouldn't be much work to create, since we have the MemoryContext stuff available to serve as a model. Once we have it, we'll create a ResourceOwner for each transaction or subtransaction as well as one for each Portal. (We need one for a transaction because, for example, query parsing requires buffer and lock access, and that happens before we create a Portal to execute the query.)

The reason this solves our problems is that while executing a portal's query, CurrentResourceOwner will point to the portal's ResourceOwner not the current subtransaction's. Therefore any buffers, locks, etc acquired or released by the query are effectively owned by the portal and not by the subtransaction. Subtransaction abort would release only resources associated with the subtransaction itself, not those associated with the portals it has happened to touch.

A nice property of this solution is that we can get rid of much of the subtransaction entry/exit overhead that exists in current CVS tip. There's no particular reason for the buffer manager to save and restore buffer pin counts, for example.
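
The ownership switch described above can be sketched as follows. This is an illustrative assumption, not code from the PostgreSQL tree: the Portal and ResourceOwner structs are reduced to a pin counter, and portal_fetch, subxact_abort and demo_subxact_abort are hypothetical names. The point is only that a fetch temporarily makes the portal's owner current, so a later subtransaction abort leaves the portal's pins alone.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ResourceOwner
{
    const char *name;
    int         npins;          /* count of buffer pins held for this owner */
} ResourceOwner;

typedef struct Portal
{
    ResourceOwner owner;        /* resources acquired while the portal runs */
    long        rows_fetched;
} Portal;

static ResourceOwner *CurrentResourceOwner = NULL;

/* Hypothetical stand-in for PinBuffer: charge the pin to the current owner. */
static void
pin_buffer(void)
{
    CurrentResourceOwner->npins++;
}

/*
 * Fetch one row.  While the portal's query runs, the portal's owner is
 * current, so the pin the scan keeps between fetches belongs to the portal.
 * (A real version would also have to restore the saved owner on error.)
 */
static void
portal_fetch(Portal *portal)
{
    ResourceOwner *save = CurrentResourceOwner;

    CurrentResourceOwner = &portal->owner;
    pin_buffer();
    portal->rows_fetched++;
    CurrentResourceOwner = save;
}

/* Subtransaction abort: release only what the subtransaction itself owns. */
static int
subxact_abort(ResourceOwner *subxact)
{
    int         released = subxact->npins;

    subxact->npins = 0;
    return released;
}

/* Returns the number of pins the portal still holds after the abort. */
int
demo_subxact_abort(void)
{
    ResourceOwner subxact = {"subtransaction", 0};
    Portal      portal = {{"portal", 0}, 0};
    int         surviving;

    CurrentResourceOwner = &subxact;
    pin_buffer();               /* e.g. catalog access while parsing */
    portal_fetch(&portal);      /* pre-existing portal used in the subxact */

    assert(subxact_abort(&subxact) == 1);   /* only the subxact's own pin */
    assert(portal.rows_fetched == 1);       /* portal state is untouched */

    surviving = portal.owner.npins;
    CurrentResourceOwner = NULL;
    return surviving;
}
```

In this sketch the subtransaction gives up its one pin on abort while the portal keeps its own, which is the nontransactional behavior the text argues for.
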
One of the reasons why the pre-existing code needed to check and zero buffer pin counts at transaction abort is that it cannot assume that all pins held on behalf of a query were properly released by query abort; that would have required making global assumptions about all code everywhere being careful to record held pins in places that query abort cleanup would find out about them. In the ResourceOwner paradigm, the equivalent correctness guarantee needs only local correctness in a few routines: for instance, PinBuffer has to be sure it cannot error out between acquiring a buffer pin and recording the pin in CurrentResourceOwner. (Thus, for instance, it should make sure there is room in the ResourceOwner's buffer-number array *before* it grabs the pin.) This should give us much the same level of confidence that we have now for memory management: there are no permanent memory leaks when you use palloc.

How to do it transactionally
----------------------------

With the above design, we can handle nontransactional Portals easily: we just don't touch the state of (non-failed) Portals when we exit a subtransaction.

It's easy to see how to handle Portal creation and deletion transactionally; it's pretty much the same algorithm we use already in other places such as OnCommit. Just stamp each Portal with the creating or would-be-deleting subxact's XID.

The hard part is rolling back a pre-existing Portal to its prior state at subxact abort. It seems completely infeasible to do this at a low level --- we'd never find all the state involved, and if we could get it all there would be too much to save/restore efficiently. What I think we could do, though, is record the Portal's high-level state as the number of rows fetched from it. On abort, rewind the Portal and then fetch that number of rows again (this is the same method used by MOVE ABSOLUTE). We could optimize things a little bit by not doing this repositioning until and unless the Portal is actually used again.
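
The rewind-and-refetch idea, with the repositioning deferred until the portal is next used, might look roughly like this. It is a sketch under stated assumptions: the portal reads from an in-memory array as a stand-in for a real scan, and every name here (scan_next, portal_subxact_start, portal_subxact_abort, portal_fetch, demo_rollback_position) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Portal
{
    const int  *rows;           /* stand-in for the query's result stream */
    size_t      nrows;
    size_t      pos;            /* rows fetched so far */
    size_t      saved_pos;      /* rows fetched at subtransaction entry */
    bool        needs_reposition;   /* abort seen, replay not yet done */
} Portal;

/* Raw fetch from the underlying scan; returns false at end of data. */
static bool
scan_next(Portal *p, int *row)
{
    if (p->pos >= p->nrows)
        return false;
    *row = p->rows[p->pos++];
    return true;
}

/* At subxact entry, remember the high-level state: the row count. */
static void
portal_subxact_start(Portal *p)
{
    /* If a replay is still pending, saved_pos already is the logical position. */
    if (!p->needs_reposition)
        p->saved_pos = p->pos;
}

/* At subxact abort, do no work yet; just note that a replay is owed. */
static void
portal_subxact_abort(Portal *p)
{
    p->needs_reposition = true;
}

/* Fetch one row, first rewinding and re-fetching if an abort intervened. */
static bool
portal_fetch(Portal *p, int *row)
{
    if (p->needs_reposition)
    {
        size_t      target = p->saved_pos;
        int         discard;

        p->pos = 0;             /* rewind to the start of the result */
        while (p->pos < target && scan_next(p, &discard))
            ;                   /* replay and discard rows up to target */
        p->needs_reposition = false;
    }
    return scan_next(p, row);
}

/* Returns the first row fetched after an aborted subtransaction. */
int
demo_rollback_position(void)
{
    static const int data[] = {10, 20, 30, 40};
    Portal      p = {data, 4, 0, 0, false};
    int         row = 0;

    portal_fetch(&p, &row);     /* 10, before the subtransaction */
    portal_subxact_start(&p);
    portal_fetch(&p, &row);     /* 20 */
    portal_fetch(&p, &row);     /* 30 */
    portal_subxact_abort(&p);
    portal_fetch(&p, &row);     /* replays one row, then returns 20 again */
    assert(p.pos == 2);
    return row;
}
```

With an array the rewind is free, so the sketch hides the real cost: for an actual query the replay means re-running the scan from the beginning and discarding every row up to the saved count.
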
Still, it wouldn't be cheap...

Of course this only handles SELECT-query portals, not portals that contain data-modification commands. But the latter cannot be suspended partway through anyhow, so there is no scenario where we need to recover to a partly-executed state. (Recall what I said before about not allowing continuation of a portal that itself got an error.)

Comments?

regards, tom lane