Thread: relcache refcount
Hackers,

I'm stuck trying to figure out how to decrease reference counting for
relcache entries at subtransaction abort.

Initially I thought I could just drop them all to zero, because a
subtransaction boundary should be enough of a guarantee that the entries
are no longer needed.  However I now think this is bogus, because a
function could open a new transaction and abort it, and the surrounding
query would still need its relcache entries.  So this cannot possibly
work (if I'm wrong I'll be very happy, because it is the easiest way).

Keeping a list of all the entries the current subtransaction holds,
along with their local refcounts, sounds ridiculous, doesn't it?  We
would need one hash per subtransaction, which is very bad.  Any ideas
out there?

Incidentally, I assume that LWLocks are not going to be needed across
subtransaction boundaries -- I release them all on abort, just as it's
done on main transaction abort.  Same for catcache entries.  Does anyone
think this is incorrect?

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"There is no heaven possible without sinking our roots
into the depths of the earth" (Malucha Pinto)
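For concreteness, here is a tiny standalone C model of the "drop them all to
zero" idea and the way it loses an outer reference; FakeRelation,
relation_open_/relation_close_ and at_sub_abort_naive are invented names, and
this is a sketch of the bookkeeping only, not actual relcache code.

#include <stdio.h>

/* Invented stand-in for a relcache entry; only the refcount matters here. */
typedef struct FakeRelation
{
    const char *relname;
    int         refcnt;         /* backend-local reference count */
} FakeRelation;

static void relation_open_(FakeRelation *rel)  { rel->refcnt++; }
static void relation_close_(FakeRelation *rel) { rel->refcnt--; }

/* The naive idea: at subtransaction abort, force the refcount to zero. */
static void at_sub_abort_naive(FakeRelation *rel) { rel->refcnt = 0; }

int
main(void)
{
    FakeRelation t = { "t", 0 };

    relation_open_(&t);        /* outer query holds t open: refcnt = 1 */

    relation_open_(&t);        /* a function's subtransaction opens t again: refcnt = 2 */
    at_sub_abort_naive(&t);    /* the subtransaction aborts: refcnt forced to 0 */

    relation_close_(&t);       /* the outer query finishes and closes t ... */
    printf("refcnt = %d\n", t.refcnt);  /* prints -1: the outer reference was lost */
    return 0;
}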
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> I'm stuck trying to figure out how to decrease reference counting for
> relcache entries at subtransaction abort.

> Initially I thought I could just drop them all to zero,

Nope, you can't.  An active query plan will surely have open relations.

> Incidentally, I assume that LWLocks are not going to be needed across
> subtransaction boundaries -- I release them all on abort, just as it's
> done on main transaction abort.  Same for catcache entries.  Does anyone
> think this is incorrect?

Sounds like a very unsafe assumption to me.  The reason we can get away
with force-to-zero logic for these things now is that we know we are
reverting to an idle condition.  The general solution would require
reverting to the state prevailing at subtrans entry.  If you want to
avoid implementing the general solution for any particular backend
module, you'd better be able to prove that it will be in an idle state
at every subtrans entry.  It's barely possible that that's true for
LWLocks, but I've got real serious doubts about catcache.  As an
example: mightn't the call handler for a procedural language hold onto
a reference to the proc's pg_proc row throughout execution?  Even if it
chances not to do that today, somebody could easily want to do it
tomorrow, so I think an assumption that it's not needed would be too
fragile.

BTW, what are your plans for state saving/reversion for the lock manager
and buffer manager?  The lock state, in particular, makes these other
problems look trivial by comparison.

Glad to see you are starting to realize why nested transactions haven't
been done already ;-)

			regards, tom lane
> BTW, what are your plans for state saving/reversion for the lock manager
> and buffer manager?  The lock state, in particular, makes these other
> problems look trivial by comparison.

Why can't we keep all locks until main tx end?  Locks are not
self-conflicting, are they?  So the only reason to free them would be to
improve concurrency, and IMHO we don't need that.  I guess I am just not
seeing this correctly.

(I am assuming that a deadlock will still break the whole tx.)

Andreas
"Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes: > Why can't we keep all locks until main tx end ? For committed subtransactions we have to do that, yes, but for aborted subtransactions we must release. Otherwise you can't implement a retry loop around a potentially-deadlocking operation. > (I am assuming that a deadlock will still break the whole tx) Wrong. We might as well not bother with the entire project. regards, tom lane
> > Why can't we keep all locks until main tx end?
>
> For committed subtransactions we have to do that, yes, but for aborted
> subtransactions we must release.  Otherwise you can't implement a retry
> loop around a potentially-deadlocking operation.

Ok, that would certainly be good to have, but it is IMHO not a "must
have".

> > (I am assuming that a deadlock will still break the whole tx.)
>
> Wrong.  We might as well not bother with the entire project.

There are plenty of examples that do not involve deadlocks.  The most
prominent is probably "insert -> duplicate key -> update instead".
Also, the new NOLOCK statements come to mind, ...

Andreas
On Thu, May 13, 2004 at 09:43:42AM -0400, Tom Lane wrote:
> Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> > I'm stuck trying to figure out how to decrease reference counting for
> > relcache entries at subtransaction abort.
>
> > Initially I thought I could just drop them all to zero,
>
> Nope, you can't.  An active query plan will surely have open relations.

Ok, I created a function to copy a hash table (from dynahash).  So now
at subtransaction start the RelationIdCache and RelationSysNameCache
hash tables are copied, and if the subtransaction aborts the previous
hash tables are restored.  It seems to do the right thing (i.e. at main
transaction commit the refcounts seem to be OK, which was not true in my
previous tests).  I would like a comment on this solution.

Regarding the lock mechanism, I simply added some code to LockReleaseAll
so it gets the array of committed child Xids; on subtransaction abort,
the whole lock struct is scanned just like it's done on main transaction
abort; only those locks affiliated with one of the given Xids are
released.  This is naive, so if it's incorrect please comment.  It seems
to work OK on the simple tests I did.

Incidentally, I noticed that the relcache refcounts were wrong while
testing deadlock handling; there were messages about unreleased relcache
entries from the aborted subtransaction.  Those are now gone, as
expected.

I'm still missing a solution for the catcache and the buffer manager,
but I'd like review of what's currently done.  I'll post the current
patch to patches shortly.

PS: Either the list server is getting very slow or it has started to
lose mail.  I asked yesterday whether it was OK to copy the hash, but
apparently the mail didn't make it to the list.  Is there something
happening?

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"There is no heaven possible without sinking our roots
into the depths of the earth" (Malucha Pinto)
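A sketch of the release loop described above, written as a standalone model
rather than against the real LOCK/PROCLOCK structures; HeldLock,
lock_release_subxact and the Xid typedef are invented for the sketch.  The
assumption is simply that each held lock is tagged with the Xid of the
(sub)transaction that took it, and that the array handed in at abort holds
the aborted subtransaction's Xid plus those of its committed children.

#include <stdbool.h>
#include <stdio.h>

typedef unsigned int Xid;

typedef struct HeldLock
{
    int     lockid;     /* stands in for the real locktag */
    Xid     xid;        /* (sub)transaction that acquired it */
    bool    held;
} HeldLock;

static bool
xid_in_set(Xid xid, const Xid *xids, int nxids)
{
    for (int i = 0; i < nxids; i++)
        if (xids[i] == xid)
            return true;
    return false;
}

/* Release only locks belonging to the aborted subxact or its committed children. */
static void
lock_release_subxact(HeldLock *locks, int nlocks, const Xid *xids, int nxids)
{
    for (int i = 0; i < nlocks; i++)
        if (locks[i].held && xid_in_set(locks[i].xid, xids, nxids))
            locks[i].held = false;
}

int
main(void)
{
    HeldLock locks[] = {
        { 1, 100, true },   /* taken by the main transaction: must survive */
        { 2, 101, true },   /* taken by the aborted subxact: must go */
        { 3, 102, true },   /* taken by a committed child of 101: must go too */
    };
    Xid aborted[] = { 101, 102 };

    lock_release_subxact(locks, 3, aborted, 2);
    for (int i = 0; i < 3; i++)
        printf("lock %d: %s\n", locks[i].lockid,
               locks[i].held ? "held" : "released");
    return 0;
}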
Alvaro Herrera wrote:
> PS: Either the list server is getting very slow or it has started to
> lose mail.  I asked yesterday whether it was OK to copy the hash, but
> apparently the mail didn't make it to the list.  Is there something
> happening?

Not sure.  I can confirm I never saw that email.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> Ok, I created a function to copy a hash table (from dynahash).  So now
> at subtransaction start the RelationIdCache and RelationSysNameCache
> hash tables are copied, and if the subtransaction aborts the previous
> hash tables are restored.

I don't think that will do; what about updates absorbed during the
subtrans from other backends' changes?  In any case, copying the whole
cache state during every subtrans start is not the kind of overhead
I wish to pay ...

> Regarding the lock mechanism, I simply added some code to LockReleaseAll
> so it gets the array of committed child Xids; on subtransaction abort,
> the whole lock struct is scanned just like it's done on main transaction
> abort; only those locks affiliated with one of the given Xids are
> released.  This is naive, so if it's incorrect please comment.

Not sure; it's been a long day and I'm tired ... will think about it
tomorrow ...

> PS: Either the list server is getting very slow or it has started to
> lose mail.  I asked yesterday whether it was OK to copy the hash, but
> apparently the mail didn't make it to the list.  Is there something
> happening?

I haven't seen that one.  Check your subscription settings to see
whether you'd be notified if the list bot delayed your post for some
reason.

			regards, tom lane
On Fri, May 14, 2004 at 11:21:42PM -0400, Tom Lane wrote:
> Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> > Ok, I created a function to copy a hash table (from dynahash).  So now
> > at subtransaction start the RelationIdCache and RelationSysNameCache
> > hash tables are copied, and if the subtransaction aborts the previous
> > hash tables are restored.
>
> I don't think that will do; what about updates absorbed during the
> subtrans from other backends' changes?  In any case, copying the whole
> cache state during every subtrans start is not the kind of overhead
> I wish to pay ...

Oh, I see that now.  (And I agree with the overhead argument anyway.)
I think I can add a "refcount stack" to RelationData, pushing onto it
each time a subtransaction is entered; on abort, the previous counts
are restored for all relations.  This is certainly much cheaper than
saving the whole hash, and it handles invalidations more correctly.
I'll leave the hash_copy() function in, though ... maybe someone can
think of a use for it ...

I just did something like the refcount stack for the catcache code.
I'm not sure how to verify that it actually works ... I will play with
gdb.

> > Regarding the lock mechanism, I simply added some code to LockReleaseAll
> > so it gets the array of committed child Xids; on subtransaction abort,
> > the whole lock struct is scanned just like it's done on main transaction
> > abort; only those locks affiliated with one of the given Xids are
> > released.  This is naive, so if it's incorrect please comment.
>
> Not sure; it's been a long day and I'm tired ... will think about it
> tomorrow ...

The kind of issue I am concerned about is what happens when a
subtransaction upgrades a previously acquired lock.  Does this release
the lesser lock?  I don't think so.

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"The first of April is the day we remember what we are
the other 364 days of the year" (Mark Twain)
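A minimal standalone model of the refcount-stack idea, assuming one saved
count per open subtransaction level: push the current count at subtransaction
start, restore it at abort, and simply discard it at commit so references
taken inside a committed subtransaction survive into the parent.  FakeRelation
and the rel_at_subxact_* names are invented; the real thing would hang off
RelationData, and the model ignores cache invalidations entirely.

#include <assert.h>

#define MAX_NEST 64             /* arbitrary nesting limit for the model */

typedef struct FakeRelation
{
    int refcnt;                 /* current backend-local refcount */
    int saved[MAX_NEST];        /* refcounts saved at each subxact start */
    int nsaved;
} FakeRelation;

static void
rel_at_subxact_start(FakeRelation *rel)
{
    assert(rel->nsaved < MAX_NEST);
    rel->saved[rel->nsaved++] = rel->refcnt;
}

static void
rel_at_subxact_abort(FakeRelation *rel)
{
    assert(rel->nsaved > 0);
    rel->refcnt = rel->saved[--rel->nsaved];    /* drop references taken inside */
}

static void
rel_at_subxact_commit(FakeRelation *rel)
{
    assert(rel->nsaved > 0);
    --rel->nsaved;              /* keep the current count, forget the saved one */
}

int
main(void)
{
    FakeRelation r = { .refcnt = 1 };   /* outer query holds one reference */

    rel_at_subxact_start(&r);
    r.refcnt++;                         /* subxact opens the relation */
    rel_at_subxact_abort(&r);
    assert(r.refcnt == 1);              /* outer reference is intact */

    rel_at_subxact_start(&r);
    r.refcnt++;
    rel_at_subxact_commit(&r);
    assert(r.refcnt == 2);              /* committed subxact's reference survives */
    return 0;
}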
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> Regarding the lock mechanism, I simply added some code to LockReleaseAll
> so it gets the array of committed child Xids; on subtransaction abort,
> the whole lock struct is scanned just like it's done on main transaction
> abort; only those locks affiliated with one of the given Xids are
> released.  This is naive, so if it's incorrect please comment.

Another and perhaps simpler way would be to leave the release code
alone, but on subtransaction commit scan through the lock structs and
re-mark locks held by the subtrans as being held by the parent.  I
think these are isomorphic functionally.  The second way feels like it
would be faster (no inner loop over child XIDs).  On the other hand, if
your current code does not require scanning the lock structures at all
on subtrans commit, it's probably not a win to add such a scan.

The lock algorithms must be able to tell when two lock requests are
coming from the same backend.  At present I think this relies on
comparing XIDs, which is not going to work if you label subtrans locks
with subtrans XIDs.  How are you thinking about handling that?

			regards, tom lane
On Sat, May 15, 2004 at 02:08:39PM -0400, Tom Lane wrote:
> Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> > Regarding the lock mechanism, I simply added some code to LockReleaseAll
> > so it gets the array of committed child Xids; on subtransaction abort,
> > the whole lock struct is scanned just like it's done on main transaction
> > abort; only those locks affiliated with one of the given Xids are
> > released.  This is naive, so if it's incorrect please comment.
>
> Another and perhaps simpler way would be to leave the release code
> alone, but on subtransaction commit scan through the lock structs and
> re-mark locks held by the subtrans as being held by the parent.  I
> think these are isomorphic functionally.  The second way feels like it
> would be faster (no inner loop over child XIDs).

The problem is that the Xid is part of the locktag (which is the hash
key), as far as I can tell, so relabeling means I have to delete the
lock from the hash table and then insert it again with the new tag.  I
don't think this is a good idea.  (I found this out the hard way: I was
getting "proclock hashtable corrupted" when finishing a subtransaction
after relabeling the locks.)

> On the other hand, if your current code does not require scanning the
> lock structures at all on subtrans commit, it's probably not a win to
> add such a scan.

Nope, it doesn't.

> The lock algorithms must be able to tell when two lock requests are
> coming from the same backend.  At present I think this relies on
> comparing XIDs, which is not going to work if you label subtrans locks
> with subtrans XIDs.  How are you thinking about handling that?

Nope, it doesn't compare Xids.  That code actually looks up locks
through the PGPROC struct (procHolders), so the "current Xid" does not
affect it.  Deadlock detection works without changing the code at all.
It took me quite a while to figure this out, as this is pretty hairy
code ... the hairiest I've seen in Postgres.  I was surprised to learn
the original Berkeley code came without deadlock detection.

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"I call it GNU/Linux.  Except the GNU/ is silent." (Ben Reiter)
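To illustrate the dynahash constraint being described: a field that is part of
the hash key cannot simply be changed in place, because the entry stays filed
under the old key's bucket, so re-parenting a lock would mean removing the
entry and re-entering it under the new tag, roughly as below.
ProcLockTagModel, ProcLockEntModel, nHolding and the proclock_hash parameter
are simplified stand-ins invented for this sketch (it is written as if inside
the backend, so it is not compilable on its own); hash_search with
HASH_REMOVE/HASH_ENTER is the ordinary dynahash interface, and error handling
is omitted.

#include "postgres.h"
#include "utils/hsearch.h"

typedef struct
{
    unsigned int    lockid;     /* stands in for the real locktag fields */
    TransactionId   xid;        /* (sub)xact holding it -- part of the key! */
} ProcLockTagModel;

typedef struct
{
    ProcLockTagModel tag;       /* hash key: must never be modified in place */
    int             nHolding;   /* payload, simplified */
} ProcLockEntModel;

/*
 * Move one entry from a child xid to parent_xid.  Just assigning
 * entry->tag.xid = parent_xid leaves the entry in the old key's bucket,
 * which is the "proclock hashtable corrupted" failure described above.
 */
static void
reparent_proclock(HTAB *proclock_hash, ProcLockEntModel *entry,
                  TransactionId parent_xid)
{
    ProcLockTagModel newtag = entry->tag;
    ProcLockEntModel saved = *entry;    /* entry is invalid after HASH_REMOVE */
    ProcLockEntModel *newentry;
    bool        found;

    newtag.xid = parent_xid;

    /* remove under the old key ... */
    (void) hash_search(proclock_hash, &entry->tag, HASH_REMOVE, &found);

    /* ... and re-enter under the new one, carrying the payload across */
    newentry = (ProcLockEntModel *) hash_search(proclock_hash, &newtag,
                                                HASH_ENTER, &found);
    if (found)
        newentry->nHolding += saved.nHolding;   /* parent already held it: merge */
    else
        newentry->nHolding = saved.nHolding;
}

Note the merge case: if the parent already holds the same lock, the two
entries have to collapse into one, which is part of what makes relabeling
messier than simply releasing at abort.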
On Thu, May 13, 2004 at 09:43:42AM -0400, Tom Lane wrote:
> BTW, what are your plans for state saving/reversion for the lock manager
> and buffer manager?

Ok, I have skimmed through the buffer manager code.  At first I thought
I'd have to set something up in shared memory for this, but then I
noticed that every backend keeps a local reference count, and modifies
the shared counter only when the local one rises from zero or drops to
zero.  Also, the number of buffers does not change while the server is
running.  So I see two ways of handling this:

1. Keep an array of local refcounts for all buffers, for each
   subtransaction.  At subtrans start, a new array is allocated and
   filled with the current local refcounts.  At subtrans abort, we check
   whether any count would drop to zero while restoring; if so, we
   decrement the shared counter.  At subtrans commit, we drop the saved
   array.  The problem with this approach is that we allocate a large
   array which will likely be almost entirely zeros.

2. Keep a more elaborate struct, where each buffer gets its local count
   saved only if it is nonzero.  Thus we don't have to allocate a large
   amount of memory.

Comments?  Opinions?  Which one do you think is better?  Any completely
different idea?

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Pick the flower that is born today, joyful and proud.
Who knows whether another will be born tomorrow?"
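As a concreteness check on option 2, here is a standalone model in which only
buffers with a nonzero local count are saved at subtransaction start;
SubXactPins, pin(), subxact_start() and subxact_abort() are invented names,
and NBUFFERS with the fixed arrays stands in for the real per-backend
PrivateRefCount bookkeeping.

#include <stdio.h>

#define NBUFFERS 8      /* tiny buffer pool for the model */

static int  LocalRefCount[NBUFFERS];    /* this backend's pin counts */
static int  SharedRefCount[NBUFFERS];   /* "shared" count: bumped on 0 -> 1 */

typedef struct SavedPin
{
    int buf;
    int count;                          /* local count at subxact start */
} SavedPin;

typedef struct SubXactPins
{
    SavedPin    pins[NBUFFERS];         /* sparse: only nonzero counts */
    int         npins;
} SubXactPins;

static void
pin(int buf)
{
    if (LocalRefCount[buf]++ == 0)
        SharedRefCount[buf]++;          /* first local pin touches shared state */
}

static void
subxact_start(SubXactPins *save)
{
    save->npins = 0;
    for (int buf = 0; buf < NBUFFERS; buf++)
        if (LocalRefCount[buf] > 0)
        {
            save->pins[save->npins].buf = buf;
            save->pins[save->npins].count = LocalRefCount[buf];
            save->npins++;
        }
}

static void
subxact_abort(const SubXactPins *save)
{
    for (int buf = 0; buf < NBUFFERS; buf++)
    {
        int saved = 0;

        for (int i = 0; i < save->npins; i++)
            if (save->pins[i].buf == buf)
                saved = save->pins[i].count;

        if (LocalRefCount[buf] > 0 && saved == 0)
            SharedRefCount[buf]--;      /* local count falls to zero: unpin shared */
        LocalRefCount[buf] = saved;
    }
}

int
main(void)
{
    SubXactPins save;

    pin(3);                     /* outer query pins buffer 3 */
    subxact_start(&save);
    pin(3);                     /* subxact pins it again */
    pin(5);                     /* and pins buffer 5 for the first time */
    subxact_abort(&save);

    printf("buf 3: local=%d shared=%d\n", LocalRefCount[3], SharedRefCount[3]);
    printf("buf 5: local=%d shared=%d\n", LocalRefCount[5], SharedRefCount[5]);
    /* expected: buf 3 local=1 shared=1 (outer pin intact), buf 5 local=0 shared=0 */
    return 0;
}

The saved structure's size is proportional to the number of buffers actually
pinned when the subtransaction starts rather than to the total number of
buffers, which is the point of option 2; the abort-time restore still walks
the local counts either way.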