Thread: relcache refcount
Hackers,

I'm stuck trying to figure out how to decrease reference counting for
relcache entries at subtransaction abort.

Initially I thought I could just drop them all to zero, because a
subtransaction boundary should be enough of a guarantee that the entries
are no longer needed.  However I now think this is bogus, because a
function could open a new transaction and abort it, and the surrounding
query would still need its relcache entries.  So this cannot possibly
work (if I'm wrong I'll be very happy, because it is the easiest way).

Keeping a list of all the entries the current subtransaction holds,
along with their local refcounts, sounds ridiculous, doesn't it?  We
would need one hash per subtransaction, which is very bad.  Any ideas
out there?

Incidentally, I assume that LWLocks are not going to be needed across
subtransaction boundaries -- I release them all on abort, just as it's
done on main transaction abort.  Same for catcache entries.  Does anyone
think this is incorrect?

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"There is no heaven possible without sinking our roots
into the depths of the earth" (Malucha Pinto)
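For concreteness, here is a tiny standalone C model of the "drop them all to
zero" idea and the way it loses an outer reference; FakeRelation,
relation_open_/relation_close_ and at_sub_abort_naive are invented names, and
this is a sketch of the bookkeeping only, not actual relcache code.

#include <stdio.h>

/* Invented stand-in for a relcache entry; only the refcount matters here. */
typedef struct FakeRelation
{
    const char *relname;
    int         refcnt;         /* backend-local reference count */
} FakeRelation;

static void relation_open_(FakeRelation *rel)  { rel->refcnt++; }
static void relation_close_(FakeRelation *rel) { rel->refcnt--; }

/* The naive idea: at subtransaction abort, force the refcount to zero. */
static void at_sub_abort_naive(FakeRelation *rel) { rel->refcnt = 0; }

int
main(void)
{
    FakeRelation t = { "t", 0 };

    relation_open_(&t);        /* outer query holds t open: refcnt = 1 */

    relation_open_(&t);        /* a function's subtransaction opens t again: refcnt = 2 */
    at_sub_abort_naive(&t);    /* the subtransaction aborts: refcnt forced to 0 */

    relation_close_(&t);       /* the outer query finishes and closes t ... */
    printf("refcnt = %d\n", t.refcnt);  /* prints -1: the outer reference was lost */
    return 0;
}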
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> I'm stuck trying to figure out how to decrease reference counting for
> relcache entries at subtransaction abort.

> Initially I thought I could just drop them all to zero,

Nope, you can't.  An active query plan will surely have open relations.

> Incidentally, I assume that LWLocks are not going to be needed across
> subtransaction boundaries -- I release them all on abort, just as it's
> done on main transaction abort.  Same for catcache entries.  Does anyone
> think this is incorrect?

Sounds like a very unsafe assumption to me.  The reason we can get away
with force-to-zero logic for these things now is that we know we are
reverting to an idle condition.  The general solution would require
reverting to the state prevailing at subtrans entry.  If you want to
avoid implementing the general solution for any particular backend
module, you'd better be able to prove that it will be in an idle state
at every subtrans entry.  It's barely possible that that's true for
LWLocks, but I've got real serious doubts about catcache.  As an
example: mightn't the call handler for a procedural language hold onto
a reference to the proc's pg_proc row throughout execution?  Even if it
chances not to do that today, somebody could easily want to do it
tomorrow, so I think an assumption that it's not needed would be too
fragile.

BTW, what are your plans for state saving/reversion for the lock manager
and buffer manager?  The lock state, in particular, makes these other
problems look trivial by comparison.

Glad to see you are starting to realize why nested transactions haven't
been done already ;-)

			regards, tom lane
> BTW, what are your plans for state saving/reversion for the lock manager
> and buffer manager?  The lock state, in particular, makes these other
> problems look trivial by comparison.

Why can't we keep all locks until main tx end?  Locks are not
self-conflicting, are they?  So the only reason to free them would be to
improve concurrency, and IMHO we don't need that.  I guess I am just not
seeing this correctly.

(I am assuming that a deadlock will still break the whole tx.)

Andreas
"Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes: > Why can't we keep all locks until main tx end ? For committed subtransactions we have to do that, yes, but for aborted subtransactions we must release. Otherwise you can't implement a retry loop around a potentially-deadlocking operation. > (I am assuming that a deadlock will still break the whole tx) Wrong. We might as well not bother with the entire project. regards, tom lane
> > Why can't we keep all locks until main tx end?
>
> For committed subtransactions we have to do that, yes, but for aborted
> subtransactions we must release.  Otherwise you can't implement a retry
> loop around a potentially-deadlocking operation.

Ok, that would certainly be good to have, but it is IMHO not a "must
have".

> > (I am assuming that a deadlock will still break the whole tx.)
>
> Wrong.  We might as well not bother with the entire project.

There are plenty of examples that do not involve deadlocks.  The most
prominent is probably "insert -> duplicate key -> update instead".
Also, the new NOLOCK statements come to mind, ...

Andreas
On Thu, May 13, 2004 at 09:43:42AM -0400, Tom Lane wrote:
> Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> > I'm stuck trying to figure out how to decrease reference counting for
> > relcache entries at subtransaction abort.
>
> > Initially I thought I could just drop them all to zero,
>
> Nope, you can't.  An active query plan will surely have open relations.

Ok, I created a function to copy a hash table (from dynahash).  So now
at subtransaction start the RelationIdCache and RelationSysNameCache
hash tables are copied, and if the subtransaction aborts the previous
hash tables are restored.  It seems to do the right thing (i.e. at main
transaction commit the refcounts seem to be OK, which was not true in my
previous tests).  I would like a comment on this solution.

Regarding the lock mechanism, I simply added some code to LockReleaseAll
so it gets the array of committed child Xids; on subtransaction abort,
the whole lock struct is scanned just like it's done on main transaction
abort; only those locks affiliated with one of the given Xids are
released.  This is naive, so if it's incorrect please comment.  It seems
to work OK on the simple tests I did.

Incidentally, I noticed that the relcache refcounts were wrong while
testing deadlock handling; there were messages about unreleased relcache
entries from the aborted subtransaction.  Those are now gone, as
expected.

I'm still missing a solution for the catcache and the buffer manager,
but I'd like review of what's currently done.  I'll post the current
patch to patches shortly.

PS: Either the list server is getting very slow or it has started to
lose mail.  I asked yesterday whether it was OK to copy the hash, but
apparently the mail didn't make it to the list.  Is there something
happening?

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"There is no heaven possible without sinking our roots
into the depths of the earth" (Malucha Pinto)
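A sketch of the release loop described above, written as a standalone model
rather than against the real LOCK/PROCLOCK structures; HeldLock,
lock_release_subxact and the Xid typedef are invented for the sketch.  The
assumption is simply that each held lock is tagged with the Xid of the
(sub)transaction that took it, and that the array handed in at abort holds
the aborted subtransaction's Xid plus those of its committed children.

#include <stdbool.h>
#include <stdio.h>

typedef unsigned int Xid;

typedef struct HeldLock
{
    int     lockid;     /* stands in for the real locktag */
    Xid     xid;        /* (sub)transaction that acquired it */
    bool    held;
} HeldLock;

static bool
xid_in_set(Xid xid, const Xid *xids, int nxids)
{
    for (int i = 0; i < nxids; i++)
        if (xids[i] == xid)
            return true;
    return false;
}

/* Release only locks belonging to the aborted subxact or its committed children. */
static void
lock_release_subxact(HeldLock *locks, int nlocks, const Xid *xids, int nxids)
{
    for (int i = 0; i < nlocks; i++)
        if (locks[i].held && xid_in_set(locks[i].xid, xids, nxids))
            locks[i].held = false;
}

int
main(void)
{
    HeldLock locks[] = {
        { 1, 100, true },   /* taken by the main transaction: must survive */
        { 2, 101, true },   /* taken by the aborted subxact: must go */
        { 3, 102, true },   /* taken by a committed child of 101: must go too */
    };
    Xid aborted[] = { 101, 102 };

    lock_release_subxact(locks, 3, aborted, 2);
    for (int i = 0; i < 3; i++)
        printf("lock %d: %s\n", locks[i].lockid,
               locks[i].held ? "held" : "released");
    return 0;
}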
Alvaro Herrera wrote:
> PS: Either the list server is getting very slow or it has started to
> lose mail.  I asked yesterday whether it was OK to copy the hash, but
> apparently the mail didn't make it to the list.  Is there something
> happening?

Not sure.  I can confirm I never saw that email.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> Ok, I created a function to copy a hash table (from dynahash).  So now
> at subtransaction start the RelationIdCache and RelationSysNameCache
> hash tables are copied, and if the subtransaction aborts the previous
> hash tables are restored.

I don't think that will do; what about updates absorbed during the
subtrans from other backends' changes?  In any case, copying the whole
cache state during every subtrans start is not the kind of overhead
I wish to pay ...

> Regarding the lock mechanism, I simply added some code to LockReleaseAll
> so it gets the array of committed child Xids; on subtransaction abort,
> the whole lock struct is scanned just like it's done on main transaction
> abort; only those locks affiliated with one of the given Xids are
> released.  This is naive, so if it's incorrect please comment.

Not sure; it's been a long day and I'm tired ... will think about it
tomorrow ...

> PS: Either the list server is getting very slow or it has started to
> lose mail.  I asked yesterday whether it was OK to copy the hash, but
> apparently the mail didn't make it to the list.  Is there something
> happening?

I haven't seen that one.  Check your subscription settings to see
whether you'd be notified if the list bot delayed your post for some
reason.

			regards, tom lane
On Fri, May 14, 2004 at 11:21:42PM -0400, Tom Lane wrote:
> Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> > Ok, I created a function to copy a hash table (from dynahash).  So now
> > at subtransaction start the RelationIdCache and RelationSysNameCache
> > hash tables are copied, and if the subtransaction aborts the previous
> > hash tables are restored.
>
> I don't think that will do; what about updates absorbed during the
> subtrans from other backends' changes?  In any case, copying the whole
> cache state during every subtrans start is not the kind of overhead
> I wish to pay ...

Oh, I see that now.  (And I agree with the overhead argument anyway.)
I think I can add a "refcount stack" to RelationData, pushing onto it
each time a subtransaction is entered; on abort, the previous counts
are restored for all relations.  This is certainly much cheaper than
saving the whole hash, and it handles invalidations more correctly.
I'll leave the hash_copy() function in, though ... maybe someone can
think of a use for it ...

I just did something like the refcount stack for the catcache code.
I'm not sure how to verify that it actually works ... I will play with
gdb.

> > Regarding the lock mechanism, I simply added some code to LockReleaseAll
> > so it gets the array of committed child Xids; on subtransaction abort,
> > the whole lock struct is scanned just like it's done on main transaction
> > abort; only those locks affiliated with one of the given Xids are
> > released.  This is naive, so if it's incorrect please comment.
>
> Not sure; it's been a long day and I'm tired ... will think about it
> tomorrow ...

The kind of issue I am concerned about is what happens when a
subtransaction upgrades a previously acquired lock.  Does this release
the lesser lock?  I don't think so.

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"The first of April is the day we remember what we are
the other 364 days of the year" (Mark Twain)
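A minimal standalone model of the refcount-stack idea, assuming one saved
count per open subtransaction level: push the current count at subtransaction
start, restore it at abort, and simply discard it at commit so references
taken inside a committed subtransaction survive into the parent.  FakeRelation
and the rel_at_subxact_* names are invented; the real thing would hang off
RelationData, and the model ignores cache invalidations entirely.

#include <assert.h>

#define MAX_NEST 64             /* arbitrary nesting limit for the model */

typedef struct FakeRelation
{
    int refcnt;                 /* current backend-local refcount */
    int saved[MAX_NEST];        /* refcounts saved at each subxact start */
    int nsaved;
} FakeRelation;

static void
rel_at_subxact_start(FakeRelation *rel)
{
    assert(rel->nsaved < MAX_NEST);
    rel->saved[rel->nsaved++] = rel->refcnt;
}

static void
rel_at_subxact_abort(FakeRelation *rel)
{
    assert(rel->nsaved > 0);
    rel->refcnt = rel->saved[--rel->nsaved];    /* drop references taken inside */
}

static void
rel_at_subxact_commit(FakeRelation *rel)
{
    assert(rel->nsaved > 0);
    --rel->nsaved;              /* keep the current count, forget the saved one */
}

int
main(void)
{
    FakeRelation r = { .refcnt = 1 };   /* outer query holds one reference */

    rel_at_subxact_start(&r);
    r.refcnt++;                         /* subxact opens the relation */
    rel_at_subxact_abort(&r);
    assert(r.refcnt == 1);              /* outer reference is intact */

    rel_at_subxact_start(&r);
    r.refcnt++;
    rel_at_subxact_commit(&r);
    assert(r.refcnt == 2);              /* committed subxact's reference survives */
    return 0;
}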
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> Regarding the lock mechanism, I simply added some code to LockReleaseAll
> so it gets the array of committed child Xids; on subtransaction abort,
> the whole lock struct is scanned just like it's done on main transaction
> abort; only those locks affiliated with one of the given Xids are
> released.  This is naive, so if it's incorrect please comment.

Another and perhaps simpler way would be to leave the release code
alone, but on subtransaction commit scan through the lock structs and
re-mark locks held by the subtrans as being held by the parent.  I
think these are isomorphic functionally.  The second way feels like it
would be faster (no inner loop over child XIDs).  On the other hand, if
your current code does not require scanning the lock structures at all
on subtrans commit, it's probably not a win to add such a scan.

The lock algorithms must be able to tell when two lock requests are
coming from the same backend.  At present I think this relies on
comparing XIDs, which is not going to work if you label subtrans locks
with subtrans XIDs.  How are you thinking about handling that?

			regards, tom lane
On Sat, May 15, 2004 at 02:08:39PM -0400, Tom Lane wrote:
> Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> > Regarding the lock mechanism, I simply added some code to LockReleaseAll
> > so it gets the array of committed child Xids; on subtransaction abort,
> > the whole lock struct is scanned just like it's done on main transaction
> > abort; only those locks affiliated with one of the given Xids are
> > released.  This is naive, so if it's incorrect please comment.
>
> Another and perhaps simpler way would be to leave the release code
> alone, but on subtransaction commit scan through the lock structs and
> re-mark locks held by the subtrans as being held by the parent.  I
> think these are isomorphic functionally.  The second way feels like it
> would be faster (no inner loop over child XIDs).

The problem is that the Xid is part of the locktag (which is the hash
key), as far as I can tell, so relabeling means I have to delete the
lock from the hash table and then insert it again with the new tag.  I
don't think this is a good idea.  (I found this out the hard way: I was
getting "proclock hashtable corrupted" when finishing a subtransaction
after relabeling the locks.)

> On the other hand, if your current code does not require scanning the
> lock structures at all on subtrans commit, it's probably not a win to
> add such a scan.

Nope, it doesn't.

> The lock algorithms must be able to tell when two lock requests are
> coming from the same backend.  At present I think this relies on
> comparing XIDs, which is not going to work if you label subtrans locks
> with subtrans XIDs.  How are you thinking about handling that?

Nope, it doesn't compare Xids.  That code actually looks up locks
through the PGPROC struct (procHolders), so the "current Xid" does not
affect it.  Deadlock detection works without changing the code at all.
It took me quite a while to figure this out, as this is pretty hairy
code ... the hairiest I've seen in Postgres.  I was surprised to learn
the original Berkeley code came without deadlock detection.

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"I call it GNU/Linux.  Except the GNU/ is silent." (Ben Reiter)
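To illustrate the dynahash constraint being described: a field that is part of
the hash key cannot simply be changed in place, because the entry stays filed
under the old key's bucket, so re-parenting a lock would mean removing the
entry and re-entering it under the new tag, roughly as below.
ProcLockTagModel, ProcLockEntModel, nHolding and the proclock_hash parameter
are simplified stand-ins invented for this sketch (it is written as if inside
the backend, so it is not compilable on its own); hash_search with
HASH_REMOVE/HASH_ENTER is the ordinary dynahash interface, and error handling
is omitted.

#include "postgres.h"
#include "utils/hsearch.h"

typedef struct
{
    unsigned int    lockid;     /* stands in for the real locktag fields */
    TransactionId   xid;        /* (sub)xact holding it -- part of the key! */
} ProcLockTagModel;

typedef struct
{
    ProcLockTagModel tag;       /* hash key: must never be modified in place */
    int             nHolding;   /* payload, simplified */
} ProcLockEntModel;

/*
 * Move one entry from a child xid to parent_xid.  Just assigning
 * entry->tag.xid = parent_xid leaves the entry in the old key's bucket,
 * which is the "proclock hashtable corrupted" failure described above.
 */
static void
reparent_proclock(HTAB *proclock_hash, ProcLockEntModel *entry,
                  TransactionId parent_xid)
{
    ProcLockTagModel newtag = entry->tag;
    ProcLockEntModel saved = *entry;    /* entry is invalid after HASH_REMOVE */
    ProcLockEntModel *newentry;
    bool        found;

    newtag.xid = parent_xid;

    /* remove under the old key ... */
    (void) hash_search(proclock_hash, &entry->tag, HASH_REMOVE, &found);

    /* ... and re-enter under the new one, carrying the payload across */
    newentry = (ProcLockEntModel *) hash_search(proclock_hash, &newtag,
                                                HASH_ENTER, &found);
    if (found)
        newentry->nHolding += saved.nHolding;   /* parent already held it: merge */
    else
        newentry->nHolding = saved.nHolding;
}

Note the merge case: if the parent already holds the same lock, the two
entries have to collapse into one, which is part of what makes relabeling
messier than simply releasing at abort.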
On Thu, May 13, 2004 at 09:43:42AM -0400, Tom Lane wrote:
> BTW, what are your plans for state saving/reversion for the lock manager
> and buffer manager?

Ok, I have skimmed through the buffer manager code.  At first I thought
I'd have to set something up in shared memory for this, but then I
noticed that every backend keeps a local reference count, and modifies
the shared counter only when the local one rises from zero or drops to
zero.  Also, the number of buffers does not change while the server is
running.  So I see two ways of handling this:

1. Keep an array of local refcounts for all buffers, for each
   subtransaction.  At subtrans start, a new array is allocated and
   filled with the current local refcounts.  At subtrans abort, we check
   whether any count would drop to zero while restoring; if so, we
   decrement the shared counter.  At subtrans commit, we drop the saved
   array.  The problem with this approach is that we allocate a large
   array which will likely be almost entirely zeros.

2. Keep a more elaborate struct, where each buffer gets its local count
   saved only if it is nonzero.  Thus we don't have to allocate a large
   amount of memory.

Comments?  Opinions?  Which one do you think is better?  Any completely
different idea?

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Pick the flower that is born today, joyful and proud.
Who knows whether another will be born tomorrow?"
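As a concreteness check on option 2, here is a standalone model in which only
buffers with a nonzero local count are saved at subtransaction start;
SubXactPins, pin(), subxact_start() and subxact_abort() are invented names,
and NBUFFERS with the fixed arrays stands in for the real per-backend
PrivateRefCount bookkeeping.

#include <stdio.h>

#define NBUFFERS 8      /* tiny buffer pool for the model */

static int  LocalRefCount[NBUFFERS];    /* this backend's pin counts */
static int  SharedRefCount[NBUFFERS];   /* "shared" count: bumped on 0 -> 1 */

typedef struct SavedPin
{
    int buf;
    int count;                          /* local count at subxact start */
} SavedPin;

typedef struct SubXactPins
{
    SavedPin    pins[NBUFFERS];         /* sparse: only nonzero counts */
    int         npins;
} SubXactPins;

static void
pin(int buf)
{
    if (LocalRefCount[buf]++ == 0)
        SharedRefCount[buf]++;          /* first local pin touches shared state */
}

static void
subxact_start(SubXactPins *save)
{
    save->npins = 0;
    for (int buf = 0; buf < NBUFFERS; buf++)
        if (LocalRefCount[buf] > 0)
        {
            save->pins[save->npins].buf = buf;
            save->pins[save->npins].count = LocalRefCount[buf];
            save->npins++;
        }
}

static void
subxact_abort(const SubXactPins *save)
{
    for (int buf = 0; buf < NBUFFERS; buf++)
    {
        int saved = 0;

        for (int i = 0; i < save->npins; i++)
            if (save->pins[i].buf == buf)
                saved = save->pins[i].count;

        if (LocalRefCount[buf] > 0 && saved == 0)
            SharedRefCount[buf]--;      /* local count falls to zero: unpin shared */
        LocalRefCount[buf] = saved;
    }
}

int
main(void)
{
    SubXactPins save;

    pin(3);                     /* outer query pins buffer 3 */
    subxact_start(&save);
    pin(3);                     /* subxact pins it again */
    pin(5);                     /* and pins buffer 5 for the first time */
    subxact_abort(&save);

    printf("buf 3: local=%d shared=%d\n", LocalRefCount[3], SharedRefCount[3]);
    printf("buf 5: local=%d shared=%d\n", LocalRefCount[5], SharedRefCount[5]);
    /* expected: buf 3 local=1 shared=1 (outer pin intact), buf 5 local=0 shared=0 */
    return 0;
}

The saved structure's size is proportional to the number of buffers actually
pinned when the subtransaction starts rather than to the total number of
buffers, which is the point of option 2; the abort-time restore still walks
the local counts either way.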