Thread: SearchSysCacheTuple(Copy)
Hi,

I fixed a few bugs caused by SearchSysCacheTuple() recently. There seem
to remain more similar bugs. They are ill-natured, so we should avoid
them by some means. The simplest way is to call SearchSysCacheTupleCopy()
instead of SearchSysCacheTuple(). I've thought of another (debugging)
mechanism, but it would be much harder.

Comments ?

Regards,
Hiroshi Inoue
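For readers following along, the hazard under discussion looks roughly
like this: a tuple returned by SearchSysCacheTuple() points directly
into the catalog cache, so a cache flush can free it while the caller
is still using it, whereas the Copy variant returns a palloc'd copy
that the caller must remember to free. A minimal sketch of the two
patterns, assuming a pg_type lookup (the TYPEOID cache id, the typlen
field, and the helper names are illustrative, not taken from the
thread):

    #include "postgres.h"
    #include "access/heapam.h"
    #include "catalog/pg_type.h"
    #include "utils/syscache.h"

    /* Hazardous pattern: the result points into the catalog cache. */
    static int16
    get_typlen_unsafe(Oid typid)
    {
        HeapTuple   tuple;

        tuple = SearchSysCacheTuple(TYPEOID, ObjectIdGetDatum(typid),
                                    0, 0, 0);
        if (!HeapTupleIsValid(tuple))
            elog(ERROR, "cache lookup failed for type %u", typid);

        /*
         * If anything between here and the last use of "tuple" can cause
         * a cache flush (relation invalidation, SI overflow, ...), this
         * pointer can become garbage while we are still using it.
         */
        return ((Form_pg_type) GETSTRUCT(tuple))->typlen;
    }

    /* Copy-based pattern: safe from flushes, but the copy must be freed. */
    static int16
    get_typlen_copy(Oid typid)
    {
        HeapTuple   tuple;
        int16       result;

        tuple = SearchSysCacheTupleCopy(TYPEOID, ObjectIdGetDatum(typid),
                                        0, 0, 0);
        if (!HeapTupleIsValid(tuple))
            elog(ERROR, "cache lookup failed for type %u", typid);
        result = ((Form_pg_type) GETSTRUCT(tuple))->typlen;
        heap_freetuple(tuple);      /* forgetting this leaks the copy */
        return result;
    }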
Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> I fixed a few bugs caused by SearchSysCacheTuple() recently. There
> seem to remain more similar bugs.

Yes, I'm sure there are a lot :-(.  We have talked about solving this
by using some form of reference-counting in the cache, but I don't
believe anyone's put forward a complete proposal.  See for example
the pghackers thread "Another nasty cache problem" around 30-Jan-2000.

			regards, tom lane
Tom Lane wrote:
> Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> > I fixed a few bugs caused by SearchSysCacheTuple() recently. There
> > seem to remain more similar bugs.
>
> Yes, I'm sure there are a lot :-(.  We have talked about solving this
> by using some form of reference-counting in the cache, but I don't
> believe anyone's put forward a complete proposal.  See for example
> the pghackers thread "Another nasty cache problem" around 30-Jan-2000.

Those bugs are pretty serious. Sometimes they caused ERROR with strange
messages. Sometimes a database-wide restart was caused by a segmentation
fault. A database-wide restart is pretty serious for any DBMS. And
possibly they might have done the wrong thing silently.

Isn't it practical to replace all suspicious SearchSysCacheTuple() calls
by SearchSysCacheTupleCopy() ?

Regards,
Hiroshi Inoue
Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> Isn't it practical to replace all suspicious SearchSysCacheTuple()
> calls by SearchSysCacheTupleCopy() ?

That would replace a rare failure condition by a not-at-all-rare
memory leak.  I'm not sure there'd be a net gain in reliability :-(

A more serious objection to SearchSysCacheTupleCopy is that once the
tuple is copied out of the syscache, there isn't any mechanism to
detect whether it's still valid.  If an SI message arrives for a
recently-copied tuple, we have no way to know if we have a problem
or not.

I think that the right fix is to add reference counts to syscache
entries.  This has been on my todo list for awhile, but hasn't got
to the top yet...

			regards, tom lane
Tom Lane wrote:
> Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> > Isn't it practical to replace all suspicious SearchSysCacheTuple()
> > calls by SearchSysCacheTupleCopy() ?
>
> That would replace a rare failure condition by a not-at-all-rare
> memory leak.  I'm not sure there'd be a net gain in reliability :-(
>
> A more serious objection to SearchSysCacheTupleCopy is that once the
> tuple is copied out of the syscache, there isn't any mechanism to
> detect whether it's still valid.  If an SI message arrives for a
> recently-copied tuple, we have no way to know if we have a problem
> or not.

Is it more serious than doing the wrong thing silently ?
Is it more serious than forcing a database restart ?

We couldn't handle SI messages immediately anyway. The cache mechanism
couldn't guarantee the validity of tuples without some locking mechanism
in the first place.

Regards,
Hiroshi Inoue
Hiroshi Inoue <Inoue@tpf.co.jp> writes:
>> A more serious objection to SearchSysCacheTupleCopy is that once the
>> tuple is copied out of the syscache, there isn't any mechanism to
>> detect whether it's still valid.  If an SI message arrives for a
>> recently-copied tuple, we have no way to know if we have a problem
>> or not.

> Is it more serious than doing the wrong thing silently ?

It'd still be doing the wrong thing silently, in my opinion.

This class of bugs has been there since the beginning of Postgres,
so I do not feel that we need to panic about it.  Let's take the
time to design and implement a proper solution, rather than expending
effort on a stopgap solution that'll have to be undone later.

			regards, tom lane
I said:
> This class of bugs has been there since the beginning of Postgres,
> so I do not feel that we need to panic about it.  Let's take the
> time to design and implement a proper solution, rather than expending
> effort on a stopgap solution that'll have to be undone later.

I've reviewed the earlier threads on SearchSysCache and done some more
thinking.  Here is a concrete proposal for doing it right:

1. Add a reference-count field and an already-dead boolean to each
syscache entry.  (The already-dead flag indicates that we received an
SI inval message for this tuple, but we couldn't delete it yet because
it has refcount > 0.  A tuple with this flag set will never be returned
by a new SearchSysCache call, and it will be deleted from the cache as
soon as its refcount goes to zero.)

2. SearchSysCache() will increment the reference count of the tuple it
returns.

3. A new entry point ReleaseSysCache() will be provided to decrement
the reference count of a syscache entry.  Thus, the standard calling
sequence becomes

	tuple = SearchSysCacheTuple(...);
	if (HeapTupleIsValid(tuple))
	{
		... use tuple ...
		ReleaseSysCache(tuple);
	}

4. SearchSysCacheTupleCopy() goes away, since we may as well use
SearchSysCacheTuple() and ReleaseSysCache() instead of
SearchSysCacheTupleCopy() and heap_freetuple().

5. Since SearchSysCache() is called from exactly one place, namely
SearchSysCacheTuple(), I am inclined to rename SearchSysCache() to
SearchCatCache() and then give the name SearchSysCache() to the more
widely used routine.  So SearchSysCache() and ReleaseSysCache() would
really be the widely used pair of routines.

6. When a shared-cache-inval message arrives, the syscache code behaves
as follows for each affected cache entry:
   A. If refcount is zero, just delete the entry.
   B. Otherwise, set the "already-dead" flag, so that future cache
      searches will not return this tuple and it will be released once
      its refcount reaches zero.

7. At end of transaction (whether normal or abort), scan the syscaches
to reset refcounts to zero and delete any marked-dead entries.  We
should not consider it a software error to leave syscache entries still
locked at end of transaction.  (The parser, in particular, would need a
lot of work to avoid doing so, and I don't see much value in expending
such work.)

Note that this proposal does not include any attempt to detect whether
a cache inval message means that the tuple has actually been changed
(as opposed to just relocated, for example).  It seems too expensive to
go out and re-read such tuples, and I'm not sure that it's safe to try
to read new cache entries during cache inval anyway.  Besides, most
callers that are using a cache entry are happy to continue to use the
copy that was valid when they got it --- they don't care if a
subsequent transaction commit changes the tuple.

Callers that want to be certain they have a completely-up-to-date copy
should acquire a suitable lock on the associated system relation before
calling SearchSysCache().  In practice, the only callers that really
need this are places that are going to update or delete the tuple, and
so they need to acquire a write lock on the system relation anyway.
The coding rule is then just "lock the relation before finding the
tuple to update, not after".  We have that rule in place already.

Comments?  I think I might have time to do this before Vadim finishes
with WAL ;-)

			regards, tom lane
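A rough sketch of the bookkeeping that points 1, 2, 3, 6, and 7
describe. The struct and function names here (CatCTup, cache_release,
and so on) are invented for illustration and are not the actual
implementation:

    #include "postgres.h"
    #include "access/heapam.h"

    typedef struct CatCTup
    {
        HeapTuple   tuple;      /* the cached catalog tuple */
        int         refcount;   /* unreleased SearchSysCache() hits */
        bool        dead;       /* invalidated while still referenced */
    } CatCTup;

    static void
    remove_entry(CatCTup *ct)
    {
        /* the real cache would also unlink ct from its hash chain */
        heap_freetuple(ct->tuple);
        pfree(ct);
    }

    /* point 2: a successful lookup pins the entry */
    static HeapTuple
    cache_return(CatCTup *ct)
    {
        ct->refcount++;
        return ct->tuple;
    }

    /* point 3: ReleaseSysCache() unpins, and finishes a deferred removal */
    static void
    cache_release(CatCTup *ct)
    {
        Assert(ct->refcount > 0);
        ct->refcount--;
        if (ct->refcount == 0 && ct->dead)
            remove_entry(ct);
    }

    /* point 6: SI invalidation removes the entry or defers its removal */
    static void
    cache_invalidate(CatCTup *ct)
    {
        if (ct->refcount == 0)
            remove_entry(ct);
        else
            ct->dead = true;    /* hidden from future searches (point 1) */
    }

    /* point 7: end-of-transaction cleanup forgives leaked refcounts */
    static void
    cache_end_of_xact(CatCTup *ct)
    {
        ct->refcount = 0;
        if (ct->dead)
            remove_entry(ct);
    }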
Tom Lane wrote:
> Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> >> A more serious objection to SearchSysCacheTupleCopy is that once the
> >> tuple is copied out of the syscache, there isn't any mechanism to
> >> detect whether it's still valid.  If an SI message arrives for a
> >> recently-copied tuple, we have no way to know if we have a problem
> >> or not.
>
> > Is it more serious than doing the wrong thing silently ?
>
> It'd still be doing the wrong thing silently, in my opinion.
>
> This class of bugs has been there since the beginning of Postgres,
> so I do not feel that we need to panic about it.

Not only SI overflow but also relation invalidation necessarily resets
the catalog cache, so I doubt that the bugs caused by this fragility
are rare. For example, I suspect that the reports of abnormal block
numbers are due to this bug. If so, we could hardly reproduce the
error, and it would only be a waste of time to investigate the cause.
It would be really serious if broken tuples were stored into the
database.

Regards,
Hiroshi Inoue
Tom Lane wrote:
> I said:
> > This class of bugs has been there since the beginning of Postgres,
> > so I do not feel that we need to panic about it.  Let's take the
> > time to design and implement a proper solution, rather than expending
> > effort on a stopgap solution that'll have to be undone later.
>
> I've reviewed the earlier threads on SearchSysCache and done some more
> thinking.  Here is a concrete proposal for doing it right:
>
> 1. Add a reference-count field and an already-dead boolean to each
[snip]

I agree with your proposal.

> Callers that want to be certain they have a completely-up-to-date copy
> should acquire a suitable lock on the associated system relation before
> calling SearchSysCache().

I doubt that it's practical. What is a suitable lock ?
The lock should conflict with RowExclusiveLock at least, so it must be
one of the locks stronger than RowExclusiveLock, or ShareLock
(ShareLock is neither stronger nor weaker than RowExclusiveLock).
RowExclusiveLock doesn't conflict with itself. There seem to be few
places where such locks on system relations are appropriate.

Regards,
Hiroshi Inoue
Hiroshi Inoue <Inoue@tpf.co.jp> writes:
>> Callers that want to be certain they have a completely-up-to-date copy
>> should acquire a suitable lock on the associated system relation before
>> calling SearchSysCache().

> I doubt that it's practical. What is a suitable lock ?
> The lock should conflict with RowExclusiveLock at least, so it must be
> one of the locks stronger than RowExclusiveLock, or ShareLock
> (ShareLock is neither stronger nor weaker than RowExclusiveLock).

I think RowExclusiveLock is sufficient.  The thing that we want a
table-level lock to do is to ensure that there are no *table level*
reasons for not proceeding with the update.  An obvious example is that
we don't want any VACUUM to be running on that table and perhaps moving
our tuple around.

I think you are concerned about whether there might be concurrent
updates to the particular tuple we want to update.  You are right to be
concerned, but obtaining a table-level lock is a bad solution to that
because it doesn't allow for any concurrency --- if we do it that way
then we can have only one writer *per table*.  Also the risks of
deadlock from obtaining locks in different orders in different parts of
the code become much higher than if we are locking individual tuples.

Most of the places where we need to update multiple system-catalog
tuples in a single command are schema updates to a single relation.
For that, I think a reasonable approach is to rely on locking the
relation being modified; we're going to want exclusive lock on that
relation anyway, and that should be enough to ensure no one else is
doing schema updates on that relation.  Commands that alter just a
single system-catalog tuple, like DROP TYPE or some such, don't really
need any more interlocking than will be supplied automatically by
heap_update or heap_delete.

In short I don't think there's a big problem here.

			regards, tom lane

PS: I'm working on the SearchSysCache changes I outlined, and hopefully
will be ready to commit tomorrow.  One thing I've discovered already is
that there is still a use for SearchSysCacheCopy: callers who intend to
modify and update the tuple want to receive a modifiable copy, not a
cached tuple that they mustn't change.
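A sketch of the update-side coding rule Tom describes ("lock the
relation before finding the tuple to update, not after"), written
against the proposed API and the SearchSysCacheCopy variant mentioned
in the PS. The catalog and cache id used (pg_type, TYPEOID), the
namestrcpy/heap_update call details, and the function name are
illustrative assumptions, not code from the thread; catalog index
maintenance is omitted:

    #include "postgres.h"
    #include "access/heapam.h"
    #include "catalog/pg_type.h"
    #include "utils/builtins.h"
    #include "utils/syscache.h"

    static void
    rename_type(Oid typid, const char *newname)
    {
        Relation    rel;
        HeapTuple   tuple;

        /* 1. lock the catalog first, so the tuple can't move under us */
        rel = heap_openr("pg_type", RowExclusiveLock);

        /* 2. then fetch a modifiable copy (the reason SearchSysCacheCopy
         *    survives, per the PS above) */
        tuple = SearchSysCacheCopy(TYPEOID, ObjectIdGetDatum(typid),
                                   0, 0, 0);
        if (!HeapTupleIsValid(tuple))
            elog(ERROR, "cache lookup failed for type %u", typid);

        /* 3. scribble on our private copy and write it back
         *    (heap_update call shape is an assumption; index updates
         *    are omitted here) */
        namestrcpy(&((Form_pg_type) GETSTRUCT(tuple))->typname, newname);
        heap_update(rel, &tuple->t_self, tuple, NULL);

        heap_freetuple(tuple);
        heap_close(rel, RowExclusiveLock);
    }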
> 4. SearchSysCacheTupleCopy() goes away, since we may as well use
> SearchSysCacheTuple() and ReleaseSysCache() instead of
> SearchSysCacheTupleCopy() and heap_freetuple().

Agreed.  The rest sounds good too.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026