Thread: [HACKERS] Shaky coding for vacuuming partitioned relations
Somebody inserted this into vacuum.c's get_rel_oids():

    tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
    if (!HeapTupleIsValid(tuple))
        elog(ERROR, "cache lookup failed for relation %u", relid);

apparently without having read the very verbose comment two lines above,
which points out that we're not taking any lock on the target relation.
So, if that relation is concurrently being dropped, you're likely to
get "cache lookup failed for relation NNNN" rather than anything more
user-friendly.

A minimum-change fix would be to replace the elog() with an ereport
that produces the same "relation does not exist" error you'd have
gotten from RangeVarGetRelid, had the concurrent DROP TABLE committed
a few microseconds earlier.  But that feels like it's band-aiding
around the problem.

What I'm wondering about is changing the RangeVarGetRelid call to take
ShareUpdateExclusiveLock rather than no lock.  That would protect the
syscache lookup, and it would also make the find_all_inheritors call
a lot more meaningful.

If we're doing a VACUUM, the ShareUpdateExclusiveLock would be dropped
as soon as we close the caller's transaction, and then we'd acquire
the same or stronger lock inside vacuum_rel().  So that seems fine.
If we're doing an ANALYZE, then the lock would continue to be held
and analyze_rel would merely be acquiring it an extra time, so we'd
actually be removing a race-condition failure scenario for ANALYZE.
This would mean a few more cycles in lock management, but since this
only applies to a manual VACUUM or ANALYZE that specifies a table
name, I'm not too concerned about that.

Thoughts?

			regards, tom lane
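For concreteness, the minimum-change fix described above would look
roughly like this (a sketch only, assuming the caller's RangeVar is in
scope as "vacrel"; the exact wording could differ):

    tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
    if (!HeapTupleIsValid(tuple))
        ereport(ERROR,
                (errcode(ERRCODE_UNDEFINED_TABLE),
                 errmsg("relation \"%s\" does not exist",
                        vacrel->relname)));

That mirrors the error RangeVarGetRelid itself would have raised, but,
as noted above, it only papers over the missing lock.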
On Sat, Sep 23, 2017 at 4:13 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Somebody inserted this into vacuum.c's get_rel_oids():
>
>     tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
>     if (!HeapTupleIsValid(tuple))
>         elog(ERROR, "cache lookup failed for relation %u", relid);
>
> apparently without having read the very verbose comment two lines above,
> which points out that we're not taking any lock on the target relation.
> So, if that relation is concurrently being dropped, you're likely to
> get "cache lookup failed for relation NNNN" rather than anything more
> user-friendly.

This was overlooked during the review of 3c3bb99, by multiple people
including me.  elog() messages should never be something users can
face, and the same goes for cache lookup failures.

> A minimum-change fix would be to replace the elog() with an ereport
> that produces the same "relation does not exist" error you'd have
> gotten from RangeVarGetRelid, had the concurrent DROP TABLE committed
> a few microseconds earlier.  But that feels like it's band-aiding
> around the problem.

Yeah, that's not right.  This is still a cache lookup error in the end.

> What I'm wondering about is changing the RangeVarGetRelid call to take
> ShareUpdateExclusiveLock rather than no lock.  That would protect the
> syscache lookup, and it would also make the find_all_inheritors call
> a lot more meaningful.
>
> If we're doing a VACUUM, the ShareUpdateExclusiveLock would be dropped
> as soon as we close the caller's transaction, and then we'd acquire
> the same or stronger lock inside vacuum_rel().  So that seems fine.
> If we're doing an ANALYZE, then the lock would continue to be held
> and analyze_rel would merely be acquiring it an extra time, so we'd
> actually be removing a race-condition failure scenario for ANALYZE.
> This would mean a few more cycles in lock management, but since this
> only applies to a manual VACUUM or ANALYZE that specifies a table
> name, I'm not too concerned about that.

I am +1 on that.  Testing such a thing, I am not seeing anything wrong
either.  The call to find_all_inheritors should also use
ShareUpdateExclusiveLock, and the locking on top of RangeVarGetRelid()
needs to be reworked.

Attached is a proposed patch.

> Thoughts?

As long as I don't forget... Another thing currently broken on HEAD and
REL_10_STABLE is that the OIDs of partitioned tables are used, but the
RangeVar of the parent is used for error reports.  This leads to
incorrect reports if a partition goes away in the middle of autovacuum,
as only information about the parent is reported to the user.  I am not
saying that this needs to be fixed in REL_10_STABLE at this stage
though, as this would require some refactoring similar to what the
patch adding support for VACUUM with multiple relations does.  But I
digress here.
--
Michael
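The locking rework being proposed amounts to something like the
following in get_rel_oids() (a sketch under assumed variable names, not
the attached patch verbatim):

    /* Lock the named table, so a concurrent DROP cannot invalidate relid. */
    relid = RangeVarGetRelid(vacrel, ShareUpdateExclusiveLock, false);

    /*
     * With the parent locked, its inheritance tree cannot change under
     * us, so take the same lock on each child while collecting them.
     */
    oid_list = list_concat(oid_list,
                           find_all_inheritors(relid,
                                               ShareUpdateExclusiveLock,
                                               NULL));

RangeVarGetRelid() and find_all_inheritors() both accept a lock mode,
so the change is mostly a matter of passing ShareUpdateExclusiveLock
where no lock was requested before.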
On 2017/09/25 12:10, Michael Paquier wrote:
> On Sat, Sep 23, 2017 at 4:13 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Somebody inserted this into vacuum.c's get_rel_oids():
>>
>>     tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
>>     if (!HeapTupleIsValid(tuple))
>>         elog(ERROR, "cache lookup failed for relation %u", relid);
>>
>> apparently without having read the very verbose comment two lines above,
>> which points out that we're not taking any lock on the target relation.
>> So, if that relation is concurrently being dropped, you're likely to
>> get "cache lookup failed for relation NNNN" rather than anything more
>> user-friendly.
>
> This was overlooked during the review of 3c3bb99, by multiple people
> including me.  elog() messages should never be something users can
> face, and the same goes for cache lookup failures.

Agreed, that was an oversight.

>> A minimum-change fix would be to replace the elog() with an ereport
>> that produces the same "relation does not exist" error you'd have
>> gotten from RangeVarGetRelid, had the concurrent DROP TABLE committed
>> a few microseconds earlier.  But that feels like it's band-aiding
>> around the problem.
>
> Yeah, that's not right.  This is still a cache lookup error in the end.

Also agreed that fixing the locking here would be a better solution.

>> What I'm wondering about is changing the RangeVarGetRelid call to take
>> ShareUpdateExclusiveLock rather than no lock.  That would protect the
>> syscache lookup, and it would also make the find_all_inheritors call
>> a lot more meaningful.
>>
>> If we're doing a VACUUM, the ShareUpdateExclusiveLock would be dropped
>> as soon as we close the caller's transaction, and then we'd acquire
>> the same or stronger lock inside vacuum_rel().  So that seems fine.
>> If we're doing an ANALYZE, then the lock would continue to be held
>> and analyze_rel would merely be acquiring it an extra time, so we'd
>> actually be removing a race-condition failure scenario for ANALYZE.
>> This would mean a few more cycles in lock management, but since this
>> only applies to a manual VACUUM or ANALYZE that specifies a table
>> name, I'm not too concerned about that.
>
> I am +1 on that.  Testing such a thing, I am not seeing anything wrong
> either.  The call to find_all_inheritors should also use
> ShareUpdateExclusiveLock, and the locking on top of RangeVarGetRelid()
> needs to be reworked.
>
> Attached is a proposed patch.

Hmm, I'm not sure if we need to lock the partitions, too.  Locks taken
by find_all_inheritors() will be gone once the already-running
transaction is ended by the caller (vacuum()).  get_rel_oids() should
just lock the table mentioned in the command (the parent table), so
that find_all_inheritors() returns the correct partition list, as Tom
perhaps alluded to when he said "it would also make the
find_all_inheritors call a lot more meaningful", but...

When vacuum() iterates over that list and calls vacuum_rel() for each
relation in it, it might be the case that a relation in the list is no
longer a partition of the parent.  So, one might say that it would get
vacuumed unnecessarily.  Perhaps that's fine?

>> Thoughts?
>
> As long as I don't forget... Another thing currently broken on HEAD and
> REL_10_STABLE is that the OIDs of partitioned tables are used, but the
> RangeVar of the parent is used for error reports.  This leads to
> incorrect reports if a partition goes away in the middle of autovacuum,
> as only information about the parent is reported to the user.

Oh, you're right.  The original RangeVar (corresponding to the table
mentioned in the command) refers to the parent table.

> I am not saying that this needs to be fixed in REL_10_STABLE at this
> stage though, as this would require some refactoring similar to what
> the patch adding support for VACUUM with multiple relations does.

I think it would be better to fix that independently somehow.  VACUUM
on partitioned tables is handled by calling vacuum_rel() on individual
partitions.  It would be nice if vacuum_rel() didn't have to refer to
the RangeVar.  Could we not use get_rel_name(relid) in place of
relation->relname?  I haven't seen the other patch yet though, so maybe
I'm missing something.

Thanks,
Amit
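For illustration, the get_rel_name() idea would look roughly like this
at a reporting site (variable names and message wording are
assumptions; note the NULL result when the relation is already gone,
which is the sticking point raised in the reply below):

    char       *relname = get_rel_name(relid);

    if (relname == NULL)
        return;                 /* concurrently dropped; nothing to report */

    ereport(LOG,
            (errmsg("skipping vacuum of \"%s\" --- lock not available",
                    relname)));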
On Mon, Sep 25, 2017 at 4:42 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
> On 2017/09/25 12:10, Michael Paquier wrote:
> Hmm, I'm not sure if we need to lock the partitions, too.  Locks taken
> by find_all_inheritors() will be gone once the already-running
> transaction is ended by the caller (vacuum()).  get_rel_oids() should
> just lock the table mentioned in the command (the parent table), so
> that find_all_inheritors() returns the correct partition list, as Tom
> perhaps alluded to when he said "it would also make the
> find_all_inheritors call a lot more meaningful", but...

It is actually important to hold locks on the child tables until the
end of the transaction, so as to fill in correctly the RangeVar
associated with each partition when scanning them.

> When vacuum() iterates over that list and calls vacuum_rel() for each
> relation in it, it might be the case that a relation in the list is no
> longer a partition of the parent.  So, one might say that it would get
> vacuumed unnecessarily.  Perhaps that's fine?

I am personally fine with that.  Any checks would complicate the code
for not much additional value.

>> I am not saying that this needs to be fixed in REL_10_STABLE at this
>> stage though, as this would require some refactoring similar to what
>> the patch adding support for VACUUM with multiple relations does.
>
> I think it would be better to fix that independently somehow.  VACUUM
> on partitioned tables is handled by calling vacuum_rel() on individual
> partitions.  It would be nice if vacuum_rel() didn't have to refer to
> the RangeVar.  Could we not use get_rel_name(relid) in place of
> relation->relname?  I haven't seen the other patch yet though, so maybe
> I'm missing something.

The RangeVar is used for error reporting, and once you reach
vacuum_rel(), the relation and its OID may be gone.  So you need to
save the information of the RangeVar beforehand, when scanning the
relations.  That's one reason behind having the VacuumRelation stuff
discussed on the thread for multiple-table VACUUM; this way you store,
for each item to be vacuumed:
- a RangeVar,
- a list of columns,
- an OID saved by vacuum, deduced from the RangeVar.
That's quite some refactoring, so my opinion is that this ship has
sailed already for REL_10_STABLE, and that we had better live with that
in 10.
--
Michael
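The VacuumRelation node being discussed on the multiple-table VACUUM
thread bundles those three pieces roughly like this (a sketch based on
the description above; field names are illustrative, not a committed
header):

    typedef struct VacuumRelation
    {
        NodeTag     type;
        RangeVar   *relation;   /* table name as given in the command */
        Oid         oid;        /* table's OID, resolved from the RangeVar */
        List       *va_cols;    /* list of column names, or NIL for all */
    } VacuumRelation;

Carrying the RangeVar alongside the resolved OID is what lets a later
error report name the right table even after its catalog entry is gone.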
On Mon, Sep 25, 2017 at 12:10 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Sat, Sep 23, 2017 at 4:13 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Somebody inserted this into vacuum.c's get_rel_oids():
>>
>>     tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
>>     if (!HeapTupleIsValid(tuple))
>>         elog(ERROR, "cache lookup failed for relation %u", relid);
>>
>> apparently without having read the very verbose comment two lines above,
>> which points out that we're not taking any lock on the target relation.
>> So, if that relation is concurrently being dropped, you're likely to
>> get "cache lookup failed for relation NNNN" rather than anything more
>> user-friendly.
>
> This was overlooked during the review of 3c3bb99, by multiple people
> including me.  elog() messages should never be something users can
> face, and the same goes for cache lookup failures.

FWIW, the same thing can happen when specifying an invalid replication
origin name to pg_replication_origin_advance() and
pg_replication_origin_progress().  These should probably be fixed as
well.

>> A minimum-change fix would be to replace the elog() with an ereport
>> that produces the same "relation does not exist" error you'd have
>> gotten from RangeVarGetRelid, had the concurrent DROP TABLE committed
>> a few microseconds earlier.  But that feels like it's band-aiding
>> around the problem.
>
> Yeah, that's not right.  This is still a cache lookup error in the end.
>
>> What I'm wondering about is changing the RangeVarGetRelid call to take
>> ShareUpdateExclusiveLock rather than no lock.  That would protect the
>> syscache lookup, and it would also make the find_all_inheritors call
>> a lot more meaningful.
>>
>> If we're doing a VACUUM, the ShareUpdateExclusiveLock would be dropped
>> as soon as we close the caller's transaction, and then we'd acquire
>> the same or stronger lock inside vacuum_rel().  So that seems fine.
>> If we're doing an ANALYZE, then the lock would continue to be held
>> and analyze_rel would merely be acquiring it an extra time, so we'd
>> actually be removing a race-condition failure scenario for ANALYZE.
>> This would mean a few more cycles in lock management, but since this
>> only applies to a manual VACUUM or ANALYZE that specifies a table
>> name, I'm not too concerned about that.
>
> I am +1 on that.  Testing such a thing, I am not seeing anything wrong
> either.  The call to find_all_inheritors should also use
> ShareUpdateExclusiveLock, and the locking on top of RangeVarGetRelid()
> needs to be reworked.
>
> Attached is a proposed patch.

FWIW, I agree with the approach of this patch, and I found no problems
with it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Mon, Sep 25, 2017 at 7:57 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> FWIW, the same thing can happen when specifying an invalid replication
> origin name to pg_replication_origin_advance() and
> pg_replication_origin_progress().  These should probably be fixed as
> well.

I spawned a thread about that stuff three weeks ago:
https://www.postgresql.org/message-id/CAB7nPqQtPg%2BLKKtzdKN26judHcvPZ0s1gNigzOT4j8CYuuuBYg%40mail.gmail.com
With a patch registered where it should be:
https://commitfest.postgresql.org/15/1290/
So I would suggest keeping the discussion about this problem in its own
thread.

(Bonus: run ALTER TABLE queries in parallel with multiple sessions and
enjoy the result.)
--
Michael
On Mon, Sep 25, 2017 at 8:22 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Mon, Sep 25, 2017 at 7:57 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> FWIW, the same thing can happen when specifying an invalid replication
>> origin name to pg_replication_origin_advance() and
>> pg_replication_origin_progress().  These should probably be fixed as
>> well.
>
> I spawned a thread about that stuff three weeks ago:
> https://www.postgresql.org/message-id/CAB7nPqQtPg%2BLKKtzdKN26judHcvPZ0s1gNigzOT4j8CYuuuBYg%40mail.gmail.com
> With a patch registered where it should be:
> https://commitfest.postgresql.org/15/1290/
> So I would suggest keeping the discussion about this problem in its own
> thread.

Thank you for the information!  I'll look at the thread and review the
patch.

> (Bonus: run ALTER TABLE queries in parallel with multiple sessions and
> enjoy the result.)

Will try it :-)

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes:
> On 2017/09/25 12:10, Michael Paquier wrote:
>> As long as I don't forget... Another thing currently broken on HEAD and
>> REL_10_STABLE is that the OIDs of partitioned tables are used, but the
>> RangeVar of the parent is used for error reports.  This leads to
>> incorrect reports if a partition goes away in the middle of autovacuum,
>> as only information about the parent is reported to the user.

> Oh, you're right.  The original RangeVar (corresponding to the table
> mentioned in the command) refers to the parent table.

Yeah, I'd noticed that while reviewing the vacuum-multiple-tables patch.
My thought about fixing it was to pass a null RangeVar when handling a
table we'd identified through inheritance or pg_class-scanning, to
indicate that this wasn't a table named in the original command.  This
only works conveniently if you decide that it's appropriate to silently
ignore relation_open failure on such table OIDs, but I think it is.

Not sure about whether we ought to try to fix that in v10.  It's a
mostly-cosmetic problem in what ought to be an infrequent corner case,
so it might not be worth taking risks for post-RC1.  OTOH, I would
not be surprised to get bug reports about it down the road.

			regards, tom lane
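A minimal sketch of that convention in vacuum_rel(), assuming the v10
signature (illustrative only; the log message wording here is made up):

    static bool
    vacuum_rel(Oid relid, RangeVar *relation, int options,
               VacuumParams *params)
    {
        Relation    onerel;

        /* try_relation_open() returns NULL instead of raising an error */
        onerel = try_relation_open(relid, ShareUpdateExclusiveLock);
        if (onerel == NULL)
        {
            /*
             * A null RangeVar means the table was found via inheritance
             * expansion or a pg_class scan, not named in the command, so
             * a concurrent drop can be ignored silently.
             */
            if (relation != NULL)
                ereport(LOG,
                        (errmsg("skipping vacuum of \"%s\" --- relation no longer exists",
                                relation->relname)));
            return false;
        }

        /* ... normal vacuum processing of onerel would continue here ... */
        relation_close(onerel, ShareUpdateExclusiveLock);
        return true;
    }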
On 9/24/17, 10:12 PM, "Michael Paquier" <michael.paquier@gmail.com> wrote:
> Attached is a proposed patch.

The patch seems reasonable to me, and I haven't encountered any issues
in my tests, even after applying the vacuum-multiple-relations patch on
top of it.

+ * Take a lock here for the relation lookup. If ANALYZE or VACUUM spawn
+ * multiple transactions, the lock taken here will be gone once the
+ * current transaction running commits, which could cause the relation
+ * to be gone, or the RangeVar might not refer to the OID looked up here.

I think this could be slightly misleading.  Perhaps it would be more
accurate to say that the lock will be gone any time vacuum() creates a
new transaction (either in vacuum_rel() or when use_own_xacts is true).

Nathan
On Mon, Sep 25, 2017 at 11:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Yeah, I'd noticed that while reviewing the vacuum-multiple-tables patch.
> My thought about fixing it was to pass a null RangeVar when handling a
> table we'd identified through inheritance or pg_class-scanning, to
> indicate that this wasn't a table named in the original command.  This
> only works conveniently if you decide that it's appropriate to silently
> ignore relation_open failure on such table OIDs, but I think it is.
>
> Not sure about whether we ought to try to fix that in v10.  It's a
> mostly-cosmetic problem in what ought to be an infrequent corner case,
> so it might not be worth taking risks for post-RC1.  OTOH, I would
> not be surprised to get bug reports about it down the road.

Something like that looks like a good compromise for v10.  I would
rather see a more complete fix on HEAD though, with each relation name
reported correctly.  The information provided would be useful for users
when autovacuum skips a relation because no lock could be taken on it.
--
Michael
On Tue, Sep 26, 2017 at 1:47 AM, Bossart, Nathan <bossartn@amazon.com> wrote:
> On 9/24/17, 10:12 PM, "Michael Paquier" <michael.paquier@gmail.com> wrote:
>> Attached is a proposed patch.
>
> The patch seems reasonable to me, and I haven't encountered any issues
> in my tests, even after applying the vacuum-multiple-relations patch on
> top of it.

Thanks for the review, Nathan!

> + * Take a lock here for the relation lookup. If ANALYZE or VACUUM spawn
> + * multiple transactions, the lock taken here will be gone once the
> + * current transaction running commits, which could cause the relation
> + * to be gone, or the RangeVar might not refer to the OID looked up here.
>
> I think this could be slightly misleading.  Perhaps it would be more
> accurate to say that the lock will be gone any time vacuum() creates a
> new transaction (either in vacuum_rel() or when use_own_xacts is true).

The comment in the proposed patch matches as closely as possible what
is currently on HEAD, so I would still go with something close to that.
--
Michael
On 9/25/17, 6:51 PM, "Michael Paquier" <michael.paquier@gmail.com> wrote:
>> + * Take a lock here for the relation lookup. If ANALYZE or VACUUM spawn
>> + * multiple transactions, the lock taken here will be gone once the
>> + * current transaction running commits, which could cause the relation
>> + * to be gone, or the RangeVar might not refer to the OID looked up here.
>>
>> I think this could be slightly misleading.  Perhaps it would be more
>> accurate to say that the lock will be gone any time vacuum() creates a
>> new transaction (either in vacuum_rel() or when use_own_xacts is true).
>
> The comment in the proposed patch matches as closely as possible what
> is currently on HEAD, so I would still go with something close to that.

Sure.  This is just a minor point, and I could see the argument that
your phrasing is more concise, anyway.

Nathan
On Tue, Sep 26, 2017 at 8:48 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Mon, Sep 25, 2017 at 11:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Yeah, I'd noticed that while reviewing the vacuum-multiple-tables patch.
>> My thought about fixing it was to pass a null RangeVar when handling a
>> table we'd identified through inheritance or pg_class-scanning, to
>> indicate that this wasn't a table named in the original command.  This
>> only works conveniently if you decide that it's appropriate to silently
>> ignore relation_open failure on such table OIDs, but I think it is.
>>
>> Not sure about whether we ought to try to fix that in v10.  It's a
>> mostly-cosmetic problem in what ought to be an infrequent corner case,
>> so it might not be worth taking risks for post-RC1.  OTOH, I would
>> not be surprised to get bug reports about it down the road.
>
> Something like that looks like a good compromise for v10.  I would
> rather see a more complete fix on HEAD though, with each relation name
> reported correctly.  The information provided would be useful for users
> when autovacuum skips a relation because no lock could be taken on it.

Actually, perhaps this should be tracked as an open item?  The simple
fix leads down the path of accepting that no information is better than
misleading information in this case.
--
Michael
On 2017/09/25 18:37, Michael Paquier wrote:
> On Mon, Sep 25, 2017 at 4:42 PM, Amit Langote
> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>> Hmm, I'm not sure if we need to lock the partitions, too.  Locks taken
>> by find_all_inheritors() will be gone once the already-running
>> transaction is ended by the caller (vacuum()).  get_rel_oids() should
>> just lock the table mentioned in the command (the parent table), so
>> that find_all_inheritors() returns the correct partition list, as Tom
>> perhaps alluded to when he said "it would also make the
>> find_all_inheritors call a lot more meaningful", but...
>
> It is actually important to hold locks on the child tables until the
> end of the transaction, so as to fill in correctly the RangeVar
> associated with each partition when scanning them.

I think that's right, although I don't see any new RangeVar created
under vacuum() at the moment.  Maybe you're referring to Nathan's
patch, which perhaps does that.

>> When vacuum() iterates over that list and calls vacuum_rel() for each
>> relation in it, it might be the case that a relation in the list is no
>> longer a partition of the parent.  So, one might say that it would get
>> vacuumed unnecessarily.  Perhaps that's fine?
>
> I am personally fine with that.  Any checks would complicate the code
> for not much additional value.

Tend to agree here.

>>> I am not saying that this needs to be fixed in REL_10_STABLE at this
>>> stage though, as this would require some refactoring similar to what
>>> the patch adding support for VACUUM with multiple relations does.
>>
>> I think it would be better to fix that independently somehow.  VACUUM
>> on partitioned tables is handled by calling vacuum_rel() on individual
>> partitions.  It would be nice if vacuum_rel() didn't have to refer to
>> the RangeVar.  Could we not use get_rel_name(relid) in place of
>> relation->relname?  I haven't seen the other patch yet though, so maybe
>> I'm missing something.
>
> The RangeVar is used for error reporting, and once you reach
> vacuum_rel(), the relation and its OID may be gone.  So you need to
> save the information of the RangeVar beforehand, when scanning the
> relations.  That's one reason behind having the VacuumRelation stuff
> discussed on the thread for multiple-table VACUUM; this way you store,
> for each item to be vacuumed:
> - a RangeVar,
> - a list of columns,
> - an OID saved by vacuum, deduced from the RangeVar.
> That's quite some refactoring, so my opinion is that this ship has
> sailed already for REL_10_STABLE, and that we had better live with that
> in 10.

Yeah, your simple patch seems enough for v10.

Thanks,
Amit
On 2017/09/26 9:51, Michael Paquier wrote:
> On Tue, Sep 26, 2017 at 8:48 AM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Mon, Sep 25, 2017 at 11:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Yeah, I'd noticed that while reviewing the vacuum-multiple-tables patch.
>>> My thought about fixing it was to pass a null RangeVar when handling a
>>> table we'd identified through inheritance or pg_class-scanning, to
>>> indicate that this wasn't a table named in the original command.  This
>>> only works conveniently if you decide that it's appropriate to silently
>>> ignore relation_open failure on such table OIDs, but I think it is.
>>>
>>> Not sure about whether we ought to try to fix that in v10.  It's a
>>> mostly-cosmetic problem in what ought to be an infrequent corner case,
>>> so it might not be worth taking risks for post-RC1.  OTOH, I would
>>> not be surprised to get bug reports about it down the road.
>>
>> Something like that looks like a good compromise for v10.  I would
>> rather see a more complete fix on HEAD though, with each relation name
>> reported correctly.  The information provided would be useful for users
>> when autovacuum skips a relation because no lock could be taken on it.
>
> Actually, perhaps this should be tracked as an open item?  The simple
> fix leads down the path of accepting that no information is better than
> misleading information in this case.

+1.

Thanks,
Amit
On Tue, Sep 26, 2017 at 10:54 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
> I think that's right, although I don't see any new RangeVar created
> under vacuum() at the moment.  Maybe you're referring to Nathan's
> patch, which perhaps does that.

Yes, you can check what it does here (last version):
766556DD-AA3C-42F7-ACF4-5DC97F41F151@amazon.com
--
Michael
On Tue, Sep 26, 2017 at 10:55 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
> On 2017/09/26 9:51, Michael Paquier wrote:
>> On Tue, Sep 26, 2017 at 8:48 AM, Michael Paquier
>> <michael.paquier@gmail.com> wrote:
>>> On Mon, Sep 25, 2017 at 11:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>> Yeah, I'd noticed that while reviewing the vacuum-multiple-tables patch.
>>>> My thought about fixing it was to pass a null RangeVar when handling a
>>>> table we'd identified through inheritance or pg_class-scanning, to
>>>> indicate that this wasn't a table named in the original command.  This
>>>> only works conveniently if you decide that it's appropriate to silently
>>>> ignore relation_open failure on such table OIDs, but I think it is.
>>>>
>>>> Not sure about whether we ought to try to fix that in v10.  It's a
>>>> mostly-cosmetic problem in what ought to be an infrequent corner case,
>>>> so it might not be worth taking risks for post-RC1.  OTOH, I would
>>>> not be surprised to get bug reports about it down the road.
>>>
>>> Something like that looks like a good compromise for v10.  I would
>>> rather see a more complete fix on HEAD though, with each relation name
>>> reported correctly.  The information provided would be useful for users
>>> when autovacuum skips a relation because no lock could be taken on it.
>>
>> Actually, perhaps this should be tracked as an open item?  The simple
>> fix leads down the path of accepting that no information is better than
>> misleading information in this case.
>
> +1.

Let's track it then, and spawn a separate thread with a patch.  Do you
want to work on it, or should I?  The solution proposed by Tom seems
like the correct answer.  I am adding an item for now; we can always
link it to a thread later on.  Let's also track the problem that has
been reported on this thread.
--
Michael
On 2017/09/26 11:14, Michael Paquier wrote:
> On Tue, Sep 26, 2017 at 10:55 AM, Amit Langote wrote:
>> On 2017/09/26 9:51, Michael Paquier wrote:
>>> On Tue, Sep 26, 2017 at 8:48 AM, Michael Paquier wrote:
>>>> Something like that looks like a good compromise for v10.  I would
>>>> rather see a more complete fix on HEAD though, with each relation name
>>>> reported correctly.  The information provided would be useful for users
>>>> when autovacuum skips a relation because no lock could be taken on it.
>>>
>>> Actually, perhaps this should be tracked as an open item?  The simple
>>> fix leads down the path of accepting that no information is better than
>>> misleading information in this case.
>>
>> +1.
>
> Let's track it then, and spawn a separate thread with a patch.  Do you
> want to work on it, or should I?  The solution proposed by Tom seems
> like the correct answer.  I am adding an item for now; we can always
> link it to a thread later on.

I assume you mean Tom's solution wherein we pass a null RangeVar for
tables that were not mentioned in the command (that is, for partitions
of a partitioned table that was mentioned in the command, or for tables
read from pg_class when a user runs VACUUM without mentioning any table
name).  Please feel free to come up with the patch for that, if you
have time.  I'll be glad to review.

> Let's also track the problem that has been reported on this thread.

Yes.  Just to be clear, the original problem this thread was started
for is that get_rel_oids() may emit a less-than-user-friendly "cache
lookup failed for relation NNNN" error, which it shouldn't.  We should
fix the locking as the patch you posted does, to avoid hitting this
syscache lookup error in the first place.

Thanks,
Amit
On 2017/09/26 11:12, Michael Paquier wrote:
> On Tue, Sep 26, 2017 at 10:54 AM, Amit Langote
> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>> I think that's right, although I don't see any new RangeVar created
>> under vacuum() at the moment.  Maybe you're referring to Nathan's
>> patch, which perhaps does that.
>
> Yes, you can check what it does here (last version):
> 766556DD-AA3C-42F7-ACF4-5DC97F41F151@amazon.com

Thanks, will look.

Regards,
Amit
On Fri, Sep 22, 2017 at 03:13:10PM -0400, Tom Lane wrote:
> Somebody inserted this into vacuum.c's get_rel_oids():
>
>     tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
>     if (!HeapTupleIsValid(tuple))
>         elog(ERROR, "cache lookup failed for relation %u", relid);
>
> apparently without having read the very verbose comment two lines above,
> which points out that we're not taking any lock on the target relation.
> So, if that relation is concurrently being dropped, you're likely to
> get "cache lookup failed for relation NNNN" rather than anything more
> user-friendly.
>
> A minimum-change fix would be to replace the elog() with an ereport
> that produces the same "relation does not exist" error you'd have
> gotten from RangeVarGetRelid, had the concurrent DROP TABLE committed
> a few microseconds earlier.  But that feels like it's band-aiding
> around the problem.
>
> What I'm wondering about is changing the RangeVarGetRelid call to take
> ShareUpdateExclusiveLock rather than no lock.  That would protect the
> syscache lookup, and it would also make the find_all_inheritors call
> a lot more meaningful.
>
> If we're doing a VACUUM, the ShareUpdateExclusiveLock would be dropped
> as soon as we close the caller's transaction, and then we'd acquire
> the same or stronger lock inside vacuum_rel().  So that seems fine.
> If we're doing an ANALYZE, then the lock would continue to be held
> and analyze_rel would merely be acquiring it an extra time, so we'd
> actually be removing a race-condition failure scenario for ANALYZE.
> This would mean a few more cycles in lock management, but since this
> only applies to a manual VACUUM or ANALYZE that specifies a table
> name, I'm not too concerned about that.
>
> Thoughts?

This thread now has two open items, both of them pertaining to VACUUM
error messages involving partitioning.  The pair is probably best
treated as a single open item.

[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.
Robert, since you committed the patch believed to have created it, you
own this open item.  If some other commit is more relevant or if this
does not belong as a v10 open item, please let us know.  Otherwise,
please observe the policy on open item ownership[1] and send a status
update within three calendar days of this message.  Include a date for
your subsequent status update.  Testers may discover new open items at
any time, and I want to plan to get them all fixed well in advance of
shipping v10.  Consequently, I will appreciate your efforts toward
speedy resolution.  Thanks.

[1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
On Thu, Sep 28, 2017 at 1:31 AM, Noah Misch <noah@leadboat.com> wrote:
> This thread now has two open items, both of them pertaining to VACUUM
> error messages involving partitioning.  The pair is probably best
> treated as a single open item.

If I understand correctly, problem #1 is that get_rel_oids() can emit a
user-visible cache lookup failure message.  There is a proposed patch
by Michael Paquier which appears to implement the design suggested by
Tom.  I think that the normal procedure would be for Tom to commit that
change if he's happy with it.

I don't think I understand problem #2.  I think the concern is about
reporting the proper relation name when VACUUM cascades from a
partitioned table to its children and then some kind of concurrent DDL
happens, but I don't see a clear explanation on the thread as to what
exactly the failure scenario is, and I didn't see a problem in some
simple tests I just ran.  Furthermore, it sounds like this might get
fixed as part of committing the patch to allow VACUUM to mention
multiple tables, which Tom has indicated he will handle.

> [Action required within three days.  This is a generic notification.]

If someone could specify a particular action they'd like me to take,
I'll consider taking it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 9/29/17, 11:18 AM, "Robert Haas" <robertmhaas@gmail.com> wrote:
> I don't think I understand problem #2.  I think the concern is about
> reporting the proper relation name when VACUUM cascades from a
> partitioned table to its children and then some kind of concurrent DDL
> happens, but I don't see a clear explanation on the thread as to what
> exactly the failure scenario is, and I didn't see a problem in some
> simple tests I just ran.  Furthermore, it sounds like this might get
> fixed as part of committing the patch to allow VACUUM to mention
> multiple tables, which Tom has indicated he will handle.

Yes.  It looks like v10 is safe, and the vacuum-multiple-relations
patch should help prevent any future logging issues caused by this.
Discussion here:
http://postgr.es/m/CAB7nPqRX1465FP%2Bameysxxt63tCQDDW6KvaTPMfkSxaT1TFGfw%40mail.gmail.com

Nathan
Robert Haas <robertmhaas@gmail.com> writes:
> If I understand correctly, problem #1 is that get_rel_oids() can emit
> a user-visible cache lookup failure message.  There is a proposed
> patch by Michael Paquier which appears to implement the design
> suggested by Tom.  I think that the normal procedure would be for Tom
> to commit that change if he's happy with it.

Yes, I'm happy to take responsibility for this.

> I don't think I understand problem #2.  I think the concern is about
> reporting the proper relation name when VACUUM cascades from a
> partitioned table to its children and then some kind of concurrent DDL
> happens, but I don't see a clear explanation on the thread as to what
> exactly the failure scenario is, and I didn't see a problem in some
> simple tests I just ran.  Furthermore, it sounds like this might get
> fixed as part of committing the patch to allow VACUUM to mention
> multiple tables, which Tom has indicated he will handle.

I think the conclusion was that this wouldn't actually happen in v10,
but I will take a closer look and do something about it if it is
possible.

			regards, tom lane
On Fri, Sep 29, 2017 at 12:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> If I understand correctly, problem #1 is that get_rel_oids() can emit
>> a user-visible cache lookup failure message.  There is a proposed
>> patch by Michael Paquier which appears to implement the design
>> suggested by Tom.  I think that the normal procedure would be for Tom
>> to commit that change if he's happy with it.
>
> Yes, I'm happy to take responsibility for this.

Great, thanks.

>> I don't think I understand problem #2.  I think the concern is about
>> reporting the proper relation name when VACUUM cascades from a
>> partitioned table to its children and then some kind of concurrent DDL
>> happens, but I don't see a clear explanation on the thread as to what
>> exactly the failure scenario is, and I didn't see a problem in some
>> simple tests I just ran.  Furthermore, it sounds like this might get
>> fixed as part of committing the patch to allow VACUUM to mention
>> multiple tables, which Tom has indicated he will handle.
>
> I think the conclusion was that this wouldn't actually happen in v10,
> but I will take a closer look and do something about it if it is
> possible.

Even better, thanks.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Sep 29, 2017 at 12:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> If I understand correctly, problem #1 is that get_rel_oids() can emit
>>> a user-visible cache lookup failure message.  There is a proposed
>>> patch by Michael Paquier which appears to implement the design
>>> suggested by Tom.  I think that the normal procedure would be for Tom
>>> to commit that change if he's happy with it.

>> Yes, I'm happy to take responsibility for this.

> Great, thanks.

After thinking more carefully about the idea I proposed upthread (viz,
take and keep ShareUpdateExclusiveLock), I realized that that will be
pretty horrid once the proposed patch to allow VACUUM to name multiple
target tables goes in.  We'd be trying to take a pretty strong lock on
multiple tables at once during get_rel_oids(), creating substantial
risk of deadlock.  Up to now, VACUUM has always been careful to not
take more than one such lock at once, and I think we need to preserve
that property.

The simplest solution therefore seems to be to just take AccessShareLock
and then release it as soon as we're done adding the rel and its
partitions to the list of target rels.  There's little point in taking
a stronger lock here if we are going to end the transaction and drop it
later, and it's a nicely low-risk answer for v10 in any case.

>>> I don't think I understand problem #2.  I think the concern is about
>>> reporting the proper relation name when VACUUM cascades from a
>>> partitioned table to its children and then some kind of concurrent DDL
>>> happens, but I don't see a clear explanation on the thread as to what
>>> exactly the failure scenario is, and I didn't see a problem in some
>>> simple tests I just ran.  Furthermore, it sounds like this might get
>>> fixed as part of committing the patch to allow VACUUM to mention
>>> multiple tables, which Tom has indicated he will handle.

>> I think the conclusion was that this wouldn't actually happen in v10,
>> but I will take a closer look and do something about it if it is
>> possible.

> Even better, thanks.

After looking closer, I confirmed that there's no live bug today,
though it's a closer shave than one could wish.  The RangeVar passed to
vacuum_rel and/or analyze_rel can be wrong, or even NULL, but it is
only used when autovacuum.c requests failure logging, and for that
particular use-case the RangeVar will always be correct.

I felt though that I didn't want to just leave it like that,
particularly in view of the existence of multiple comments claiming
things that were entirely false.  So I adjusted those two functions to
explicitly allow for NULL RangeVars, and tried to improve the comments.

Nathan Bossart wants to see logging of relation-open failures in more
cases, and to get that we're going to have to work harder to track
valid RangeVars for target relations.  But that is not v10 material.

			regards, tom lane
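In code form, the approach described above boils down to roughly this
in get_rel_oids() (a sketch with assumed variable names):

    /* Take a transient AccessShareLock, just to make the lookup safe. */
    relid = RangeVarGetRelid(vacrel, AccessShareLock, false);

    /*
     * Collect partitions while the parent is locked.  No lock is kept
     * on the children here; vacuum_rel()/analyze_rel() will lock each
     * one when its turn comes.
     */
    if (include_parts)
        oid_list = list_concat(oid_list,
                               find_all_inheritors(relid, NoLock, NULL));
    else
        oid_list = lappend_oid(oid_list, relid);

    /* Done building the list; avoid holding locks on many tables at once. */
    UnlockRelationOid(relid, AccessShareLock);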