Thread: Autovacuum of independent tables

Autovacuum of independent tables

From

Michael Holzman

Date:

08 September 2020, 07:32:36

Hi,

I have two applications A and B. A runs SELECT statements only and only on tableA. B actively updates tableB, A never looks into tableB. B has nothing to do with tableA.

Still, if A is inside a long running transaction, autovacuum does not handle tableB. Why is it so?

Regards,
Michael Holzman

Re: Autovacuum of independent tables

From

Pavel Stehule

Date:

08 September 2020, 07:45:53

út 8. 9. 2020 v 9:32 odesílatel Michael Holzman <michaelholzman@gmail.com> napsal:

Hi,

I have two applications A and B. A runs SELECT statements only and only on tableA. B actively updates tableB, A never looks into tableB. B has nothing to do with tableA.

Still, if A is inside a long running transaction, autovacuum does not handle tableB. Why is it so?

autovacuum does cleaning of changes related to finished transactions. It does nothing if possible dead tuples are assigned to open transactions.

Regards

Pavel

--
Regards,
Michael Holzman

Re: Autovacuum of independent tables

From

Michael Holzman

Date:

08 September 2020, 08:16:04

On Tue, Sep 8, 2020 at 10:46 AM Pavel Stehule wrote:

autovacuum does cleaning of changes related to finished transactions. It does nothing if possible dead tuples are assigned to open transactions.

This is the point.

Autovacuum does not clean dead tuples of closed transactions in tableB while there is an open transaction on tableA.

But the tables have nothing in common. They are handled by separate applications and there are no transactions that touch both tables simultaneously.

Why does autovacuum create an artificial dependency on the tables?

Regards,
Michael Holzman

Re: Autovacuum of independent tables

From

Michael Paquier

Date:

08 September 2020, 08:27:55

On Tue, Sep 08, 2020 at 11:16:04AM +0300, Michael Holzman wrote:
> Autovacuum does not clean dead tuples of closed transactions in tableB
> while there is an open transaction on tableA.
> But the tables have nothing in common. They are handled by separate
> applications and there are no transactions that touch both tables
> simultaneously.
> Why does autovacuum create an artificial dependency on the tables?

This is called MVCC, which applies to a session as a whole.  The point
here is that even if your application knows that only tableA is used
by a given transaction, Postgres cannot know that, as it could be
possible that data from tableB is needed in this same transaction, so
old versions of the rows from tableB matching with the snapshot hold
by this long-running transaction still have to be around.
--
Michael

Attachment

signature.asc

Re: Autovacuum of independent tables

From

Michael Holzman

Date:

08 September 2020, 08:42:44

On Tue, Sep 8, 2020 at 11:28 AM Michael Paquier wrote:

This is called MVCC, which applies to a session as a whole. The point
here is that even if your application knows that only tableA is used
by a given transaction, Postgres cannot know that, as it could be
possible that data from tableB is needed in this same transaction, so
old versions of the rows from tableB matching with the snapshot hold
by this long-running transaction still have to be around.

Yes, I thought so. I just hoped there may be a workaround decoupling the tables.

Thanks.

Regards,
Michael Holzman

Re: Autovacuum of independent tables

From

Pavel Stehule

Date:

08 September 2020, 09:27:44

út 8. 9. 2020 v 10:42 odesílatel Michael Holzman <michaelholzman@gmail.com> napsal:

On Tue, Sep 8, 2020 at 11:28 AM Michael Paquier wrote:

This is called MVCC, which applies to a session as a whole. The point
here is that even if your application knows that only tableA is used
by a given transaction, Postgres cannot know that, as it could be
possible that data from tableB is needed in this same transaction, so
old versions of the rows from tableB matching with the snapshot hold
by this long-running transaction still have to be around.

Yes, I thought so. I just hoped there may be a workaround decoupling the tables.
Thanks.

You can try to reduce length of transactions, if possible.

Regards

Pavel

--
Regards,
Michael Holzman

Re: Autovacuum of independent tables

From

Michael Holzman

Date:

08 September 2020, 10:13:14

On Tue, Sep 8, 2020 at 12:28 PM Pavel Stehule wrote:

You can try to reduce length of transactions, if possible.

This is the problem. A and B were developed for Oracle where SELECT does not open a transaction. We moved them to PG and now we have to very accurately add COMMITs without breaking the flow. It is quite a complex thing. I hoped we can avoid that.

Regards,
Michael Holzman

Re: Autovacuum of independent tables

From

Ravi Krishna

Date:

08 September 2020, 10:52:10

This is the problem. A and B were developed for Oracle where SELECT does not open a transaction. We moved them to PG
and now we have to very accurately add COMMITs without breaking the flow. It is quite a complex thing. I hoped we can
avoid that.

Interesting. Are you telling the Oracle version of the code had no intermittent COMMIT and relied on one
final COMMIT at the end. Even in Oracle developers must have planned for commit since a long running
open transaction can lead to “snapshot too old” error.

Re: Autovacuum of independent tables

From

Michael Holzman

Date:

08 September 2020, 11:48:10

On Tue, Sep 8, 2020 at 1:52 PM Ravi Krishna wrote:

Interesting. Are you telling the Oracle version of the code had no intermittent COMMIT and relied on one
final COMMIT at the end. Even in Oracle developers must have planned for commit since a long running
open transaction can lead to “snapshot too old” error.

Yes, I am saying just that. With one important clarification: there were no transactions as SELECT does not open them and the application does not change anything on that connection.

So, no 'snapshot too old' and no COMMITs.

Regards,
Michael Holzman

Re: Autovacuum of independent tables

From

Ravi Krishna

Date:

08 September 2020, 12:17:36

>>Interesting. Are you telling the Oracle version of the code had no

>>intermittent COMMIT and relied on one final COMMIT at the end. Even

>>in Oracle developers must have planned for commit since a long running

>>open transaction can lead to “snapshot too old” error.

>Yes, I am saying just that. With one important clarification: there were

>no transactions as SELECT does not open them and the application does not

>change anything on that connection. So, no 'snapshot too old' and no COMMITs.

It's been a while since I worked with Oracle as a developer. But my understanding

is that even a read-only transaction, like the one you described above, requires

a point in time consistent image of the database. This would imply that if your

transaction runs for a long time and meanwhile other DML sessions change lot of blocks,

resulting in undo tablespace getting totally turned over, then Oracle can no longer

gurantee PIT consistent view of the database to your session and barf out with

snapshot-too-old error.

I have no way of confirming this and I am writing this based on my limited experience

with oracle. So I may be wrong.

Re: Autovacuum of independent tables

From

Ravi Krishna

Date:

08 September 2020, 12:19:54

This is assuming other sessions change the same block your session is trying to read.

===

It's been a while since I worked with Oracle as a developer. But my understanding

is that even a read-only transaction, like the one you described above, requires

a point in time consistent image of the database. This would imply that if your

transaction runs for a long time and meanwhile other DML sessions change lot of blocks,

resulting in undo tablespace getting totally turned over, then Oracle can no longer

gurantee PIT consistent view of the database to your session and barf out with

snapshot-too-old error.

I have no way of confirming this and I am writing this based on my limited experience

with oracle. So I may be wrong.

Re: Autovacuum of independent tables

From

Magnus Hagander

Date:

08 September 2020, 13:25:39

(Please don't drop the mailinglist from CC, as others are likely interested in the responses)

On Tue, Sep 8, 2020 at 3:06 PM Michael Holzman <michaelholzman@gmail.com> wrote:

On Tue, Sep 8, 2020 at 3:03 PM Magnus Hagander wrote:
A PostgreSQL SELECT does *not* open a transaction past the end of the statement, if it's run independently on a connection.
This sounds like you are using a client on PostgreSQL that uses an "autocommit off" mode, since that's the only case where you'd need to add COMMITs (or ROLLBACKs) to close a transaction after a SELECT.

Yes, this is correct. We do not use autocommit. Everything is controlled explicitly. We run quite complex multi-statement multi-table transactions and cannot work with "autocommit on".

That is not what autocommit means.

Whether you have autocommit on or off, you can *always* control things explicitly. And you can certainly run "multi-statement transactions" in autocommit on -- in fact, it's what most people do since it's the default configuration of the system (and I don't see why multi-table would even be relevant).

Autocommit on/off only controls what happens when you *don't* control things explicitly.

Therefore, this is what we have

> psql
psql (11.2)
Type "help" for help.

pg-11.2 rw => COMMIT;
WARNING: 25P01: there is no transaction in progress
LOCATION: EndTransactionBlock, xact.c:3675
COMMIT
Time: 0.745 ms
pg-11.2 rw => select 2*2;
?column?
----------
4
(1 row)

Time: 0.347 ms
pg-11.2 rw => COMMIT;
COMMIT
Time: 0.525 ms

The first COMMIT (immediately after connect) fails as there is no transaction.
The second one works as even this SELECT opened one. We have a transaction (and a snapshot) when no table is touched!

So just to be clear, here is how PostgreSQL behaves by default:

postgres=# commit;
WARNING: there is no transaction in progress
COMMIT
postgres=# select 2*2;
?column?
----------
4
(1 row)

postgres=# commit;
WARNING: there is no transaction in progress
COMMIT

But yes, if you explicitly ask that a query shall keep a transaction open across multiple statements, by turning off autocommit, it will.

In fact, *PostgreSQL* will always behave that way. The *psql client* will behave differently depending on how you configure it, and the same will of course apply to any other client that you have. In the example above, psql.

You cannot both have a transaction existing and not existing at the same time. You do have to separate the idea of transactions from snapshots though, as they can differ quite a bit depending on isolation levels.

> And how much a running transaction blocks autovacuum is also dependent on what isolation level you're running it in. In the default isolation level, a snapshot is taken for each individual select, so does not block vacuuming past the end of the individual select. Higher isolation levels will.

> We use default isolation mode and we proved that SELECTs block autovacuum. As soon as we added COMMITs after SELECTS in several places (not all as we still have not fixed all the code), autovacuum started working properly in the fixed flows.

As I said yes, a running SELECT will, because of the snapshot. An open transaction will not, past the individual select, because a new snapshot is taken for each SELECT.

If you have an open transaction that runs regular selects but as separate queries then it will not block autovacuum, unless it also does something else.

Of course if it's a big query that runs the whole time it will, but then there would also not be a way to "add commits" into the middle of it, so clearly that's not what's going on here.

//Magnus

Re: Autovacuum of independent tables

From

Michael Holzman

Date:

08 September 2020, 14:00:56

On Tue, Sep 8, 2020 at 4:25 PM Magnus Hagander wrote:

Whether you have autocommit on or off, you can *always* control things explicitly. And you can certainly run "multi-statement transactions" in autocommit on -- in fact, it's what most people do since it's the default configuration of the system (and I don't see why multi-table would even be relevant).

Autocommit on/off only controls what happens when you *don't* control things explicitly.

I know that we can control things explicitly with "autocommit on". But we would need to add "BEGIN" statements to the code which is an even bigger change than adding COMMITs. We considered it and found that the development cost is too high.

It seems I was not clear enough. I do not complain. I have been a PG fan since 2000 when I worked with it for the first time. I just wanted to understand it deeper and, fortunately, find a work around that would simplify our current development.

Thanks to all.

Regards,
Michael Holzman

Re: Autovacuum of independent tables

From

Magnus Hagander

Date:

08 September 2020, 14:17:37

On Tue, Sep 8, 2020 at 4:01 PM Michael Holzman <michaelholzman@gmail.com> wrote:

On Tue, Sep 8, 2020 at 4:25 PM Magnus Hagander wrote:

Whether you have autocommit on or off, you can *always* control things explicitly. And you can certainly run "multi-statement transactions" in autocommit on -- in fact, it's what most people do since it's the default configuration of the system (and I don't see why multi-table would even be relevant).

Autocommit on/off only controls what happens when you *don't* control things explicitly.
I know that we can control things explicitly with "autocommit on". But we would need to add "BEGIN" statements to the code which is an even bigger change than adding COMMITs. We considered it and found that the development cost is too high.

It seems I was not clear enough. I do not complain. I have been a PG fan since 2000 when I worked with it for the first time. I just wanted to understand it deeper and, fortunately, find a work around that would simplify our current development.

Oh sure, but there is clearly *something* going on, so we should try to figure that out. Because a transaction running multiple independent selects with the defaults settings will not actually block autovacuum. So clearly there is something else going on -- something else must be non-default, or it's something that the driver layer does.

To show that, something as simple as the following, with autovacuum logging enabled:

session 1:

CREATE TABLE test AS SELECT * FROM generate_series(1,10000);

session 2:

begin;

SELECT count(*) FROM test;

\watch 1

session 1:

delete from test;

In this case, you will see autovacuum firing just fine, even though there is an open transaction that queries the table test. As you're running you can use a third session to see that session 2 flips between "active" and "idle in transaction". The log output in my case was:

2020-09-08 16:13:12.271 CEST [26753] LOG: automatic vacuum of table "postgres.public.test": index scans: 0
pages: 0 removed, 45 remain, 0 skipped due to pins, 0 skipped frozen
tuples: 56 removed, 0 remain, 0 are dead but not yet removable, oldest xmin: 241585
buffer usage: 112 hits, 4 misses, 5 dirtied
avg read rate: 0.006 MB/s, avg write rate: 0.008 MB/s
system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 5.01 s

It is failing to *truncate* the table, but the general autovacuum is running.

Are you by any chance specifically referring to the truncation step?

However, if you change the session 2 to select from a *different* table, the truncation also works, so I'm guessing that's not it?

//Magnus

Re: Autovacuum of independent tables

From

Tom Lane

Date:

08 September 2020, 14:38:40

Magnus Hagander <magnus@hagander.net> writes:
> Oh sure, but there is clearly *something* going on, so we should try to
> figure that out. Because a transaction running multiple independent selects
> with the defaults settings will not actually block autovacuum.

I don't think the OP is claiming that autovacuum is blocked, only that
it's failing to remove recently-dead rows that he thinks could be removed.

The reason that's not so is that whether or not transaction A *has*
touched table B is irrelevant.  It *could* read table B at any moment,
for all autovacuum knows.  Therefore we cannot remove rows that should
still be visible to A's snapshot.

There are some approximations involved in figuring out which rows are
potentially still visible to someone.  So perhaps this is a situation
where an approximation is being used and tighter analysis would have
shown that indeed a row could be removed.  But we haven't seen any
evidence of that so far.  The basic fact that A's snapshot is limiting
removal of rows from a table it has not touched is not a bug.

            regards, tom lane

Re: Autovacuum of independent tables

From

Magnus Hagander

Date:

08 September 2020, 14:46:56

On Tue, Sep 8, 2020 at 4:38 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Magnus Hagander <magnus@hagander.net> writes:
> Oh sure, but there is clearly *something* going on, so we should try to
> figure that out. Because a transaction running multiple independent selects
> with the defaults settings will not actually block autovacuum.

I don't think the OP is claiming that autovacuum is blocked, only that
it's failing to remove recently-dead rows that he thinks could be removed.

The reason that's not so is that whether or not transaction A *has*
touched table B is irrelevant. It *could* read table B at any moment,
for all autovacuum knows. Therefore we cannot remove rows that should
still be visible to A's snapshot.

There are some approximations involved in figuring out which rows are
potentially still visible to someone. So perhaps this is a situation
where an approximation is being used and tighter analysis would have
shown that indeed a row could be removed. But we haven't seen any
evidence of that so far. The basic fact that A's snapshot is limiting
removal of rows from a table it has not touched is not a bug.

Right. But in the default isolation level, the snapshot of A gets reset between each SELECT, and does not persist to the end of the transaction. So adding COMMIT between each select shouldn't change that part, should it? That is, it's the snapshot age that decides it, not the transaction age.

I feel there is still some piece of information missing there, that could explain the problem better...

//Magnus

Re: Autovacuum of independent tables

From

Ron

Date:

08 September 2020, 14:47:08


On 9/8/20 3:27 AM, Michael Paquier wrote:
> On Tue, Sep 08, 2020 at 11:16:04AM +0300, Michael Holzman wrote:
>> Autovacuum does not clean dead tuples of closed transactions in tableB
>> while there is an open transaction on tableA.
>> But the tables have nothing in common. They are handled by separate
>> applications and there are no transactions that touch both tables
>> simultaneously.
>> Why does autovacuum create an artificial dependency on the tables?
> This is called MVCC, which applies to a session as a whole.  The point
> here is that even if your application knows that only tableA is used
> by a given transaction, Postgres cannot know that, as it could be
> possible that data from tableB is needed in this same transaction, so
> old versions of the rows from tableB matching with the snapshot hold
> by this long-running transaction still have to be around.

Too bad the START TRANSACTION statement doesn't have a RESERVING clause 
where you can enumerate the tables you'll be using.

-- 
Angular momentum makes the world go 'round.

Re: Autovacuum of independent tables

From

Michael Holzman

Date:

08 September 2020, 14:47:26

On Tue, Sep 8, 2020 at 5:38 PM Tom Lane wrote:

Magnus Haganderwrites:
> Oh sure, but there is clearly *something* going on, so we should try to
> figure that out. Because a transaction running multiple independent selects
> with the defaults settings will not actually block autovacuum.

I don't think the OP is claiming that autovacuum is blocked, only that
it's failing to remove recently-dead rows that he thinks could be removed.

Yes, this is exactly what happens.

The reason that's not so is that whether or not transaction A *has*
touched table B is irrelevant. It *could* read table B at any moment,
for all autovacuum knows. Therefore we cannot remove rows that should
still be visible to A's snapshot.

There are some approximations involved in figuring out which rows are
potentially still visible to someone. So perhaps this is a situation
where an approximation is being used and tighter analysis would have
shown that indeed a row could be removed. But we haven't seen any
evidence of that so far. The basic fact that A's snapshot is limiting
removal of rows from a table it has not touched is not a bug.

It's obviously not a bug. I was just surprised when I figured that out. It's also quite complex to explain to my colleagues. Actually, this is the main reason I started this thread: I tried to explain to someone and felt that I miss something.

Regards,
Michael Holzman

Re: Autovacuum of independent tables

From

Michael Holzman

Date:

08 September 2020, 15:00:25

On Tue, Sep 8, 2020 at 5:47 PM Magnus Hagander wrote:

I feel there is still some piece of information missing there, that could explain the problem better...

I gave all the information I have (without real application and table names, of course).

Both applications are C++ demons working with PG via ODBC on RHEL. We use default ODBC settings.

Regards,
Michael Holzman

Re: Autovacuum of independent tables

From

Tom Lane

Date:

08 September 2020, 15:15:58

Magnus Hagander <magnus@hagander.net> writes:
> On Tue, Sep 8, 2020 at 4:38 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The reason that's not so is that whether or not transaction A *has*
>> touched table B is irrelevant.  It *could* read table B at any moment,
>> for all autovacuum knows.  Therefore we cannot remove rows that should
>> still be visible to A's snapshot.

> Right. But in the default isolation level, the snapshot of A gets reset
> between each SELECT, and does not persist to the end of the transaction.

Well, we don't know what isolation level the OP is using.  We also don't
know what PG version he's using.  From memory, it hasn't been that long
since we fixed things so that an idle read-committed transaction
advertises no xmin.  It's also possible that the transaction isn't really
idle between statements (eg, if it's holding open cursors, or the like).

            regards, tom lane

Re: Autovacuum of independent tables

From

Magnus Hagander

Date:

08 September 2020, 15:33:04

On Tue, Sep 8, 2020 at 5:16 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Magnus Hagander <magnus@hagander.net> writes:
> On Tue, Sep 8, 2020 at 4:38 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The reason that's not so is that whether or not transaction A *has*
>> touched table B is irrelevant. It *could* read table B at any moment,
>> for all autovacuum knows. Therefore we cannot remove rows that should
>> still be visible to A's snapshot.

> Right. But in the default isolation level, the snapshot of A gets reset
> between each SELECT, and does not persist to the end of the transaction.

Well, we don't know what isolation level the OP is using. We also don't

Per the thread, he's using the default, which should be read committed.

know what PG version he's using. From memory, it hasn't been that long

Per his session list, 11.2.

since we fixed things so that an idle read-committed transaction
advertises no xmin. It's also possible that the transaction isn't really
idle between statements (eg, if it's holding open cursors, or the like).

Oh, now *cursors* is definitely something I didn't think of. And especially in the context of ODBC, I wonder if it might be creating cursors transparently, and that this somehow causes the problems.

Michael, do you know if that might be the case? Or try enabling log_statements to check if it is?

Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

Re: Autovacuum of independent tables

From

Michael Holzman

Date:

08 September 2020, 15:34:09

On Tue, Sep 8, 2020 at 6:15 PM Tom Lane wrote:

Well, we don't know what isolation level the OP is using.

We use the default. We do not explicitly set the isolation level.

We also don't know what PG version he's using.

it's 12.1

From memory, it hasn't been that long
since we fixed things so that an idle read-committed transaction
advertises no xmin. It's also possible that the transaction isn't really
idle between statements (eg, if it's holding open cursors, or the like).

There are no open cursors.

Regards,
Michael Holzman

Re: Autovacuum of independent tables

From

Michael Holzman

Date:

08 September 2020, 15:38:34

On Tue, Sep 8, 2020 at 6:33 PM Magnus Hagander wrote:

Per his session list, 11.2.

11.2 is on my personal testing machine. The real tests run with 12.1

Oh, now *cursors* is definitely something I didn't think of. And especially in the context of ODBC, I wonder if it might be creating cursors transparently, and that this somehow causes the problems.

Michael, do you know if that might be the case? Or try enabling log_statements to check if it is?

This is very interesting. I'll try to check it. I'll post here a reply if we do have open cursors. Thanks.

Regards,
Michael Holzman

Re: Autovacuum of independent tables

From

Stephen Frost

Date:

09 September 2020, 13:59:28

Greetings,

* Michael Holzman (michaelholzman@gmail.com) wrote:
> I have two applications A and B. A runs SELECT statements only and only on
> tableA. B actively updates tableB, A never looks into tableB. B has nothing
> to do with tableA.

In an ideal world, such distinct applications would probably do better
to live in independent PG clusters, which would address this issue with
VACUUM'ing and also allow you to do file-level backup/restore of each
independently, scale physical replicas independently, as well as manage
roles to be specific for each without seeing others, etc.

Thanks,

Stephen

Attachment

signature.asc