Home > mailing lists

Question about VACUUM behavior with sub-transactions in stored procedures - Mailing list pgsql-hackers

From	Кириллов Вячеслав
Subject	Question about VACUUM behavior with sub-transactions in stored procedures
Date	October 21, 2024 15:00:41
Msg-id	1729512039118.44460@diasoft.ru Whole thread Raw
Responses	Re: Question about VACUUM behavior with sub-transactions in stored procedures
List	pgsql-hackers

Tree view

Hi hackers,,

I have a question regarding the behavior of the auto VACUUM in PostgreSQL in the context of using stored procedures with sub-transactions.

As I understand it, the parameters that control the initiation of VACUUM are set in the configuration file, such as autovacuum_vacuum_threshold, autovacuum_vacuum_scale_factor, and are stored in the system tables pg_stat_user_tables or pg_class (please correct me if I'm wrong). These system tables are updated after each completed transaction, and VACUUM analyzes them to determine whether to clean up dead rows, depending on the configured thresholds.

Here is the scenario: we have several stored procedures that modify or update table data. These procedures use sub-transactions, which are committed via COMMIT. However, from my understanding, the system table, which VACUUM checks, is not updated until the main (outermost) transaction completes. This means that during the execution of the procedures, a significant number of dead rows may accumulate, and only after the final COMMIT of the main transaction do these dead rows become visible for VACUUM.

As a result, there could be a sharp spike in CPU usage when VACUUM runs after the completion of the procedures, as it begins to clean up a large number of accumulated dead rows.

I would like to know if this behavior is expected and correct? Or could there be a potential issue or bug in this scenario?

To illustrate the issue, here’s an example:

CREATE TABLE bloat

(

id integer generated always as identity,

d timestamptz

);

CREATE OR REPLACE PROCEDURE update_multiple_bloat()

LANGUAGE plpgsql

AS $$

DECLARE

row_to_update RECORD;

BEGIN

FOR row_to_update IN SELECT * FROM bloat

LOOP

UPDATE bloat SET d = CURRENT_TIMESTAMP WHERE d = row_to_update.d;

COMMIT;

END LOOP;

END;

$$;

CREATE OR REPLACE PROCEDURE insert_multiple_into_bloat(num_records integer)

LANGUAGE plpgsql

AS $$

DECLARE

i integer := 1;

BEGIN

LOOP

EXIT WHEN i > num_records;

INSERT INTO bloat (d) VALUES (CURRENT_TIMESTAMP);

i := i + 1;

COMMIT;

END LOOP;

END;

$$;

DO $$

DECLARE

row_data RECORD;

counter INT := 0;

BEGIN

INSERT INTO bloat (d) VALUES (CURRENT_TIMESTAMP);

COMMIT;

END;

BEGIN

call insert_multiple_into_bloat(100);

END;

BEGIN

call update_multiple_bloat();

END;

END $$;

Thank you in advance for your help!

With Regards,
Vyacheslav Kirillov!

pgsql-hackers by date:

From: Amit Kapila
Date: 21 October 2024, 14:19:39
Subject: Re: Pgoutput not capturing the generated columns

From: Fujii Masao
Date: 21 October 2024, 15:39:39
Subject: Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row

Question about VACUUM behavior with sub-transactions in stored procedures - Mailing list pgsql-hackers

Previous

Next