Autovacuum versus rolled-back transactions - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Autovacuum versus rolled-back transactions |
Date | |
Msg-id | 28005.1180135238@sss.pgh.pa.us |
Responses | Re: Autovacuum versus rolled-back transactions (3 replies) |
List | pgsql-hackers |
The pgstats subsystem does not correctly account for the effects of failed transactions. Note the live/dead tuple counts in this example:

regression=# create table foo (f1 int);
CREATE TABLE
regression=# insert into foo select x from generate_series(1,1000) x;
INSERT 0 1000
-- wait a second for stats to catch up
regression=# select * from pg_stat_all_tables where relname = 'foo';
 relid  | schemaname | relname | seq_scan | seq_tup_read | idx_scan | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del | n_live_tup | n_dead_tup | last_vacuum | last_autovacuum | last_analyze | last_autoanalyze
--------+------------+---------+----------+--------------+----------+---------------+-----------+-----------+-----------+------------+------------+-------------+-----------------+--------------+------------------
 496849 | public     | foo     |        0 |            0 |          |               |      1000 |         0 |         0 |       1000 |          0 |             |                 |              |
(1 row)

regression=# begin;
BEGIN
regression=# insert into foo select x from generate_series(1,1000) x;
INSERT 0 1000
regression=# rollback;
ROLLBACK
-- wait a second for stats to catch up
regression=# select * from pg_stat_all_tables where relname = 'foo';
 relid  | schemaname | relname | seq_scan | seq_tup_read | idx_scan | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del | n_live_tup | n_dead_tup | last_vacuum | last_autovacuum | last_analyze | last_autoanalyze
--------+------------+---------+----------+--------------+----------+---------------+-----------+-----------+-----------+------------+------------+-------------+-----------------+--------------+------------------
 496849 | public     | foo     |        0 |            0 |          |               |      2000 |         0 |         0 |       2000 |          0 |             |                 |              |
(1 row)

The rolled-back insert left 1000 dead tuples in the table, yet n_dead_tup still reads zero and n_live_tup claims 2000 live rows.

This means that a table could easily be full of dead tuples from failed transactions, and yet autovacuum won't do a thing, because it doesn't know there are any. Perhaps this explains some of the reports we've heard of tables bloating despite having autovac on. It seems to me this is a "must fix" if we expect people to rely on autovacuum for real in 8.3.

I think it's fairly obvious how n_live_tup and n_dead_tup ought to change in response to a failed xact, but maybe not so obvious for the other counters. I suggest that the scan/fetch counters (seq_scan, seq_tup_read, idx_scan, idx_tup_fetch) as well as all the block I/O counters should increment the same for committed and failed xacts, since they are meant to count work done, regardless of whether the work was in vain.

I am much less sure how we want n_tup_ins, n_tup_upd, and n_tup_del to act, though. Should they be advanced "as normal" by a failed xact? That's what the code does now, and if you think of them as counters of work done, that's not so unreasonable.

It may boil down to whether we would like the identity

    n_live_tup = n_tup_ins - n_tup_del

to continue to hold, or the similar one for n_dead_tup (n_dead_tup = n_tup_upd + n_tup_del). The problem, basically, is that pgstats computes n_live_tup and n_dead_tup from those identities rather than by tracking what really happens (sketched below). I don't think we can keep those identities if failed xacts update the counts "normally".

Is it worth having separate counters for the numbers of failed inserts/updates? (Failed deletes perhaps need not be counted, since they change nothing.) Or we could change the backends so that the reported n_tup_ins/del/upd still produce the right live/dead tuple counts according to the identities, but then those counts would not reflect work done.
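For concreteness, the identity-based bookkeeping amounts to something like the following (a paraphrase of the idea, not the actual pgstat.c code). Plugging in the numbers from the example above shows how a rolled-back insert inflates the derived counts:

#include <stdio.h>

int
main(void)
{
    /* Action counters as reported to the collector; both the committed
     * and the rolled-back insert were counted, per the example above. */
    long n_tup_ins = 2000;
    long n_tup_upd = 0;
    long n_tup_del = 0;

    /* The identities the collector relies on to derive live/dead: */
    long n_live_tup = n_tup_ins - n_tup_del;    /* => 2000 */
    long n_dead_tup = n_tup_upd + n_tup_del;    /* => 0    */

    /* What is actually in the table after the rollback: */
    long true_live = 1000;
    long true_dead = 1000;

    printf("derived: %ld live, %ld dead; actual: %ld live, %ld dead\n",
           n_live_tup, n_dead_tup, true_live, true_dead);
    return 0;
}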
Another alternative is for transactions to tally the number of live and dead tuples they create, with understanding of rollbacks, and send those to the stats collector independently of the action counters.

I don't think I want to add separate failed-insert/update counters, because that will bloat the stats reporting file, which is uncomfortably large already when you have lots of tables. The separate-tally method would avoid that, at the price of more stats UDP traffic.

I'm kind of leaning to the separate-tally method and abandoning the assumption that the identities hold. I'm not wedded to the idea, though. Any thoughts?

			regards, tom lane
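To illustrate the separate-tally idea, here is a minimal sketch of the end-of-transaction bookkeeping it implies. All names are hypothetical; a real implementation would live in the backend's pgstat code and would have to handle subtransactions as well. It just shows how commit/abort knowledge turns a per-table tally into correct live/dead deltas:

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-table tally kept by a backend for the current
 * transaction (names invented for illustration).  An UPDATE counts
 * once in each field: one new version inserted, one old one deleted. */
typedef struct XactTally
{
    long tuples_inserted;
    long tuples_deleted;
} XactTally;

/* At transaction end, fold the tally into live/dead deltas to send
 * to the collector, using knowledge of the commit/abort outcome. */
static void
tally_to_deltas(const XactTally *t, bool committed,
                long *live_delta, long *dead_delta)
{
    if (committed)
    {
        /* Our inserts became live; the tuples we deleted are now dead
         * (they linger until vacuumed). */
        *live_delta = t->tuples_inserted - t->tuples_deleted;
        *dead_delta = t->tuples_deleted;
    }
    else
    {
        /* Aborted: our inserts are dead on arrival, and our deletes
         * never took effect, so the old tuples stay live. */
        *live_delta = 0;
        *dead_delta = t->tuples_inserted;
    }
}

int
main(void)
{
    XactTally t = {1000, 0};    /* the rolled-back insert from the example */
    long live, dead;

    tally_to_deltas(&t, false, &live, &dead);
    printf("after rollback: live %+ld, dead %+ld\n", live, dead);
    return 0;
}

With deltas computed this way, n_tup_ins/upd/del can keep counting work done by committed and failed xacts alike, while n_live_tup and n_dead_tup stay accurate; the identities simply stop being guaranteed.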