Re: another autovacuum scheduling thread - Mailing list pgsql-hackers

From Sami Imseih
Subject Re: another autovacuum scheduling thread
Msg-id CAA5RZ0sw+9rEaW9taNpRZWvuLYMjRa9iibneGfB2ftNSUHT0Ww@mail.gmail.com
In response to Re: another autovacuum scheduling thread  (David Rowley <dgrowleyml@gmail.com>)
Responses Re: another autovacuum scheduling thread
List pgsql-hackers

Thanks for the ideas on improving the test!

I am still trying to see how useful this type of testing is,
but I will share what I have done.

> I wonder if it would be more realistic to throttle the work simulation
> to a certain speed with pgbench -R rather than having it go flat out.

Good point.

> > If we logged the score, we could do the "unpatched" test with the
> > patched code, just with commenting out the
> > list_sort(tables_to_process, TableToProcessComparator); It'd then be
> > interesting to zero the log_auto*_min_duration settings and review the
> > order differences and how high the scores got. Would the average score
> > be higher or lower with patched version?

I agree. I attached a patch on top of v7 that implements a debug GUC
to enable or disable sorting for testing purposes.

> I'm not yet sure how meaningful it is, but I tried adding the
> following to recheck_relation_needs_vacanalyze():
>
> elog(LOG, "Performing autovacuum of table \"%s\" with score = %f",
> get_rel_name(relid), score);

The same attached patch also implements this log.

I also spent more time working on the test script. I cleaned it up and
combined it into a single script. I added a few things:

- Ability to run with or without the batch workload.
- OLTP tables are no longer the same size; they are created with
different row counts using a minimum and maximum row count and a
multiplier for scaling the next table.
- A background collector for pg_stat_all_tables on relevant tables,
stored in relstats_monitor.log.
- Logs are saved after the run for further analysis, such as examining
the scores.
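
As an illustration of the row-count scaling described above, here is a minimal sketch. It assumes the first table gets the minimum row count and each following table is multiplied by the scaling factor until the maximum is reached; the function name, parameters, and capping behavior are hypothetical and may not match the attached script.

```c
#include <stddef.h>

/*
 * Hypothetical helper: fill counts[] with one row count per OLTP table.
 * The first table gets min_rows; each following table gets the previous
 * value times `multiplier`, capped at max_rows (capping is an assumption).
 */
static void
oltp_row_counts(long min_rows, long max_rows, double multiplier,
                long *counts, size_t ntables)
{
    double rows = (double) min_rows;

    for (size_t i = 0; i < ntables; i++)
    {
        /* Clamp to the configured maximum before storing. */
        counts[i] = (rows > (double) max_rows) ? max_rows : (long) rows;
        rows *= multiplier;
    }
}
```

With a minimum of 1000 rows, a maximum of 10000 rows, and a multiplier of 2, five tables would get 1000, 2000, 4000, 8000, and 10000 rows.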

Also attached is an analysis of a run with 16 OLTP tables and 3 batch tables.
It shows no major differences in vacuum/analyze activity whether sorting is
enabled or disabled. OLTP had very similar DML and
autovacuum/autoanalyze activity. A few points to highlight:

1/ In the sorted run, we had an equal number of autovacuums and autoanalyzes
on the smaller OLTP tables, as if every eligible table needed both
autovacuum and autoanalyze. The unsorted run was less consistent on
the smaller tables. I observed this on several runs. I don't think it's a big
deal, but it is interesting nonetheless.

2/ Batch tables in the sorted run had less autovacuum time (1,257,821 ms
unsorted vs 962,794 ms sorted), but very similar autovacuum counts.

3/ OLTP tables, on the other hand, had more autovacuum time in the
sorted run (3,590,964 ms unsorted vs 3,852,460 ms sorted), but I do not see
much difference in autovacuum/autoanalyze counts.
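
To put the timings in points 2/ and 3/ in proportion, the relative change can be computed as below. This sketch assumes that in each pair the first figure is the unsorted run and the second is the sorted run, which is the only reading consistent with "less" in 2/ and "more" in 3/; the function name is hypothetical.

```c
/*
 * Relative change of the sorted run against the unsorted run, in percent.
 * Negative means the sorted run spent less time in autovacuum.
 */
static double
pct_change(double unsorted_ms, double sorted_ms)
{
    return (sorted_ms - unsorted_ms) / unsorted_ms * 100.0;
}
```

Under that assumption, pct_change(1257821, 962794) is about -23.5% for the batch tables and pct_change(3590964, 3852460) is about +7.3% for the OLTP tables. Summing both groups (4,848,785 ms unsorted vs 4,815,254 ms sorted) gives a change of roughly -0.7%, so the total autovacuum time is nearly identical across the two runs.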

Other tests I plan on running:
- batch updates/deletes, since the current batch option only tests append-only
tables.
- OLTP-only test.

Also, I am thinking about another sorting strategy based on average
autovacuum/autoanalyze time per table. The idea is to sort ascending by
the greater of the two averages, so workers process quicker tables first
instead of all workers potentially getting hung on the slowest tables.
We can calculate the average now that v18 includes total_autovacuum_time
and total_autoanalyze_time.
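
A minimal standalone sketch of that ordering follows. It is not the patch's code: the TableStat struct and function names are hypothetical, the fields simply mirror the pg_stat_all_tables counters, and it uses plain qsort() rather than list_sort(). Treating a table with no recorded runs as having an average of zero (so it sorts first) is also an assumption.

```c
#include <stdlib.h>

/* Hypothetical per-table stats, mirroring pg_stat_all_tables counters. */
typedef struct TableStat
{
    const char *relname;
    double      total_autovacuum_time;   /* milliseconds */
    long        autovacuum_count;
    double      total_autoanalyze_time;  /* milliseconds */
    long        autoanalyze_count;
} TableStat;

/* Average duration per run; 0 when the table has never been processed. */
static double
avg_ms(double total_ms, long count)
{
    return (count > 0) ? total_ms / (double) count : 0.0;
}

/* Sort key: the greater of the average autovacuum and autoanalyze times. */
static double
sort_key(const TableStat *t)
{
    double vac = avg_ms(t->total_autovacuum_time, t->autovacuum_count);
    double ana = avg_ms(t->total_autoanalyze_time, t->autoanalyze_count);

    return (vac > ana) ? vac : ana;
}

/* qsort() comparator: ascending by sort key, so quicker tables come first. */
static int
cmp_quickest_first(const void *a, const void *b)
{
    double ka = sort_key(a);
    double kb = sort_key(b);

    return (ka > kb) - (ka < kb);
}
```

An array of entries would be ordered with qsort(tables, ntables, sizeof(TableStat), cmp_quickest_first). Because never-processed tables get a key of zero here, they would always be tried first; a real implementation might prefer a different default for tables without history.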

The way I see it, regardless of prioritization, a few large tables may
still monopolize autovacuum workers. But at least this way, the quick tables
get a chance to be processed first. Would this be an idea worth testing?

--
Sami Imseih
Amazon Web Services (AWS)
