Thread: A few new options for vacuumdb

A few new options for vacuumdb

From

"Bossart, Nathan"

Date:

19 December 2018, 20:50:10

Hi hackers,

I've attached a few patches to add some options to vacuumdb that might
be useful.  Specifically, these patches add the --skip-locked,
--min-xid-age, --min-mxid-age, and --min-relation-size options.

If an option is specified for a server version that is not supported,
the option is silently ignored.  For example, SKIP_LOCKED was only
added to VACUUM and ANALYZE for v12.  Alternatively, I think we could
fail in vacuum_one_database() if an unsupported option is specified.
Some of these options will work on all currently supported versions,
so I am curious what others think about skipping some of these version
checks altogether.

In this set of patches, I've disallowed using --min-xid-age,
--min-mxid-age, and --min-relation-size in conjunction with the
--table option.  IMO this combination of options is prone to
confusion.  For example:

        vacuumdb mydb --table mytable --min-relation-size 1024

It does not seem clear whether the user wants us to process mytable
only if it is at least 1 GB, or if we should process mytable in
addition to any other relations over 1 GB.  Either way, I think trying
to support these combinations of options adds more complexity than it
is worth.

0001 is a minor fix that is somewhat separate from these new options,
although the new options will make the edge case it aims to fix much
easier to reach.  When the catalogs are queried in parallel mode to
get the list of tables to process, we currently assume that at least
one table will be returned.  If no tables are found, the tables
variable will stay as NULL, which leads to database-wide VACUUM or
ANALYZE commands.  Since there are currently no user-configurable
options available for this catalog query, this case is likely
exceptionally rare.  However, with the new options, it is much easier
to inadvertently filter out all relations.

I will be adding this work to the next commitfest.

Nathan

Attachment

Re: A few new options for vacuumdb

From

Michael Paquier

Date:

20 December 2018, 02:05:22

On Wed, Dec 19, 2018 at 08:50:10PM +0000, Bossart, Nathan wrote:
> If an option is specified for a server version that is not supported,
> the option is silently ignored.  For example, SKIP_LOCKED was only
> added to VACUUM and ANALYZE for v12.  Alternatively, I think we could
> fail in vacuum_one_database() if an unsupported option is specified.
> Some of these options will work on all currently supported versions,
> so I am curious what others think about skipping some of these version
> checks altogether.

prepare_vacuum_command() already handles that by ignoring silently
unsupported options (see the case of SKIP_LOCKED).  So why not doing the
same?

> It does not seem clear whether the user wants us to process mytable
> only if it is at least 1 GB, or if we should process mytable in
> addition to any other relations over 1 GB.  Either way, I think trying
> to support these combinations of options adds more complexity than it
> is worth.

It seems to me that a combination of both options means that the listed
table should be processed only if its minimum size is 1GB.  If multiple
tables are specified with --table, then only those reaching 1GB would be
processed.  So this restriction can go away.  The same applies for the
proposed --min-xid-age and --min-mxid-age.

+       <para>
+        Only execute the vacuum or analyze commands on tables with a multixact
+        ID age of at least <replaceable
class="parameter">mxid_age</replaceable>.
+       </para>
Adding a link to explain the multixact business may be helpful, say
vacuum-for-multixact-wraparound.  Same comment for XID.

> 0001 is a minor fix that is somewhat separate from these new options,
> although the new options will make the edge case it aims to fix much
> easier to reach.  When the catalogs are queried in parallel mode to
> get the list of tables to process, we currently assume that at least
> one table will be returned.  If no tables are found, the tables
> variable will stay as NULL, which leads to database-wide VACUUM or
> ANALYZE commands.  Since there are currently no user-configurable
> options available for this catalog query, this case is likely
> exceptionally rare.  However, with the new options, it is much easier
> to inadvertently filter out all relations.

Agreed.  No need to visibly bother about that in back-branches.

There is an argument about also adding DISABLE_PAGE_SKIPPING.
--
Michael

Attachment

signature.asc

Re: A few new options for vacuumdb

From

"Bossart, Nathan"

Date:

20 December 2018, 16:48:11

Hi Michael,

Thanks for taking a look.

On 12/19/18, 8:05 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
> On Wed, Dec 19, 2018 at 08:50:10PM +0000, Bossart, Nathan wrote:
>> If an option is specified for a server version that is not supported,
>> the option is silently ignored.  For example, SKIP_LOCKED was only
>> added to VACUUM and ANALYZE for v12.  Alternatively, I think we could
>> fail in vacuum_one_database() if an unsupported option is specified.
>> Some of these options will work on all currently supported versions,
>> so I am curious what others think about skipping some of these version
>> checks altogether.
>
> prepare_vacuum_command() already handles that by ignoring silently
> unsupported options (see the case of SKIP_LOCKED).  So why not doing the
> same?

The --skip-locked option in vacuumdb is part of 0002, so I don't think
there's much precedent here.  We do currently fall back to the
unparenthesized syntax for VACUUM for versions before 9.0, so there is
some version handling already.  What do you think about removing
version checks for unsupported major versions?  If we went that route,
these new patches could add version checks only for options that were
added in a supported major version (e.g. mxid_age() was added in 9.5).
Either way, we'll still have to decide whether to fail or to silently
skip the option if you do something like specify --min-mxid-age for a
9.4 server.

>> It does not seem clear whether the user wants us to process mytable
>> only if it is at least 1 GB, or if we should process mytable in
>> addition to any other relations over 1 GB.  Either way, I think trying
>> to support these combinations of options adds more complexity than it
>> is worth.
>
> It seems to me that a combination of both options means that the listed
> table should be processed only if its minimum size is 1GB.  If multiple
> tables are specified with --table, then only those reaching 1GB would be
> processed.  So this restriction can go away.  The same applies for the
> proposed --min-xid-age and --min-mxid-age.

Okay.  This should be pretty easy to do using individual catalog
lookups before we call prepare_vacuum_command().  Alternatively, I
could add a clause for each specified table in the existing query in
vacuum_one_database().  At that point, it may be cleaner to just use
that catalog query in all cases instead of leaving paths where tables
can remain NULL.

> +       <para>
> +        Only execute the vacuum or analyze commands on tables with a multixact
> +        ID age of at least <replaceable
> class="parameter">mxid_age</replaceable>.
> +       </para>
> Adding a link to explain the multixact business may be helpful, say
> vacuum-for-multixact-wraparound.  Same comment for XID.

Good idea.

> There is an argument about also adding DISABLE_PAGE_SKIPPING.

I'll add this in the next set of patches.  I need to add the
parenthesized syntax and --skip-locked for ANALYZE in
prepare_vacuum_command(), too.

Nathan

Re: A few new options for vacuumdb

From

Michael Paquier

Date:

21 December 2018, 01:05:45

On Thu, Dec 20, 2018 at 04:48:11PM +0000, Bossart, Nathan wrote:
> The --skip-locked option in vacuumdb is part of 0002, so I don't think
> there's much precedent here.

It looks like I was not looking at the master branch here ;)

> We do currently fall back to the
> unparenthesized syntax for VACUUM for versions before 9.0, so there is
> some version handling already.  What do you think about removing
> version checks for unsupported major versions?

I am not exactly sure down to which version this needs to be supported
and what's the policy about that (if others have an opinion to share,
please feel free) but surely it does not hurt to keep those code paths
either from a maintenance point of view.

> If we went that route, these new patches could add version checks only
> for options that were added in a supported major version (e.g.
> mxid_age() was added in 9.5). Either way, we'll still have to decide
> whether to fail or to silently skip the option if you do something
> like specify --min-mxid-age for a 9.4 server.

There are downsides and upsides for each approach.  Reporting a failure
makes it clear that an option is not available with a clear error
message, however it complicates a bit the error handling for parallel
runs.  And vacuum_one_database should be the code path doing as that's
where all the connections are taken even when all databases are
processed.

Ignoring the option, as your patch set does, has the disadvantage to
potentially confuse users, say we may see complains like "why is VACUUM
locking even if I specified --skip-locked?".  Still this keeps the code
minimalistic and simple.

Just passing down the options and seeing a failure in the query
generated is also a confusing experience I guess for not-so-advanced
users even if it may simplify the error handling.

Issuing a proper error feels more natural to me I think, as users can
react on that properly.  Opinions from others are welcome.
--
Michael

Attachment

signature.asc

Re: A few new options for vacuumdb

From

Robert Haas

Date:

21 December 2018, 16:49:01

On Wed, Dec 19, 2018 at 9:05 PM Michael Paquier <michael@paquier.xyz> wrote:
> > It does not seem clear whether the user wants us to process mytable
> > only if it is at least 1 GB, or if we should process mytable in
> > addition to any other relations over 1 GB.  Either way, I think trying
> > to support these combinations of options adds more complexity than it
> > is worth.
>
> It seems to me that a combination of both options means that the listed
> table should be processed only if its minimum size is 1GB.  If multiple
> tables are specified with --table, then only those reaching 1GB would be
> processed.

+1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: A few new options for vacuumdb

From

Robert Haas

Date:

21 December 2018, 16:50:33

On Thu, Dec 20, 2018 at 11:48 AM Bossart, Nathan <bossartn@amazon.com> wrote:
> Either way, we'll still have to decide whether to fail or to silently
> skip the option if you do something like specify --min-mxid-age for a
> 9.4 server.

+1 for fail.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: A few new options for vacuumdb

From

"Bossart, Nathan"

Date:

21 December 2018, 17:12:07

On 12/21/18, 10:51 AM, "Robert Haas" <robertmhaas@gmail.com> wrote:
> On Thu, Dec 20, 2018 at 11:48 AM Bossart, Nathan <bossartn@amazon.com> wrote:
>> Either way, we'll still have to decide whether to fail or to silently
>> skip the option if you do something like specify --min-mxid-age for a
>> 9.4 server.
>
> +1 for fail.

Sounds good.  I'll keep all the version checks and will fail for
unsupported options in the next patch set.

Nathan

Re: A few new options for vacuumdb

From

"Bossart, Nathan"

Date:

04 January 2019, 23:49:46

On 12/21/18, 11:14 AM, "Bossart, Nathan" <bossartn@amazon.com> wrote:
> On 12/21/18, 10:51 AM, "Robert Haas" <robertmhaas@gmail.com> wrote:
>> On Thu, Dec 20, 2018 at 11:48 AM Bossart, Nathan <bossartn@amazon.com> wrote:
>>> Either way, we'll still have to decide whether to fail or to silently
>>> skip the option if you do something like specify --min-mxid-age for a
>>> 9.4 server.
>>
>> +1 for fail.
>
> Sounds good.  I'll keep all the version checks and will fail for
> unsupported options in the next patch set.

Here's an updated set of patches with the following changes:

 - 0002 adds the parenthesized syntax for ANALYZE.
 - 0003 adds DISABLE_PAGE_SKIPPING for VACUUM.
 - 0003 also ensures SKIP_LOCKED is applied for --analyze-only.
 - 0004 alters vacuumdb to always use the catalog query to discover
   the list of tables to process.
 - 0003, 0005, and 0006 now fail in vacuum_one_database() if a
   specified option is not available on the server version.
 - If tables are listed along with --min-xid-age, --min-mxid-age, or
   --min-relation-size, each table is processed only if it satisfies
   the provided options.

0004 introduces a slight change to existing behavior.  Currently, if
you specify a missing table, vacuumdb will process each table until
it reaches the nonexistent one, at which point it will fail.  After
0004 is applied, vacuumdb will fail during the catalog query, and no
tables will be processed.  I considered avoiding this change in
behavior by doing an additional catalog lookup for each specified
table to see whether it satisfies --min-xid-age, etc.  However, this
introduced quite a bit more complexity and increased the number of
catalog queries needed.

Nathan

Attachment

Re: A few new options for vacuumdb

From

Michael Paquier

Date:

05 January 2019, 01:19:41

On Fri, Jan 04, 2019 at 11:49:46PM +0000, Bossart, Nathan wrote:
> 0004 introduces a slight change to existing behavior.  Currently, if
> you specify a missing table, vacuumdb will process each table until
> it reaches the nonexistent one, at which point it will fail.  After
> 0004 is applied, vacuumdb will fail during the catalog query, and no
> tables will be processed.

I have not looked at the patch set in details, but that would make
vacuumdb more consistent with the way VACUUM works with multiple
relations, which sounds like a good thing.
--
Michael

Attachment

signature.asc

Re: A few new options for vacuumdb

From

Michael Paquier

Date:

07 January 2019, 07:11:46

On Fri, Jan 04, 2019 at 11:49:46PM +0000, Bossart, Nathan wrote:
> Here's an updated set of patches with the following changes:
>
>  - 0002 adds the parenthesized syntax for ANALYZE.
>  - 0003 adds DISABLE_PAGE_SKIPPING for VACUUM.
>  - 0003 also ensures SKIP_LOCKED is applied for --analyze-only.
>  - 0004 alters vacuumdb to always use the catalog query to discover
>    the list of tables to process.
>  - 0003, 0005, and 0006 now fail in vacuum_one_database() if a
>    specified option is not available on the server version.
>  - If tables are listed along with --min-xid-age, --min-mxid-age, or
>    --min-relation-size, each table is processed only if it satisfies
>    the provided options.

I have been looking at the patch set, and 0001 can actually happen
only once 0005 is applied because this modifies the query doing on
HEAD a full scan of pg_class which would include at least catalog
tables so it can never be empty.  For this reason it seems to me that
0001 and 0004 should be merged together as common refactoring, making
sure on the way that all relations exist before processing anything.
As 0005 and 0006 change similar portions of the code, we could also
merge these together in a second patch introducing the new options.
Keeping 0001, 0004, 0005 and 0006 separated eases the review, but I
would likely merge things when they make sense together to reduce
diff chunks.

0002 removes some carefully-written query generation, so as it is
never possible to generate something like ANALYZE FULL.  HEAD splits
ANALYZE and VACUUM clearly, but 0002 merges them together so as
careless coding in vacuumdb.c makes it easier to generate command
strings which would fail in the backend relying on the option checks
to make sure that for example combining --full and --analyze-only
never happens.  Introducing 0002 is mainly here for 0003, so let's
merge them together.

> 0004 introduces a slight change to existing behavior.  Currently, if
> you specify a missing table, vacuumdb will process each table until
> it reaches the nonexistent one, at which point it will fail.  After
> 0004 is applied, vacuumdb will fail during the catalog query, and no
> tables will be processed.  I considered avoiding this change in
> behavior by doing an additional catalog lookup for each specified
> table to see whether it satisfies --min-xid-age, etc.  However, this
> introduced quite a bit more complexity and increased the number of
> catalog queries needed.

Simplicity is always welcome, knowing that tables are missing before
doing any operations is more consistent with plain VACUUM/ANALYZE.

From patch 0004:
+       executeCommand(conn, "RESET search_path;", progname, echo);
+       res = executeQuery(conn, catalog_query.data, progname, echo);
+       termPQExpBuffer(&catalog_query);
+       PQclear(executeQuery(conn, ALWAYS_SECURE_SEARCH_PATH_SQL,
+                               progname, echo));
We should really avoid that.  There are other things like casts which
tend to be forgotten, and if the catalog lookup query gets more
complicated, I feel that this would again be forgotten, reintroducing
the issues that ALWAYS_SECURE_SEARCH_PATH_SQL is here to fix.

I have put my hands into what you sent, and worked on putting
0002/0003 first into shape, finishing with the attached.  I have
simplified the query construction a bit, making it IMO easier to read
and to add new options, with comments documenting what's supported
across versions.  I have also added a regression test for
--analyze-only --skip-locked.  Then I have done some edit of the docs.
What do you think for this portion?
--
Michael

Attachment

Re: A few new options for vacuumdb

From

Masahiko Sawada

Date:

07 January 2019, 09:10:21

On Sat, Jan 5, 2019 at 8:50 AM Bossart, Nathan <bossartn@amazon.com> wrote:
>
> On 12/21/18, 11:14 AM, "Bossart, Nathan" <bossartn@amazon.com> wrote:
> > On 12/21/18, 10:51 AM, "Robert Haas" <robertmhaas@gmail.com> wrote:
> >> On Thu, Dec 20, 2018 at 11:48 AM Bossart, Nathan <bossartn@amazon.com> wrote:
> >>> Either way, we'll still have to decide whether to fail or to silently
> >>> skip the option if you do something like specify --min-mxid-age for a
> >>> 9.4 server.
> >>
> >> +1 for fail.
> >
> > Sounds good.  I'll keep all the version checks and will fail for
> > unsupported options in the next patch set.
>
> Here's an updated set of patches with the following changes:
>
>  - 0002 adds the parenthesized syntax for ANALYZE.
>  - 0003 adds DISABLE_PAGE_SKIPPING for VACUUM.
>  - 0003 also ensures SKIP_LOCKED is applied for --analyze-only.
>  - 0004 alters vacuumdb to always use the catalog query to discover
>    the list of tables to process.
>  - 0003, 0005, and 0006 now fail in vacuum_one_database() if a
>    specified option is not available on the server version.
>  - If tables are listed along with --min-xid-age, --min-mxid-age, or
>    --min-relation-size, each table is processed only if it satisfies
>    the provided options.
>
> 0004 introduces a slight change to existing behavior.  Currently, if
> you specify a missing table, vacuumdb will process each table until
> it reaches the nonexistent one, at which point it will fail.  After
> 0004 is applied, vacuumdb will fail during the catalog query, and no
> tables will be processed.  I considered avoiding this change in
> behavior by doing an additional catalog lookup for each specified
> table to see whether it satisfies --min-xid-age, etc.  However, this
> introduced quite a bit more complexity and increased the number of
> catalog queries needed.
>
> Nathan
>

0002 and 0003 are merged and posted by Michael-san and it looks good
to me, so I've looked at the 0001, 0004, 0005 and 0006 patches. Here
is a few review comments.

-----
Even if all tables are filtered by --min-relation-size, min-mxid-age
or min-xid-age the following message appeared.

$ vacuumdb --verbose --min-relation-size 1000000 postgres
vacuumdb: vacuuming database "postgres"

Since no tables are processed in this case isn't is better not to
output this message by moving the message after checking if ntup == 0?

-----
You use pg_total_relation_size() to check the relation size but it
might be better to use pg_relation_size() instead. The
pg_total_relation_size() includes the all index size but I think it's
common based on my experience that we check only the table size
(including toast table) to do vacuum it or not.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: A few new options for vacuumdb

From

Michael Paquier

Date:

08 January 2019, 02:03:00

On Mon, Jan 07, 2019 at 06:10:21PM +0900, Masahiko Sawada wrote:
> 0002 and 0003 are merged and posted by Michael-san and it looks good
> to me, so I've looked at the 0001, 0004, 0005 and 0006 patches. Here
> is a few review comments.

I have done another round on 0002/0003 (PQfinish was lacking after
error-ing on incompatible options) and committed the thing.

> Even if all tables are filtered by --min-relation-size, min-mxid-age
> or min-xid-age the following message appeared.
>
> $ vacuumdb --verbose --min-relation-size 1000000 postgres
> vacuumdb: vacuuming database "postgres"
>
> Since no tables are processed in this case isn't is better not to
> output this message by moving the message after checking if ntup ==
> 0?

Hmm.  My take on this one is that we attempt to vacuum relations on
this database, but this may finish by doing nothing.  Printing this
message is still important to let the user know that an attempt was
done so I would keep the message.

> You use pg_total_relation_size() to check the relation size but it
> might be better to use pg_relation_size() instead. The
> pg_total_relation_size() includes the all index size but I think it's
> common based on my experience that we check only the table size
> (including toast table) to do vacuum it or not.

Yes, I am also not completely sure if the way things are defined in
the patch are completely what we are looking for.  Still, I have seen
as well people complain about the total amount of space a table is
physically taking on disk, including its indexes.  So from the user
experience the direction taken by the patch makes sense, as much as
does the argument you do.
--
Michael

Attachment

signature.asc

Re: A few new options for vacuumdb

From

"Bossart, Nathan"

Date:

08 January 2019, 18:46:11

On 1/7/19, 1:12 AM, "Michael Paquier" <michael@paquier.xyz> wrote:
> I have been looking at the patch set, and 0001 can actually happen
> only once 0005 is applied because this modifies the query doing on
> HEAD a full scan of pg_class which would include at least catalog
> tables so it can never be empty.  For this reason it seems to me that
> 0001 and 0004 should be merged together as common refactoring, making
> sure on the way that all relations exist before processing anything.
> As 0005 and 0006 change similar portions of the code, we could also
> merge these together in a second patch introducing the new options.
> Keeping 0001, 0004, 0005 and 0006 separated eases the review, but I
> would likely merge things when they make sense together to reduce
> diff chunks.

Thanks for reviewing.  Sure, I can merge these together so that it's
just 2 patches.

> 0002 removes some carefully-written query generation, so as it is
> never possible to generate something like ANALYZE FULL.  HEAD splits
> ANALYZE and VACUUM clearly, but 0002 merges them together so as
> careless coding in vacuumdb.c makes it easier to generate command
> strings which would fail in the backend relying on the option checks
> to make sure that for example combining --full and --analyze-only
> never happens.  Introducing 0002 is mainly here for 0003, so let's
> merge them together.

Makes sense.  I was trying to simplify this code a bit, but I agree
with your point about relying on the option checks.

> From patch 0004:
> +       executeCommand(conn, "RESET search_path;", progname, echo);
> +       res = executeQuery(conn, catalog_query.data, progname, echo);
> +       termPQExpBuffer(&catalog_query);
> +       PQclear(executeQuery(conn, ALWAYS_SECURE_SEARCH_PATH_SQL,
> +                               progname, echo));
> We should really avoid that.  There are other things like casts which
> tend to be forgotten, and if the catalog lookup query gets more
> complicated, I feel that this would again be forgotten, reintroducing
> the issues that ALWAYS_SECURE_SEARCH_PATH_SQL is here to fix.

This was done in order to maintain the current behavior that
appendQualifiedRelation() gives us.  I found that skipping the
search_path handling here forced us to specify the schema in the
argument for --table in most cases.  At the very least, I could add a
comment here to highlight the importance of fully qualifying
everything in the catalog query.  What do you think?

> I have put my hands into what you sent, and worked on putting
> 0002/0003 first into shape, finishing with the attached.  I have
> simplified the query construction a bit, making it IMO easier to read
> and to add new options, with comments documenting what's supported
> across versions.  I have also added a regression test for
> --analyze-only --skip-locked.  Then I have done some edit of the docs.
> What do you think for this portion?

Looks good to me, except for one small thing in the documentation:

+       <para>
+        Disable all page-skipping behavior during processing based on
+        the visibility map, similarly to the option
+        <literal>DISABLE_PAGE_SKIPPING</literal> for <command>VACUUM</command>.
+       </para>

I think the "similarly to the option" part is slightly misleading.
It's not just similar, it _is_ using that option in the generated
commands.  Perhaps we could point to the VACUUM documentation for more
information about this one.

On 1/7/19, 8:03 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
> On Mon, Jan 07, 2019 at 06:10:21PM +0900, Masahiko Sawada wrote:
>> 0002 and 0003 are merged and posted by Michael-san and it looks good
>> to me, so I've looked at the 0001, 0004, 0005 and 0006 patches. Here
>> is a few review comments.
>
> I have done another round on 0002/0003 (PQfinish was lacking after
> error-ing on incompatible options) and committed the thing.

Thanks!

>> Even if all tables are filtered by --min-relation-size, min-mxid-age
>> or min-xid-age the following message appeared.
>> 
>> $ vacuumdb --verbose --min-relation-size 1000000 postgres
>> vacuumdb: vacuuming database "postgres"
>> 
>> Since no tables are processed in this case isn't is better not to
>> output this message by moving the message after checking if ntup ==
>> 0?
>
> Hmm.  My take on this one is that we attempt to vacuum relations on
> this database, but this may finish by doing nothing.  Printing this
> message is still important to let the user know that an attempt was
> done so I would keep the message.

+1 for keeping the message.

>> You use pg_total_relation_size() to check the relation size but it
>> might be better to use pg_relation_size() instead. The
>> pg_total_relation_size() includes the all index size but I think it's
>> common based on my experience that we check only the table size
>> (including toast table) to do vacuum it or not.
>
> Yes, I am also not completely sure if the way things are defined in
> the patch are completely what we are looking for.  Still, I have seen
> as well people complain about the total amount of space a table is
> physically taking on disk, including its indexes.  So from the user
> experience the direction taken by the patch makes sense, as much as
> does the argument you do.

Good point.  I think allowing multiple different relation size options
here would be confusing, too (e.g. --min-relation-size versus
--min-total-relation-size).  IMO pg_total_relation_size() is the way
to go here, as we'll most likely need to process the indexes and TOAST
tables, too.  If/when there is a DISABLE_INDEX_CLEANUP option for
VACUUM, we'd then want to use pg_table_size() when --min-relation-size
and --disable-index-cleanup are used together in vacuumdb.  Thoughts?

I've realized that I forgot to add links to the XID/MXID wraparound
documentation like you suggested a while back.  I'll make sure that
gets included in the next patch set, too.

Nathan

Re: A few new options for vacuumdb

From

Michael Paquier

Date:

09 January 2019, 01:06:37

On Tue, Jan 08, 2019 at 06:46:11PM +0000, Bossart, Nathan wrote:
> This was done in order to maintain the current behavior that
> appendQualifiedRelation() gives us.  I found that skipping the
> search_path handling here forced us to specify the schema in the
> argument for --table in most cases.  At the very least, I could add a
> comment here to highlight the importance of fully qualifying
> everything in the catalog query.  What do you think?

A comment sounds like a good thing.  And we really shouldn't hijack
search_path even for one query...

> Looks good to me, except for one small thing in the documentation:
>
> +       <para>
> +        Disable all page-skipping behavior during processing based on
> +        the visibility map, similarly to the option
> +        <literal>DISABLE_PAGE_SKIPPING</literal> for <command>VACUUM</command>.
> +       </para>
>
> I think the "similarly to the option" part is slightly misleading.
> It's not just similar, it _is_ using that option in the generated
> commands.  Perhaps we could point to the VACUUM documentation for more
> information about this one.

Sure.  If you have any suggestions, please feel free.  Adding a link
to VACUUM documentation sounds good to me, as long as we avoid
duplicating the description related to DISABLE_PAGE_SKIPPING on the
VACUUM page.

> Good point.  I think allowing multiple different relation size options
> here would be confusing, too (e.g. --min-relation-size versus
> --min-total-relation-size).  IMO pg_total_relation_size() is the way
> to go here, as we'll most likely need to process the indexes and TOAST
> tables, too.  If/when there is a DISABLE_INDEX_CLEANUP option for
> VACUUM, we'd then want to use pg_table_size() when --min-relation-size
> and --disable-index-cleanup are used together in vacuumdb.
> Thoughts?

Yes, using pg_total_relation_size() looks like the best option to me
here as well, still this does not make me 100% comfortable from the
user perspective.

> I've realized that I forgot to add links to the XID/MXID wraparound
> documentation like you suggested a while back.  I'll make sure that
> gets included in the next patch set, too.

Nice, thanks.
--
Michael

Attachment

signature.asc

Re: A few new options for vacuumdb

From

Masahiko Sawada

Date:

09 January 2019, 01:33:00

On Wed, Jan 9, 2019 at 10:06 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Tue, Jan 08, 2019 at 06:46:11PM +0000, Bossart, Nathan wrote:
> > This was done in order to maintain the current behavior that
> > appendQualifiedRelation() gives us.  I found that skipping the
> > search_path handling here forced us to specify the schema in the
> > argument for --table in most cases.  At the very least, I could add a
> > comment here to highlight the importance of fully qualifying
> > everything in the catalog query.  What do you think?
>
> A comment sounds like a good thing.  And we really shouldn't hijack
> search_path even for one query...
>
> > Looks good to me, except for one small thing in the documentation:
> >
> > +       <para>
> > +        Disable all page-skipping behavior during processing based on
> > +        the visibility map, similarly to the option
> > +        <literal>DISABLE_PAGE_SKIPPING</literal> for <command>VACUUM</command>.
> > +       </para>
> >
> > I think the "similarly to the option" part is slightly misleading.
> > It's not just similar, it _is_ using that option in the generated
> > commands.  Perhaps we could point to the VACUUM documentation for more
> > information about this one.
>
> Sure.  If you have any suggestions, please feel free.  Adding a link
> to VACUUM documentation sounds good to me, as long as we avoid
> duplicating the description related to DISABLE_PAGE_SKIPPING on the
> VACUUM page.
>
> > Good point.  I think allowing multiple different relation size options
> > here would be confusing, too (e.g. --min-relation-size versus
> > --min-total-relation-size).  IMO pg_total_relation_size() is the way
> > to go here, as we'll most likely need to process the indexes and TOAST
> > tables, too.  If/when there is a DISABLE_INDEX_CLEANUP option for
> > VACUUM, we'd then want to use pg_table_size() when --min-relation-size
> > and --disable-index-cleanup are used together in vacuumdb.
> > Thoughts?
>
> Yes, using pg_total_relation_size() looks like the best option to me
> here as well, still this does not make me 100% comfortable from the
> user perspective.

Agreed.

Since pg_(total)_realtion_size() returns 0 for parent table the
specifying the parent table to vacuumdb with --min-relation-size
always does nothing. Maybe we will need to deal with this case when a
function returning whole partitoned table size is introduced.
Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: A few new options for vacuumdb

From

Michael Paquier

Date:

09 January 2019, 04:33:16

On Wed, Jan 09, 2019 at 10:33:00AM +0900, Masahiko Sawada wrote:
> Since pg_(total)_relation_size() returns 0 for parent table the
> specifying the parent table to vacuumdb with --min-relation-size
> always does nothing. Maybe we will need to deal with this case when a
> function returning whole partitoned table size is introduced.

Good point.  I am not sure if we want to go down to having a size
function dedicated to partitions especially as this would just now be
a wrapper around pg_partition_tree(), but the size argument with
partitioned tables is something to think about.  If we cannot sort out
this part cleanly, perhaps we could just focus on the age-ing
parameters and the other ones first?  It seems to me that what is
proposed on this thread has value, so we could shave things and keep
the essential, and focus on what we are sure about for simplicity.
--
Michael

Attachment

signature.asc

Re: A few new options for vacuumdb

From

"Bossart, Nathan"

Date:

09 January 2019, 18:42:40

On 1/8/19, 10:34 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
> On Wed, Jan 09, 2019 at 10:33:00AM +0900, Masahiko Sawada wrote:
>> Since pg_(total)_relation_size() returns 0 for parent table the
>> specifying the parent table to vacuumdb with --min-relation-size
>> always does nothing. Maybe we will need to deal with this case when a
>> function returning whole partitoned table size is introduced.
>
> Good point.  I am not sure if we want to go down to having a size
> function dedicated to partitions especially as this would just now be
> a wrapper around pg_partition_tree(), but the size argument with
> partitioned tables is something to think about.  If we cannot sort out
> this part cleanly, perhaps we could just focus on the age-ing
> parameters and the other ones first?  It seems to me that what is
> proposed on this thread has value, so we could shave things and keep
> the essential, and focus on what we are sure about for simplicity.

Sounds good.  I'll leave out --min-relation-size for now.

Nathan

Re: A few new options for vacuumdb

From

Masahiko Sawada

Date:

10 January 2019, 00:34:00

On Wed, Jan 9, 2019 at 1:33 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, Jan 09, 2019 at 10:33:00AM +0900, Masahiko Sawada wrote:
> > Since pg_(total)_relation_size() returns 0 for parent table the
> > specifying the parent table to vacuumdb with --min-relation-size
> > always does nothing. Maybe we will need to deal with this case when a
> > function returning whole partitoned table size is introduced.
>
> Good point.  I am not sure if we want to go down to having a size
> function dedicated to partitions especially as this would just now be
> a wrapper around pg_partition_tree(), but the size argument with
> partitioned tables is something to think about.  If we cannot sort out
> this part cleanly, perhaps we could just focus on the age-ing
> parameters and the other ones first?  It seems to me that what is
> proposed on this thread has value, so we could shave things and keep
> the essential, and focus on what we are sure about for simplicity.

Agreed.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: A few new options for vacuumdb

From

"Bossart, Nathan"

Date:

21 January 2019, 22:09:09

Here's a new patch set that should address the feedback in this
thread.  The changes in this version include:

 - 0001 is a small fix to the 'vacuumdb --disable-page-skipping'
   documentation.  My suggestion is to keep it short and simple like
   --full, --freeze, --skip-locked, --verbose, and --analyze.  The
   DISABLE_PAGE_SKIPPING option is well-described in the VACUUM
   documentation, and IMO it is reasonably obvious that such vacuumdb
   options correspond to the VACUUM options.
 - v3-0002 is essentially v2-0001 and v2-0004 combined.  I've also
   added a comment explaining the importance of fully qualifying the
   catalog query used to discover tables to process.
 - 0003 includes additional documentation changes explaining the main
   uses of --min-xid-age and --min-mxid-age and linking to the
   existing wraparound documentation.

As discussed, I've left out the patch for --min-relation-size for now.

Nathan

Attachment

Re: A few new options for vacuumdb

From

Michael Paquier

Date:

22 January 2019, 06:07:30

On Mon, Jan 21, 2019 at 10:09:09PM +0000, Bossart, Nathan wrote:
> Here's a new patch set that should address the feedback in this
> thread.  The changes in this version include:
>
>  - 0001 is a small fix to the 'vacuumdb --disable-page-skipping'
>    documentation.  My suggestion is to keep it short and simple like
>    --full, --freeze, --skip-locked, --verbose, and --analyze.  The
>    DISABLE_PAGE_SKIPPING option is well-described in the VACUUM
>    documentation, and IMO it is reasonably obvious that such vacuumdb
>    options correspond to the VACUUM options.

Okay, committed this one.

>  - v3-0002 is essentially v2-0001 and v2-0004 combined.  I've also
>    added a comment explaining the importance of fully qualifying the
>    catalog query used to discover tables to process.

Regarding the search_path business, there is actually similar business
in expand_table_name_patterns() for pg_dump if you look close by, so
as users calling --table don't have to specify the schema, and just
stick with patterns.

+   /*
+    * Prepare the list of tables to process by querying the catalogs.
+    *
+    * Since we execute the constructed query with the default search_path
+    * (which could be unsafe), it is important to fully qualify everything.
+    */
It is not only important, but also absolutely mandatory, so let's make
the comment insisting harder on that point.  From this point of view,
the query that you are building is visibly correct.

+       appendStringLiteralConn(&catalog_query, just_table, conn);
+       appendPQExpBuffer(&catalog_query, "::pg_catalog.regclass\n");
+
+       pg_free(just_table);
+
+       cell = cell->next;
+       if (cell == NULL)
+           appendPQExpBuffer(&catalog_query, " )\n");
Hm.  It seems to me that you are duplicating what
processSQLNamePattern() does, so we ought to use it.  And it is
possible to control the generation of the different WHERE clauses with
a single query based on the number of elements in the list.  Perhaps I
am missing something?  It looks unnecessary to export
split_table_columns_spec() as well.

-   qr/statement: ANALYZE;/,
+   qr/statement: ANALYZE pg_catalog\./,
Or we could just use "ANALYZE \.;/", perhaps patching it first.
Perhaps my regexp here is incorrect, but I just mean to check for an
ANALYZE query which begins by "ANALYZE " and finishes with ";",
without caring about what is in-between.  This would make the tests
more portable.

>  - 0003 includes additional documentation changes explaining the main
>    uses of --min-xid-age and --min-mxid-age and linking to the
>    existing wraparound documentation.

+$node->issues_sql_like(
+   [ 'vacuumdb', '--table=pg_class', '--min-mxid-age=123456789',
'postgres'],
+   qr/AND GREATEST\(pg_catalog.mxid_age\(c.relminmxid\),
pg_catalog.mxid_age\(t.relminmxid\)\) OPERATOR\(pg_catalog.>=\) 123456789/,
+   'vacuumdb --table --min-mxid-age');
+$node->issues_sql_like(
+   [ 'vacuumdb', '--min-xid-age=987654321', 'postgres' ],
+   qr/AND GREATEST\(pg_catalog.age\(c.relfrozenxid\),
pg_catalog.age\(t.relfrozenxid\)\) OPERATOR\(pg_catalog.>=\) 987654321/,
+   'vacuumdb --min-xid-age');
It may be better to use numbers which make sure that no relations are
actually fetched, so as if the surrounding tests are messed up we
never make them longer than necessary just to check the shape of the
query.
--
Michael

Attachment

signature.asc

Re: A few new options for vacuumdb

From

"Bossart, Nathan"

Date:

22 January 2019, 23:21:32

On 1/21/19, 10:08 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
> On Mon, Jan 21, 2019 at 10:09:09PM +0000, Bossart, Nathan wrote:
>> Here's a new patch set that should address the feedback in this
>> thread.  The changes in this version include:
>> 
>>  - 0001 is a small fix to the 'vacuumdb --disable-page-skipping'
>>    documentation.  My suggestion is to keep it short and simple like
>>    --full, --freeze, --skip-locked, --verbose, and --analyze.  The
>>    DISABLE_PAGE_SKIPPING option is well-described in the VACUUM
>>    documentation, and IMO it is reasonably obvious that such vacuumdb
>>    options correspond to the VACUUM options.
>
> Okay, committed this one.

Thanks.

> Regarding the search_path business, there is actually similar business
> in expand_table_name_patterns() for pg_dump if you look close by, so
> as users calling --table don't have to specify the schema, and just
> stick with patterns.

I see.  The "WHERE c.relkind..." clause looks a bit different than
what I have, so I've altered vacuumdb's catalog query to match.  This
was changed in expand_table_name_patterns() as part of 582edc369cd for
a security fix, so we should probably do something similar here.

> +   /*
> +    * Prepare the list of tables to process by querying the catalogs.
> +    *
> +    * Since we execute the constructed query with the default search_path
> +    * (which could be unsafe), it is important to fully qualify everything.
> +    */
> It is not only important, but also absolutely mandatory, so let's make
> the comment insisting harder on that point.  From this point of view,
> the query that you are building is visibly correct.

I've updated the comment as suggested.

> +       appendStringLiteralConn(&catalog_query, just_table, conn);
> +       appendPQExpBuffer(&catalog_query, "::pg_catalog.regclass\n");
> +
> +       pg_free(just_table);
> +
> +       cell = cell->next;
> +       if (cell == NULL)
> +           appendPQExpBuffer(&catalog_query, " )\n");
> Hm.  It seems to me that you are duplicating what
> processSQLNamePattern() does, so we ought to use it.  And it is
> possible to control the generation of the different WHERE clauses with
> a single query based on the number of elements in the list.  Perhaps I
> am missing something?  It looks unnecessary to export
> split_table_columns_spec() as well.

I'm sorry, I don't quite follow this.  AFAICT processSQLNamePattern()
is for generating WHERE clauses based on a wildcard-pattern string for
psql and pg_dump.  This portion of the patch is generating a WHERE
clause to filter only for the tables listed via --table, and I don't
think the --table option currently accepts patterns.  Since you can
also provide a list of columns with the --table option, I am using
split_table_columns_spec() to retrieve only the table name for the OID
lookup.

> -   qr/statement: ANALYZE;/,
> +   qr/statement: ANALYZE pg_catalog\./,
> Or we could just use "ANALYZE \.;/", perhaps patching it first.
> Perhaps my regexp here is incorrect, but I just mean to check for an
> ANALYZE query which begins by "ANALYZE " and finishes with ";",
> without caring about what is in-between.  This would make the tests
> more portable.

Sure, I've split this out into a separate patch.

> +$node->issues_sql_like(
> +   [ 'vacuumdb', '--table=pg_class', '--min-mxid-age=123456789',
> 'postgres'],
> +   qr/AND GREATEST\(pg_catalog.mxid_age\(c.relminmxid\),
> pg_catalog.mxid_age\(t.relminmxid\)\) OPERATOR\(pg_catalog.>=\) 123456789/,
> +   'vacuumdb --table --min-mxid-age');
> +$node->issues_sql_like(
> +   [ 'vacuumdb', '--min-xid-age=987654321', 'postgres' ],
> +   qr/AND GREATEST\(pg_catalog.age\(c.relfrozenxid\),
> pg_catalog.age\(t.relfrozenxid\)\) OPERATOR\(pg_catalog.>=\) 987654321/,
> +   'vacuumdb --min-xid-age');
> It may be better to use numbers which make sure that no relations are
> actually fetched, so as if the surrounding tests are messed up we
> never make them longer than necessary just to check the shape of the
> query.

I think it's already pretty unlikely that any matching relations would
be found, but I've updated these values to be within the range where
any matching database has already stopped accepting commands to
prevent wraparound.

Nathan

Attachment

Re: A few new options for vacuumdb

From

Michael Paquier

Date:

23 January 2019, 03:41:11

On Tue, Jan 22, 2019 at 11:21:32PM +0000, Bossart, Nathan wrote:
> On 1/21/19, 10:08 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
> > On Mon, Jan 21, 2019 at 10:09:09PM +0000, Bossart, Nathan wrote:
> >> Here's a new patch set that should address the feedback in this
> >> thread.  The changes in this version include:
> >>
> >>  - 0001 is a small fix to the 'vacuumdb --disable-page-skipping'
> >>    documentation.  My suggestion is to keep it short and simple like
> >>    --full, --freeze, --skip-locked, --verbose, and --analyze.  The
> >>    DISABLE_PAGE_SKIPPING option is well-described in the VACUUM
> >>    documentation, and IMO it is reasonably obvious that such vacuumdb
> >>    options correspond to the VACUUM options.
> >
> > Okay, committed this one.
>
> Thanks.
>
> > Regarding the search_path business, there is actually similar business
> > in expand_table_name_patterns() for pg_dump if you look close by, so
> > as users calling --table don't have to specify the schema, and just
> > stick with patterns.
>
> I see.  The "WHERE c.relkind..." clause looks a bit different than
> what I have, so I've altered vacuumdb's catalog query to match.  This
> was changed in expand_table_name_patterns() as part of 582edc369cd for
> a security fix, so we should probably do something similar here.
>
> > +   /*
> > +    * Prepare the list of tables to process by querying the catalogs.
> > +    *
> > +    * Since we execute the constructed query with the default search_path
> > +    * (which could be unsafe), it is important to fully qualify everything.
> > +    */
> > It is not only important, but also absolutely mandatory, so let's make
> > the comment insisting harder on that point.  From this point of view,
> > the query that you are building is visibly correct.
>
> I've updated the comment as suggested.
>
> > +       appendStringLiteralConn(&catalog_query, just_table, conn);
> > +       appendPQExpBuffer(&catalog_query, "::pg_catalog.regclass\n");
> > +
> > +       pg_free(just_table);
> > +
> > +       cell = cell->next;
> > +       if (cell == NULL)
> > +           appendPQExpBuffer(&catalog_query, " )\n");
> > Hm.  It seems to me that you are duplicating what
> > processSQLNamePattern() does, so we ought to use it.  And it is
> > possible to control the generation of the different WHERE clauses with
> > a single query based on the number of elements in the list.  Perhaps I
> > am missing something?  It looks unnecessary to export
> > split_table_columns_spec() as well.
>
> I'm sorry, I don't quite follow this.  AFAICT processSQLNamePattern()
> is for generating WHERE clauses based on a wildcard-pattern string for
> psql and pg_dump.  This portion of the patch is generating a WHERE
> clause to filter only for the tables listed via --table, and I don't
> think the --table option currently accepts patterns.  Since you can
> also provide a list of columns with the --table option, I am using
> split_table_columns_spec() to retrieve only the table name for the OID
> lookup.

Bah, of course you are right.  vacuumdb does not support patterns but
I thought it was able to handle that.  With column names that would be
sort of funny anyway.

>> Or we could just use "ANALYZE \.;/", perhaps patching it first.
>> Perhaps my regexp here is incorrect, but I just mean to check for an
>> ANALYZE query which begins by "ANALYZE " and finishes with ";",
>> without caring about what is in-between.  This would make the tests
>> more portable.
>
> Sure, I've split this out into a separate patch.

Thanks, I have committed this part to ease the follow-up work.

> I think it's already pretty unlikely that any matching relations would
> be found, but I've updated these values to be within the range where
> any matching database has already stopped accepting commands to
> prevent wraparound.

Thanks.

I have been looking at 0002, and I found a problem with the way
ANALYZE queries are generated.  For example take this table:
CREATE TABLE aa1 (a int);

Then if I try to run ANALYZE with vacuumdb it just works:
$ vacuumdb -z --table 'aa1(b)'
vacuumdb: vacuuming database "ioltas"

Note that this fails with HEAD, but passes with your patch.  The
problem is that the query generated misses the lists of columns when
processing them through split_table_columns_spec(), as what is
generated is that:
VACUUM (ANALYZE) public.aa1;

So the result is actually incorrect because all the columns get
processed.

This patch moves the check about the existence of the relation when
querying the catalogs, perhaps we would want the same for columns for
consistency?  Or not.  That's a bit harder for sure, not impossible
visibly, still that would mean duplicating a bunch of logic that the
backend is doing by itself, so we could live without it I think.

Attached is a first patch to add more tests in this area, which passes
on HEAD but fails with your patch.  It looks like a good idea to tweak
the tests first as there is no coverage yet for the queries generated
with complete and partial column lists.

At the same time, we may want to change split_table_columns_spec's
name to be more consistent with the other APIs, like
splitTableColumnspec.  That's a nit though..
--
Michael

Attachment

Re: A few new options for vacuumdb

From

"Bossart, Nathan"

Date:

24 January 2019, 00:49:28

On 1/22/19, 7:41 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
> I have been looking at 0002, and I found a problem with the way
> ANALYZE queries are generated.  For example take this table:
> CREATE TABLE aa1 (a int);
>
> Then if I try to run ANALYZE with vacuumdb it just works:
> $ vacuumdb -z --table 'aa1(b)'
> vacuumdb: vacuuming database "ioltas"
>
> Note that this fails with HEAD, but passes with your patch.  The
> problem is that the query generated misses the lists of columns when
> processing them through split_table_columns_spec(), as what is
> generated is that:
> VACUUM (ANALYZE) public.aa1;
>
> So the result is actually incorrect because all the columns get
> processed.
>
> This patch moves the check about the existence of the relation when
> querying the catalogs, perhaps we would want the same for columns for
> consistency?  Or not.  That's a bit harder for sure, not impossible
> visibly, still that would mean duplicating a bunch of logic that the
> backend is doing by itself, so we could live without it I think.

Oh, wow.  Thanks for pointing this out.  I should have caught this.
With 0002, we are basically just throwing out the column lists
entirely as we obtain the qualified identifiers from the catalog
query.  To fix this, I've added an optional CTE for tracking any
provided column lists.  v5-0001 is your test patch for this case, and
v5-0002 splits out the work for split_table_columns_spec().

Nathan

Attachment

Re: A few new options for vacuumdb

From

Michael Paquier

Date:

27 January 2019, 13:33:01

On Thu, Jan 24, 2019 at 12:49:28AM +0000, Bossart, Nathan wrote:
> Oh, wow.  Thanks for pointing this out.  I should have caught this.
> With 0002, we are basically just throwing out the column lists
> entirely as we obtain the qualified identifiers from the catalog
> query.  To fix this, I've added an optional CTE for tracking any
> provided column lists.  v5-0001 is your test patch for this case, and
> v5-0002 splits out the work for split_table_columns_spec().

Committed the test portion for now, still reviewing the rest..
--
Michael

Attachment

signature.asc

Re: A few new options for vacuumdb

From

Michael Paquier

Date:

28 January 2019, 05:57:02

On Thu, Jan 24, 2019 at 12:49:28AM +0000, Bossart, Nathan wrote:
> Oh, wow.  Thanks for pointing this out.  I should have caught this.
> With 0002, we are basically just throwing out the column lists
> entirely as we obtain the qualified identifiers from the catalog
> query.  To fix this, I've added an optional CTE for tracking any
> provided column lists.  v5-0001 is your test patch for this case, and
> v5-0002 splits out the work for split_table_columns_spec().

I think that the query generation could be simplified by always using
the CTE if column lists are present or not, by associating NULL if no
column list is present, and by moving the regclass casting directly
into the CTE.

This way, for the following vacuumdb command with a set of tables
wanted:
vacuumdb --table aa --table 'bb(b)' --table 'cc(c)'

Then the query generated looks like that:
WITH column_lists (table_name, column_list) AS (
  VALUES ('aa'::pg_catalog.regclass, NULL),
         ('bb'::pg_catalog.regclass, '(b)'),
         ('cc'::pg_catalog.regclass, '(c)')
  )
SELECT c.relname, ns.nspname, column_lists.column_list
  FROM pg_catalog.pg_class c
    JOIN pg_catalog.pg_namespace ns ON c.relnamespace
      OPERATOR(pg_catalog.=) ns.oid
    JOIN column_lists ON
       column_lists.table_name
       OPERATOR(pg_catalog.=) c.oid
  WHERE c.relkind OPERATOR(pg_catalog.=) ANY (array['r', 'm'])
  ORDER BY c.relpages DESC;

So only the following parts are added:
- The CTE with a table and its column list.
- A join on pg_class.oid and column_lists.table_name.
The latest version of the patch is doing a double amount of work by
using a left join and an extra set of clauses in WHERE to fetch the
matching column list from the table name entry.

If no tables are listed, then we just finish with that:
SELECT c.relname, ns.nspname FROM pg_catalog.pg_class c
 JOIN pg_catalog.pg_namespace ns ON c.relnamespace
   OPERATOR(pg_catalog.=) ns.oid
 WHERE c.relkind OPERATOR(pg_catalog.=) ANY (array['r', 'm'])
   ORDER BY c.relpages DESC;
--
Michael

Attachment

signature.asc

Re: A few new options for vacuumdb

From

"Bossart, Nathan"

Date:

28 January 2019, 21:58:17

On 1/27/19, 9:57 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
> I think that the query generation could be simplified by always using
> the CTE if column lists are present or not, by associating NULL if no
> column list is present, and by moving the regclass casting directly
> into the CTE.

Yes, this simplifies the code quite a bit.  I did it this way in v6.

Nathan

Attachment

Re: A few new options for vacuumdb

From

Michael Paquier

Date:

29 January 2019, 02:35:26

On Mon, Jan 28, 2019 at 09:58:17PM +0000, Bossart, Nathan wrote:
> Yes, this simplifies the code quite a bit.  I did it this way in
> v6.

Thanks for the quick update.  Things could have been made a bit more
simple by just using a for loop instead of while, even if the table
list can be NULL for database-wide operations (perhaps that's a matter
of taste..).  prepare_vacuum_command() could be simplified further by
using the server version instead of the full connection string.  The
comments at the top of the routine were not properly updated either,
and the last portion appending the relation name just needs one
appendPQExpBuffer() call instead of three separate calls.  PQclear()
could have been moved closer to where all the query results have been
consumed, to make the whole more consistent.

Anyway, patches 1 and 2 have been merged, and committed after some
cleanup and adjustments.  Patch 3 gets much easier now.

-    " ON c.relnamespace OPERATOR(pg_catalog.=) ns.oid\n");
+    " ON c.relnamespace OPERATOR(pg_catalog.=) ns.oid\n"
+    " LEFT JOIN pg_catalog.pg_class t"
+    " ON c.reltoastrelid OPERATOR(pg_catalog.=) t.oid\n");
Why do need this part?
--
Michael

Attachment

signature.asc

Re: A few new options for vacuumdb

From

"Bossart, Nathan"

Date:

29 January 2019, 21:48:18

On 1/28/19, 6:35 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
> Anyway, patches 1 and 2 have been merged, and committed after some
> cleanup and adjustments.  Patch 3 gets much easier now.

Thanks!

> -    " ON c.relnamespace OPERATOR(pg_catalog.=) ns.oid\n");
> +    " ON c.relnamespace OPERATOR(pg_catalog.=) ns.oid\n"
> +    " LEFT JOIN pg_catalog.pg_class t"
> +    " ON c.reltoastrelid OPERATOR(pg_catalog.=) t.oid\n");
> Why do need this part?

This is modeled after the query provided in the docs for preventing
transaction ID wraparound [0].  I think the idea is to combine the
relation with its TOAST table so that it does not need to be
considered separately.  The VACUUM commands generated in vacuumdb will
also process the corresponding TOAST table for the relation, anyway.

I noticed a behavior change from the catalog query patch that we
probably ought to fix.  The "WHERE c.relkind IN ('r', 'm')" clause
seems sufficient to collect all vacuumable relations (TOAST tables are
handled when vacuuming the main relation, and partitioned tables are
handled by vacuuming the partitions individually), but it is not
sufficient to match the previous behavior when --table is used.
Previously, we did not filter by relkind at all when --table is used.
Instead, we let the server emit a WARNING when a relation that
couldn't be processed was specified.

Previous behavior:
    ~% vacuumdb -d postgres -t foreign_table
    vacuumdb: vacuuming database "postgres"
    WARNING:  skipping "foreign_table" --- cannot vacuum non-tables or special system tables
    ~% vacuumdb -d postgres -t pg_toast.pg_toast_2600 --analyze-only
    vacuumdb: vacuuming database "postgres"
    WARNING:  skipping "pg_toast_2600" --- cannot analyze non-tables or special system tables

Current behavior:
    ~% vacuumdb -d postgres -t foreign_table
    vacuumdb: vacuuming database "postgres"
    ~% vacuumdb -d postgres -t pg_toast.pg_toast_2600 --analyze-only
    vacuumdb: vacuuming database "postgres"

I think the simplest way to fix this is to remove the relkind clause
altogether when --table is used and to let the server decide whether
it should be processed.  This effectively reinstates the previous
behavior so that users can specify TOAST tables, partitioned tables,
etc.

Unfortunately, this complicates the --min-xid-age and --min-mxid-age
patch a bit, as some of the relation types that can be vacuumed and/or
analyzed do not really have a transaction ID age.  AFAICT the simplest
way to handle this case is to filter out relations with a relfrozenxid
or relminmxid of 0.

The v7 patch set implements these proposed approaches.

Nathan

[0] https://www.postgresql.org/docs/devel/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND

Attachment

Re: A few new options for vacuumdb

From

Michael Paquier

Date:

30 January 2019, 00:46:53

On Tue, Jan 29, 2019 at 09:48:18PM +0000, Bossart, Nathan wrote:
> On 1/28/19, 6:35 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
>> -    " ON c.relnamespace OPERATOR(pg_catalog.=) ns.oid\n");
>> +    " ON c.relnamespace OPERATOR(pg_catalog.=) ns.oid\n"
>> +    " LEFT JOIN pg_catalog.pg_class t"
>> +    " ON c.reltoastrelid OPERATOR(pg_catalog.=) t.oid\n");
>> Why do need this part?
>
> This is modeled after the query provided in the docs for preventing
> transaction ID wraparound [0].  I think the idea is to combine the
> relation with its TOAST table so that it does not need to be
> considered separately.  The VACUUM commands generated in vacuumdb will
> also process the corresponding TOAST table for the relation, anyway.

Oh, OK.  This makes sense.  It would be nice to add a comment in the
patch and to document this calculation method in the docs of
vacuumdb.

> I noticed a behavior change from the catalog query patch that we
> probably ought to fix.  The "WHERE c.relkind IN ('r', 'm')" clause
> seems sufficient to collect all vacuumable relations (TOAST tables are
> handled when vacuuming the main relation, and partitioned tables are
> handled by vacuuming the partitions individually), but it is not
> sufficient to match the previous behavior when --table is used.
> Previously, we did not filter by relkind at all when --table is used.
> Instead, we let the server emit a WARNING when a relation that
> couldn't be processed was specified.

Indeed, the WARNING can be useful for some users when trying to work
on an incorrect relation kind, especially when not using --verbose.
Fixed after adding a test with command_checks_all.

> Unfortunately, this complicates the --min-xid-age and --min-mxid-age
> patch a bit, as some of the relation types that can be vacuumed and/or
> analyzed do not really have a transaction ID age.  AFAICT the simplest
> way to handle this case is to filter out relations with a relfrozenxid
> or relminmxid of 0.

We should be able to live with that.
--
Michael

Attachment

signature.asc

Re: A few new options for vacuumdb

From

"Bossart, Nathan"

Date:

30 January 2019, 17:45:58

On 1/29/19, 4:47 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
> On Tue, Jan 29, 2019 at 09:48:18PM +0000, Bossart, Nathan wrote:
>> On 1/28/19, 6:35 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
>>> -    " ON c.relnamespace OPERATOR(pg_catalog.=) ns.oid\n");
>>> +    " ON c.relnamespace OPERATOR(pg_catalog.=) ns.oid\n"
>>> +    " LEFT JOIN pg_catalog.pg_class t"
>>> +    " ON c.reltoastrelid OPERATOR(pg_catalog.=) t.oid\n");
>>> Why do need this part?
>> 
>> This is modeled after the query provided in the docs for preventing
>> transaction ID wraparound [0].  I think the idea is to combine the
>> relation with its TOAST table so that it does not need to be
>> considered separately.  The VACUUM commands generated in vacuumdb will
>> also process the corresponding TOAST table for the relation, anyway.
>
> Oh, OK.  This makes sense.  It would be nice to add a comment in the
> patch and to document this calculation method in the docs of
> vacuumdb.

Sure, this is added in v8.

>> I noticed a behavior change from the catalog query patch that we
>> probably ought to fix.  The "WHERE c.relkind IN ('r', 'm')" clause
>> seems sufficient to collect all vacuumable relations (TOAST tables are
>> handled when vacuuming the main relation, and partitioned tables are
>> handled by vacuuming the partitions individually), but it is not
>> sufficient to match the previous behavior when --table is used.
>> Previously, we did not filter by relkind at all when --table is used.
>> Instead, we let the server emit a WARNING when a relation that
>> couldn't be processed was specified.
>
> Indeed, the WARNING can be useful for some users when trying to work
> on an incorrect relation kind, especially when not using --verbose.
> Fixed after adding a test with command_checks_all.

Thanks.  Something else I noticed is that we do not retrieve foreign
tables and partitioned tables for --analyze and --analyze-only.
However, that has long been the case for parallel mode, and this issue
should probably get its own thread.

Nathan

Attachment

v8-0001-Add-min-xid-age-and-min-mxid-age-options-to-vacuu.patch

Re: A few new options for vacuumdb

From

Michael Paquier

Date:

31 January 2019, 02:03:35

On Wed, Jan 30, 2019 at 05:45:58PM +0000, Bossart, Nathan wrote:
> On 1/29/19, 4:47 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
>> Oh, OK.  This makes sense.  It would be nice to add a comment in the
>> patch and to document this calculation method in the docs of
>> vacuumdb.
>
> Sure, this is added in v8.

Thanks, Nathan.

Something which was not correct in the patch is the compatibility of
the query.  xid <> xid has been added in 9.6, so the new options will
not be able to work with older versions.  The versions marked as
compatible in the last patch came from the age-ing functions, but you
added direct comparisons with relfrozenxid and relminmxid in the
latest versions of the patch.  This implementation goes down a couple
of released versions, which is useful enough in my opinion, so I would
keep it as-is.

I have added as well some markups around "PostgreSQL" in the docs, and
extra casts for the integer/xid values of the query.  The test
patterns are also simplified, and I added tests for incorrect values
of --min-xid-age and --min-mxid-age.  Does that look correct to you?

> Thanks.  Something else I noticed is that we do not retrieve foreign
> tables and partitioned tables for --analyze and --analyze-only.
> However, that has long been the case for parallel mode, and this issue
> should probably get its own thread.

Good point, this goes down a couple of releases, and statistics on
both may be useful to compile for a system-wide operation.  Spawning a
separate thread looks adapted to me.
--
Michael

Attachment

Re: A few new options for vacuumdb

From

"Bossart, Nathan"

Date:

31 January 2019, 02:28:05

On 1/30/19, 6:04 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
> Something which was not correct in the patch is the compatibility of
> the query.  xid <> xid has been added in 9.6, so the new options will
> not be able to work with older versions.  The versions marked as
> compatible in the last patch came from the age-ing functions, but you
> added direct comparisons with relfrozenxid and relminmxid in the
> latest versions of the patch.  This implementation goes down a couple
> of released versions, which is useful enough in my opinion, so I would
> keep it as-is.

Agreed.  Thanks for catching this.

> I have added as well some markups around "PostgreSQL" in the docs, and
> extra casts for the integer/xid values of the query.  The test
> patterns are also simplified, and I added tests for incorrect values
> of --min-xid-age and --min-mxid-age.  Does that look correct to you?

It looks good to me.  The only thing I noticed is the use of
relfrozenxid instead of relminmxid here:

+               appendPQExpBuffer(&catalog_query,
+                                                 " %s GREATEST(pg_catalog.mxid_age(c.relminmxid),"
+                                                 " pg_catalog.mxid_age(t.relminmxid)) OPERATOR(pg_catalog.>=)"
+                                                 " '%d'::pg_catalog.int4\n"
+                                                 " AND c.relfrozenxid OPERATOR(pg_catalog.!=)"
+                                                 " '0'::pg_catalog.xid\n",
+                                                 has_where ? "AND" : "WHERE", vacopts->min_mxid_age);

However, that may still work as intended.

Nathan

Re: A few new options for vacuumdb

From

Michael Paquier

Date:

31 January 2019, 04:20:24

On Thu, Jan 31, 2019 at 02:28:05AM +0000, Bossart, Nathan wrote:
> It looks good to me.  The only thing I noticed is the use of
> relfrozenxid instead of relminmxid here:

Fixed and committed.  So we are finally done here, except for the
debate with the relation size.
--
Michael

Attachment

signature.asc

Re: A few new options for vacuumdb

From

"Bossart, Nathan"

Date:

31 January 2019, 15:41:34

On 1/30/19, 8:21 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
> Fixed and committed.  So we are finally done here, except for the
> debate with the relation size.

Thanks for the diligent reviews, as always.  I don't plan to pick up
--min-relation-size right now, but I may attempt it again in a future
commitfest.

Nathan

Re: A few new options for vacuumdb

From

Michael Paquier

Date:

01 February 2019, 01:50:32

On Thu, Jan 31, 2019 at 03:41:34PM +0000, Bossart, Nathan wrote:
> Thanks for the diligent reviews, as always.  I don't plan to pick up
> --min-relation-size right now, but I may attempt it again in a future
> commitfest.

Sure, thanks for the patches!
--
Michael

Attachment

signature.asc