Thread: A few new options for vacuumdb
Hi hackers, I've attached a few patches to add some options to vacuumdb that might be useful. Specifically, these patches add the --skip-locked, --min-xid-age, --min-mxid-age, and --min-relation-size options. If an option is specified for a server version that is not supported, the option is silently ignored. For example, SKIP_LOCKED was only added to VACUUM and ANALYZE for v12. Alternatively, I think we could fail in vacuum_one_database() if an unsupported option is specified. Some of these options will work on all currently supported versions, so I am curious what others think about skipping some of these version checks altogether. In this set of patches, I've disallowed using --min-xid-age, --min-mxid-age, and --min-relation-size in conjunction with the --table option. IMO this combination of options is prone to confusion. For example: vacuumdb mydb --table mytable --min-relation-size 1024 It does not seem clear whether the user wants us to process mytable only if it is at least 1 GB, or if we should process mytable in addition to any other relations over 1 GB. Either way, I think trying to support these combinations of options adds more complexity than it is worth. 0001 is a minor fix that is somewhat separate from these new options, although the new options will make the edge case it aims to fix much easier to reach. When the catalogs are queried in parallel mode to get the list of tables to process, we currently assume that at least one table will be returned. If no tables are found, the tables variable will stay as NULL, which leads to database-wide VACUUM or ANALYZE commands. Since there are currently no user-configurable options available for this catalog query, this case is likely exceptionally rare. However, with the new options, it is much easier to inadvertently filter out all relations. I will be adding this work to the next commitfest. Nathan
Attachment
On Wed, Dec 19, 2018 at 08:50:10PM +0000, Bossart, Nathan wrote: > If an option is specified for a server version that is not supported, > the option is silently ignored. For example, SKIP_LOCKED was only > added to VACUUM and ANALYZE for v12. Alternatively, I think we could > fail in vacuum_one_database() if an unsupported option is specified. > Some of these options will work on all currently supported versions, > so I am curious what others think about skipping some of these version > checks altogether. prepare_vacuum_command() already handles that by ignoring silently unsupported options (see the case of SKIP_LOCKED). So why not doing the same? > It does not seem clear whether the user wants us to process mytable > only if it is at least 1 GB, or if we should process mytable in > addition to any other relations over 1 GB. Either way, I think trying > to support these combinations of options adds more complexity than it > is worth. It seems to me that a combination of both options means that the listed table should be processed only if its minimum size is 1GB. If multiple tables are specified with --table, then only those reaching 1GB would be processed. So this restriction can go away. The same applies for the proposed --min-xid-age and --min-mxid-age. + <para> + Only execute the vacuum or analyze commands on tables with a multixact + ID age of at least <replaceable class="parameter">mxid_age</replaceable>. + </para> Adding a link to explain the multixact business may be helpful, say vacuum-for-multixact-wraparound. Same comment for XID. > 0001 is a minor fix that is somewhat separate from these new options, > although the new options will make the edge case it aims to fix much > easier to reach. When the catalogs are queried in parallel mode to > get the list of tables to process, we currently assume that at least > one table will be returned. If no tables are found, the tables > variable will stay as NULL, which leads to database-wide VACUUM or > ANALYZE commands. Since there are currently no user-configurable > options available for this catalog query, this case is likely > exceptionally rare. However, with the new options, it is much easier > to inadvertently filter out all relations. Agreed. No need to visibly bother about that in back-branches. There is an argument about also adding DISABLE_PAGE_SKIPPING. -- Michael
Attachment
Hi Michael, Thanks for taking a look. On 12/19/18, 8:05 PM, "Michael Paquier" <michael@paquier.xyz> wrote: > On Wed, Dec 19, 2018 at 08:50:10PM +0000, Bossart, Nathan wrote: >> If an option is specified for a server version that is not supported, >> the option is silently ignored. For example, SKIP_LOCKED was only >> added to VACUUM and ANALYZE for v12. Alternatively, I think we could >> fail in vacuum_one_database() if an unsupported option is specified. >> Some of these options will work on all currently supported versions, >> so I am curious what others think about skipping some of these version >> checks altogether. > > prepare_vacuum_command() already handles that by ignoring silently > unsupported options (see the case of SKIP_LOCKED). So why not doing the > same? The --skip-locked option in vacuumdb is part of 0002, so I don't think there's much precedent here. We do currently fall back to the unparenthesized syntax for VACUUM for versions before 9.0, so there is some version handling already. What do you think about removing version checks for unsupported major versions? If we went that route, these new patches could add version checks only for options that were added in a supported major version (e.g. mxid_age() was added in 9.5). Either way, we'll still have to decide whether to fail or to silently skip the option if you do something like specify --min-mxid-age for a 9.4 server. >> It does not seem clear whether the user wants us to process mytable >> only if it is at least 1 GB, or if we should process mytable in >> addition to any other relations over 1 GB. Either way, I think trying >> to support these combinations of options adds more complexity than it >> is worth. > > It seems to me that a combination of both options means that the listed > table should be processed only if its minimum size is 1GB. If multiple > tables are specified with --table, then only those reaching 1GB would be > processed. So this restriction can go away. The same applies for the > proposed --min-xid-age and --min-mxid-age. Okay. This should be pretty easy to do using individual catalog lookups before we call prepare_vacuum_command(). Alternatively, I could add a clause for each specified table in the existing query in vacuum_one_database(). At that point, it may be cleaner to just use that catalog query in all cases instead of leaving paths where tables can remain NULL. > + <para> > + Only execute the vacuum or analyze commands on tables with a multixact > + ID age of at least <replaceable > class="parameter">mxid_age</replaceable>. > + </para> > Adding a link to explain the multixact business may be helpful, say > vacuum-for-multixact-wraparound. Same comment for XID. Good idea. > There is an argument about also adding DISABLE_PAGE_SKIPPING. I'll add this in the next set of patches. I need to add the parenthesized syntax and --skip-locked for ANALYZE in prepare_vacuum_command(), too. Nathan
On Thu, Dec 20, 2018 at 04:48:11PM +0000, Bossart, Nathan wrote: > The --skip-locked option in vacuumdb is part of 0002, so I don't think > there's much precedent here. It looks like I was not looking at the master branch here ;) > We do currently fall back to the > unparenthesized syntax for VACUUM for versions before 9.0, so there is > some version handling already. What do you think about removing > version checks for unsupported major versions? I am not exactly sure down to which version this needs to be supported and what's the policy about that (if others have an opinion to share, please feel free) but surely it does not hurt to keep those code paths either from a maintenance point of view. > If we went that route, these new patches could add version checks only > for options that were added in a supported major version (e.g. > mxid_age() was added in 9.5). Either way, we'll still have to decide > whether to fail or to silently skip the option if you do something > like specify --min-mxid-age for a 9.4 server. There are downsides and upsides for each approach. Reporting a failure makes it clear that an option is not available with a clear error message, however it complicates a bit the error handling for parallel runs. And vacuum_one_database should be the code path doing as that's where all the connections are taken even when all databases are processed. Ignoring the option, as your patch set does, has the disadvantage to potentially confuse users, say we may see complains like "why is VACUUM locking even if I specified --skip-locked?". Still this keeps the code minimalistic and simple. Just passing down the options and seeing a failure in the query generated is also a confusing experience I guess for not-so-advanced users even if it may simplify the error handling. Issuing a proper error feels more natural to me I think, as users can react on that properly. Opinions from others are welcome. -- Michael
Attachment
On Wed, Dec 19, 2018 at 9:05 PM Michael Paquier <michael@paquier.xyz> wrote: > > It does not seem clear whether the user wants us to process mytable > > only if it is at least 1 GB, or if we should process mytable in > > addition to any other relations over 1 GB. Either way, I think trying > > to support these combinations of options adds more complexity than it > > is worth. > > It seems to me that a combination of both options means that the listed > table should be processed only if its minimum size is 1GB. If multiple > tables are specified with --table, then only those reaching 1GB would be > processed. +1. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Dec 20, 2018 at 11:48 AM Bossart, Nathan <bossartn@amazon.com> wrote: > Either way, we'll still have to decide whether to fail or to silently > skip the option if you do something like specify --min-mxid-age for a > 9.4 server. +1 for fail. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 12/21/18, 10:51 AM, "Robert Haas" <robertmhaas@gmail.com> wrote: > On Thu, Dec 20, 2018 at 11:48 AM Bossart, Nathan <bossartn@amazon.com> wrote: >> Either way, we'll still have to decide whether to fail or to silently >> skip the option if you do something like specify --min-mxid-age for a >> 9.4 server. > > +1 for fail. Sounds good. I'll keep all the version checks and will fail for unsupported options in the next patch set. Nathan
On 12/21/18, 11:14 AM, "Bossart, Nathan" <bossartn@amazon.com> wrote: > On 12/21/18, 10:51 AM, "Robert Haas" <robertmhaas@gmail.com> wrote: >> On Thu, Dec 20, 2018 at 11:48 AM Bossart, Nathan <bossartn@amazon.com> wrote: >>> Either way, we'll still have to decide whether to fail or to silently >>> skip the option if you do something like specify --min-mxid-age for a >>> 9.4 server. >> >> +1 for fail. > > Sounds good. I'll keep all the version checks and will fail for > unsupported options in the next patch set. Here's an updated set of patches with the following changes: - 0002 adds the parenthesized syntax for ANALYZE. - 0003 adds DISABLE_PAGE_SKIPPING for VACUUM. - 0003 also ensures SKIP_LOCKED is applied for --analyze-only. - 0004 alters vacuumdb to always use the catalog query to discover the list of tables to process. - 0003, 0005, and 0006 now fail in vacuum_one_database() if a specified option is not available on the server version. - If tables are listed along with --min-xid-age, --min-mxid-age, or --min-relation-size, each table is processed only if it satisfies the provided options. 0004 introduces a slight change to existing behavior. Currently, if you specify a missing table, vacuumdb will process each table until it reaches the nonexistent one, at which point it will fail. After 0004 is applied, vacuumdb will fail during the catalog query, and no tables will be processed. I considered avoiding this change in behavior by doing an additional catalog lookup for each specified table to see whether it satisfies --min-xid-age, etc. However, this introduced quite a bit more complexity and increased the number of catalog queries needed. Nathan
Attachment
- v2-0006-Add-min-relation-size-option-to-vacuumdb.patch
- v2-0001-Do-not-process-any-relations-if-the-catalog-query.patch
- v2-0002-Support-parenthesized-syntax-for-ANALYZE-in-vacuu.patch
- v2-0003-Add-disable-page-skipping-and-skip-locked-to-vacu.patch
- v2-0004-Always-use-a-catalog-query-to-discover-tables-to-.patch
- v2-0005-Add-min-xid-age-and-min-mxid-age-options-to-vacuu.patch
On Fri, Jan 04, 2019 at 11:49:46PM +0000, Bossart, Nathan wrote: > 0004 introduces a slight change to existing behavior. Currently, if > you specify a missing table, vacuumdb will process each table until > it reaches the nonexistent one, at which point it will fail. After > 0004 is applied, vacuumdb will fail during the catalog query, and no > tables will be processed. I have not looked at the patch set in details, but that would make vacuumdb more consistent with the way VACUUM works with multiple relations, which sounds like a good thing. -- Michael
Attachment
On Fri, Jan 04, 2019 at 11:49:46PM +0000, Bossart, Nathan wrote: > Here's an updated set of patches with the following changes: > > - 0002 adds the parenthesized syntax for ANALYZE. > - 0003 adds DISABLE_PAGE_SKIPPING for VACUUM. > - 0003 also ensures SKIP_LOCKED is applied for --analyze-only. > - 0004 alters vacuumdb to always use the catalog query to discover > the list of tables to process. > - 0003, 0005, and 0006 now fail in vacuum_one_database() if a > specified option is not available on the server version. > - If tables are listed along with --min-xid-age, --min-mxid-age, or > --min-relation-size, each table is processed only if it satisfies > the provided options. I have been looking at the patch set, and 0001 can actually happen only once 0005 is applied because this modifies the query doing on HEAD a full scan of pg_class which would include at least catalog tables so it can never be empty. For this reason it seems to me that 0001 and 0004 should be merged together as common refactoring, making sure on the way that all relations exist before processing anything. As 0005 and 0006 change similar portions of the code, we could also merge these together in a second patch introducing the new options. Keeping 0001, 0004, 0005 and 0006 separated eases the review, but I would likely merge things when they make sense together to reduce diff chunks. 0002 removes some carefully-written query generation, so as it is never possible to generate something like ANALYZE FULL. HEAD splits ANALYZE and VACUUM clearly, but 0002 merges them together so as careless coding in vacuumdb.c makes it easier to generate command strings which would fail in the backend relying on the option checks to make sure that for example combining --full and --analyze-only never happens. Introducing 0002 is mainly here for 0003, so let's merge them together. > 0004 introduces a slight change to existing behavior. Currently, if > you specify a missing table, vacuumdb will process each table until > it reaches the nonexistent one, at which point it will fail. After > 0004 is applied, vacuumdb will fail during the catalog query, and no > tables will be processed. I considered avoiding this change in > behavior by doing an additional catalog lookup for each specified > table to see whether it satisfies --min-xid-age, etc. However, this > introduced quite a bit more complexity and increased the number of > catalog queries needed. Simplicity is always welcome, knowing that tables are missing before doing any operations is more consistent with plain VACUUM/ANALYZE. From patch 0004: + executeCommand(conn, "RESET search_path;", progname, echo); + res = executeQuery(conn, catalog_query.data, progname, echo); + termPQExpBuffer(&catalog_query); + PQclear(executeQuery(conn, ALWAYS_SECURE_SEARCH_PATH_SQL, + progname, echo)); We should really avoid that. There are other things like casts which tend to be forgotten, and if the catalog lookup query gets more complicated, I feel that this would again be forgotten, reintroducing the issues that ALWAYS_SECURE_SEARCH_PATH_SQL is here to fix. I have put my hands into what you sent, and worked on putting 0002/0003 first into shape, finishing with the attached. I have simplified the query construction a bit, making it IMO easier to read and to add new options, with comments documenting what's supported across versions. I have also added a regression test for --analyze-only --skip-locked. Then I have done some edit of the docs. What do you think for this portion? -- Michael
Attachment
On Sat, Jan 5, 2019 at 8:50 AM Bossart, Nathan <bossartn@amazon.com> wrote: > > On 12/21/18, 11:14 AM, "Bossart, Nathan" <bossartn@amazon.com> wrote: > > On 12/21/18, 10:51 AM, "Robert Haas" <robertmhaas@gmail.com> wrote: > >> On Thu, Dec 20, 2018 at 11:48 AM Bossart, Nathan <bossartn@amazon.com> wrote: > >>> Either way, we'll still have to decide whether to fail or to silently > >>> skip the option if you do something like specify --min-mxid-age for a > >>> 9.4 server. > >> > >> +1 for fail. > > > > Sounds good. I'll keep all the version checks and will fail for > > unsupported options in the next patch set. > > Here's an updated set of patches with the following changes: > > - 0002 adds the parenthesized syntax for ANALYZE. > - 0003 adds DISABLE_PAGE_SKIPPING for VACUUM. > - 0003 also ensures SKIP_LOCKED is applied for --analyze-only. > - 0004 alters vacuumdb to always use the catalog query to discover > the list of tables to process. > - 0003, 0005, and 0006 now fail in vacuum_one_database() if a > specified option is not available on the server version. > - If tables are listed along with --min-xid-age, --min-mxid-age, or > --min-relation-size, each table is processed only if it satisfies > the provided options. > > 0004 introduces a slight change to existing behavior. Currently, if > you specify a missing table, vacuumdb will process each table until > it reaches the nonexistent one, at which point it will fail. After > 0004 is applied, vacuumdb will fail during the catalog query, and no > tables will be processed. I considered avoiding this change in > behavior by doing an additional catalog lookup for each specified > table to see whether it satisfies --min-xid-age, etc. However, this > introduced quite a bit more complexity and increased the number of > catalog queries needed. > > Nathan > 0002 and 0003 are merged and posted by Michael-san and it looks good to me, so I've looked at the 0001, 0004, 0005 and 0006 patches. Here is a few review comments. ----- Even if all tables are filtered by --min-relation-size, min-mxid-age or min-xid-age the following message appeared. $ vacuumdb --verbose --min-relation-size 1000000 postgres vacuumdb: vacuuming database "postgres" Since no tables are processed in this case isn't is better not to output this message by moving the message after checking if ntup == 0? ----- You use pg_total_relation_size() to check the relation size but it might be better to use pg_relation_size() instead. The pg_total_relation_size() includes the all index size but I think it's common based on my experience that we check only the table size (including toast table) to do vacuum it or not. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Mon, Jan 07, 2019 at 06:10:21PM +0900, Masahiko Sawada wrote: > 0002 and 0003 are merged and posted by Michael-san and it looks good > to me, so I've looked at the 0001, 0004, 0005 and 0006 patches. Here > is a few review comments. I have done another round on 0002/0003 (PQfinish was lacking after error-ing on incompatible options) and committed the thing. > Even if all tables are filtered by --min-relation-size, min-mxid-age > or min-xid-age the following message appeared. > > $ vacuumdb --verbose --min-relation-size 1000000 postgres > vacuumdb: vacuuming database "postgres" > > Since no tables are processed in this case isn't is better not to > output this message by moving the message after checking if ntup == > 0? Hmm. My take on this one is that we attempt to vacuum relations on this database, but this may finish by doing nothing. Printing this message is still important to let the user know that an attempt was done so I would keep the message. > You use pg_total_relation_size() to check the relation size but it > might be better to use pg_relation_size() instead. The > pg_total_relation_size() includes the all index size but I think it's > common based on my experience that we check only the table size > (including toast table) to do vacuum it or not. Yes, I am also not completely sure if the way things are defined in the patch are completely what we are looking for. Still, I have seen as well people complain about the total amount of space a table is physically taking on disk, including its indexes. So from the user experience the direction taken by the patch makes sense, as much as does the argument you do. -- Michael
Attachment
On 1/7/19, 1:12 AM, "Michael Paquier" <michael@paquier.xyz> wrote: > I have been looking at the patch set, and 0001 can actually happen > only once 0005 is applied because this modifies the query doing on > HEAD a full scan of pg_class which would include at least catalog > tables so it can never be empty. For this reason it seems to me that > 0001 and 0004 should be merged together as common refactoring, making > sure on the way that all relations exist before processing anything. > As 0005 and 0006 change similar portions of the code, we could also > merge these together in a second patch introducing the new options. > Keeping 0001, 0004, 0005 and 0006 separated eases the review, but I > would likely merge things when they make sense together to reduce > diff chunks. Thanks for reviewing. Sure, I can merge these together so that it's just 2 patches. > 0002 removes some carefully-written query generation, so as it is > never possible to generate something like ANALYZE FULL. HEAD splits > ANALYZE and VACUUM clearly, but 0002 merges them together so as > careless coding in vacuumdb.c makes it easier to generate command > strings which would fail in the backend relying on the option checks > to make sure that for example combining --full and --analyze-only > never happens. Introducing 0002 is mainly here for 0003, so let's > merge them together. Makes sense. I was trying to simplify this code a bit, but I agree with your point about relying on the option checks. > From patch 0004: > + executeCommand(conn, "RESET search_path;", progname, echo); > + res = executeQuery(conn, catalog_query.data, progname, echo); > + termPQExpBuffer(&catalog_query); > + PQclear(executeQuery(conn, ALWAYS_SECURE_SEARCH_PATH_SQL, > + progname, echo)); > We should really avoid that. There are other things like casts which > tend to be forgotten, and if the catalog lookup query gets more > complicated, I feel that this would again be forgotten, reintroducing > the issues that ALWAYS_SECURE_SEARCH_PATH_SQL is here to fix. This was done in order to maintain the current behavior that appendQualifiedRelation() gives us. I found that skipping the search_path handling here forced us to specify the schema in the argument for --table in most cases. At the very least, I could add a comment here to highlight the importance of fully qualifying everything in the catalog query. What do you think? > I have put my hands into what you sent, and worked on putting > 0002/0003 first into shape, finishing with the attached. I have > simplified the query construction a bit, making it IMO easier to read > and to add new options, with comments documenting what's supported > across versions. I have also added a regression test for > --analyze-only --skip-locked. Then I have done some edit of the docs. > What do you think for this portion? Looks good to me, except for one small thing in the documentation: + <para> + Disable all page-skipping behavior during processing based on + the visibility map, similarly to the option + <literal>DISABLE_PAGE_SKIPPING</literal> for <command>VACUUM</command>. + </para> I think the "similarly to the option" part is slightly misleading. It's not just similar, it _is_ using that option in the generated commands. Perhaps we could point to the VACUUM documentation for more information about this one. On 1/7/19, 8:03 PM, "Michael Paquier" <michael@paquier.xyz> wrote: > On Mon, Jan 07, 2019 at 06:10:21PM +0900, Masahiko Sawada wrote: >> 0002 and 0003 are merged and posted by Michael-san and it looks good >> to me, so I've looked at the 0001, 0004, 0005 and 0006 patches. Here >> is a few review comments. > > I have done another round on 0002/0003 (PQfinish was lacking after > error-ing on incompatible options) and committed the thing. Thanks! >> Even if all tables are filtered by --min-relation-size, min-mxid-age >> or min-xid-age the following message appeared. >> >> $ vacuumdb --verbose --min-relation-size 1000000 postgres >> vacuumdb: vacuuming database "postgres" >> >> Since no tables are processed in this case isn't is better not to >> output this message by moving the message after checking if ntup == >> 0? > > Hmm. My take on this one is that we attempt to vacuum relations on > this database, but this may finish by doing nothing. Printing this > message is still important to let the user know that an attempt was > done so I would keep the message. +1 for keeping the message. >> You use pg_total_relation_size() to check the relation size but it >> might be better to use pg_relation_size() instead. The >> pg_total_relation_size() includes the all index size but I think it's >> common based on my experience that we check only the table size >> (including toast table) to do vacuum it or not. > > Yes, I am also not completely sure if the way things are defined in > the patch are completely what we are looking for. Still, I have seen > as well people complain about the total amount of space a table is > physically taking on disk, including its indexes. So from the user > experience the direction taken by the patch makes sense, as much as > does the argument you do. Good point. I think allowing multiple different relation size options here would be confusing, too (e.g. --min-relation-size versus --min-total-relation-size). IMO pg_total_relation_size() is the way to go here, as we'll most likely need to process the indexes and TOAST tables, too. If/when there is a DISABLE_INDEX_CLEANUP option for VACUUM, we'd then want to use pg_table_size() when --min-relation-size and --disable-index-cleanup are used together in vacuumdb. Thoughts? I've realized that I forgot to add links to the XID/MXID wraparound documentation like you suggested a while back. I'll make sure that gets included in the next patch set, too. Nathan
On Tue, Jan 08, 2019 at 06:46:11PM +0000, Bossart, Nathan wrote: > This was done in order to maintain the current behavior that > appendQualifiedRelation() gives us. I found that skipping the > search_path handling here forced us to specify the schema in the > argument for --table in most cases. At the very least, I could add a > comment here to highlight the importance of fully qualifying > everything in the catalog query. What do you think? A comment sounds like a good thing. And we really shouldn't hijack search_path even for one query... > Looks good to me, except for one small thing in the documentation: > > + <para> > + Disable all page-skipping behavior during processing based on > + the visibility map, similarly to the option > + <literal>DISABLE_PAGE_SKIPPING</literal> for <command>VACUUM</command>. > + </para> > > I think the "similarly to the option" part is slightly misleading. > It's not just similar, it _is_ using that option in the generated > commands. Perhaps we could point to the VACUUM documentation for more > information about this one. Sure. If you have any suggestions, please feel free. Adding a link to VACUUM documentation sounds good to me, as long as we avoid duplicating the description related to DISABLE_PAGE_SKIPPING on the VACUUM page. > Good point. I think allowing multiple different relation size options > here would be confusing, too (e.g. --min-relation-size versus > --min-total-relation-size). IMO pg_total_relation_size() is the way > to go here, as we'll most likely need to process the indexes and TOAST > tables, too. If/when there is a DISABLE_INDEX_CLEANUP option for > VACUUM, we'd then want to use pg_table_size() when --min-relation-size > and --disable-index-cleanup are used together in vacuumdb. > Thoughts? Yes, using pg_total_relation_size() looks like the best option to me here as well, still this does not make me 100% comfortable from the user perspective. > I've realized that I forgot to add links to the XID/MXID wraparound > documentation like you suggested a while back. I'll make sure that > gets included in the next patch set, too. Nice, thanks. -- Michael
Attachment
On Wed, Jan 9, 2019 at 10:06 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Tue, Jan 08, 2019 at 06:46:11PM +0000, Bossart, Nathan wrote: > > This was done in order to maintain the current behavior that > > appendQualifiedRelation() gives us. I found that skipping the > > search_path handling here forced us to specify the schema in the > > argument for --table in most cases. At the very least, I could add a > > comment here to highlight the importance of fully qualifying > > everything in the catalog query. What do you think? > > A comment sounds like a good thing. And we really shouldn't hijack > search_path even for one query... > > > Looks good to me, except for one small thing in the documentation: > > > > + <para> > > + Disable all page-skipping behavior during processing based on > > + the visibility map, similarly to the option > > + <literal>DISABLE_PAGE_SKIPPING</literal> for <command>VACUUM</command>. > > + </para> > > > > I think the "similarly to the option" part is slightly misleading. > > It's not just similar, it _is_ using that option in the generated > > commands. Perhaps we could point to the VACUUM documentation for more > > information about this one. > > Sure. If you have any suggestions, please feel free. Adding a link > to VACUUM documentation sounds good to me, as long as we avoid > duplicating the description related to DISABLE_PAGE_SKIPPING on the > VACUUM page. > > > Good point. I think allowing multiple different relation size options > > here would be confusing, too (e.g. --min-relation-size versus > > --min-total-relation-size). IMO pg_total_relation_size() is the way > > to go here, as we'll most likely need to process the indexes and TOAST > > tables, too. If/when there is a DISABLE_INDEX_CLEANUP option for > > VACUUM, we'd then want to use pg_table_size() when --min-relation-size > > and --disable-index-cleanup are used together in vacuumdb. > > Thoughts? > > Yes, using pg_total_relation_size() looks like the best option to me > here as well, still this does not make me 100% comfortable from the > user perspective. Agreed. Since pg_(total)_realtion_size() returns 0 for parent table the specifying the parent table to vacuumdb with --min-relation-size always does nothing. Maybe we will need to deal with this case when a function returning whole partitoned table size is introduced. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Wed, Jan 09, 2019 at 10:33:00AM +0900, Masahiko Sawada wrote: > Since pg_(total)_relation_size() returns 0 for parent table the > specifying the parent table to vacuumdb with --min-relation-size > always does nothing. Maybe we will need to deal with this case when a > function returning whole partitoned table size is introduced. Good point. I am not sure if we want to go down to having a size function dedicated to partitions especially as this would just now be a wrapper around pg_partition_tree(), but the size argument with partitioned tables is something to think about. If we cannot sort out this part cleanly, perhaps we could just focus on the age-ing parameters and the other ones first? It seems to me that what is proposed on this thread has value, so we could shave things and keep the essential, and focus on what we are sure about for simplicity. -- Michael
Attachment
On 1/8/19, 10:34 PM, "Michael Paquier" <michael@paquier.xyz> wrote: > On Wed, Jan 09, 2019 at 10:33:00AM +0900, Masahiko Sawada wrote: >> Since pg_(total)_relation_size() returns 0 for parent table the >> specifying the parent table to vacuumdb with --min-relation-size >> always does nothing. Maybe we will need to deal with this case when a >> function returning whole partitoned table size is introduced. > > Good point. I am not sure if we want to go down to having a size > function dedicated to partitions especially as this would just now be > a wrapper around pg_partition_tree(), but the size argument with > partitioned tables is something to think about. If we cannot sort out > this part cleanly, perhaps we could just focus on the age-ing > parameters and the other ones first? It seems to me that what is > proposed on this thread has value, so we could shave things and keep > the essential, and focus on what we are sure about for simplicity. Sounds good. I'll leave out --min-relation-size for now. Nathan
On Wed, Jan 9, 2019 at 1:33 PM Michael Paquier <michael@paquier.xyz> wrote: > > On Wed, Jan 09, 2019 at 10:33:00AM +0900, Masahiko Sawada wrote: > > Since pg_(total)_relation_size() returns 0 for parent table the > > specifying the parent table to vacuumdb with --min-relation-size > > always does nothing. Maybe we will need to deal with this case when a > > function returning whole partitoned table size is introduced. > > Good point. I am not sure if we want to go down to having a size > function dedicated to partitions especially as this would just now be > a wrapper around pg_partition_tree(), but the size argument with > partitioned tables is something to think about. If we cannot sort out > this part cleanly, perhaps we could just focus on the age-ing > parameters and the other ones first? It seems to me that what is > proposed on this thread has value, so we could shave things and keep > the essential, and focus on what we are sure about for simplicity. Agreed. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Here's a new patch set that should address the feedback in this thread. The changes in this version include: - 0001 is a small fix to the 'vacuumdb --disable-page-skipping' documentation. My suggestion is to keep it short and simple like --full, --freeze, --skip-locked, --verbose, and --analyze. The DISABLE_PAGE_SKIPPING option is well-described in the VACUUM documentation, and IMO it is reasonably obvious that such vacuumdb options correspond to the VACUUM options. - v3-0002 is essentially v2-0001 and v2-0004 combined. I've also added a comment explaining the importance of fully qualifying the catalog query used to discover tables to process. - 0003 includes additional documentation changes explaining the main uses of --min-xid-age and --min-mxid-age and linking to the existing wraparound documentation. As discussed, I've left out the patch for --min-relation-size for now. Nathan
Attachment
On Mon, Jan 21, 2019 at 10:09:09PM +0000, Bossart, Nathan wrote: > Here's a new patch set that should address the feedback in this > thread. The changes in this version include: > > - 0001 is a small fix to the 'vacuumdb --disable-page-skipping' > documentation. My suggestion is to keep it short and simple like > --full, --freeze, --skip-locked, --verbose, and --analyze. The > DISABLE_PAGE_SKIPPING option is well-described in the VACUUM > documentation, and IMO it is reasonably obvious that such vacuumdb > options correspond to the VACUUM options. Okay, committed this one. > - v3-0002 is essentially v2-0001 and v2-0004 combined. I've also > added a comment explaining the importance of fully qualifying the > catalog query used to discover tables to process. Regarding the search_path business, there is actually similar business in expand_table_name_patterns() for pg_dump if you look close by, so as users calling --table don't have to specify the schema, and just stick with patterns. + /* + * Prepare the list of tables to process by querying the catalogs. + * + * Since we execute the constructed query with the default search_path + * (which could be unsafe), it is important to fully qualify everything. + */ It is not only important, but also absolutely mandatory, so let's make the comment insisting harder on that point. From this point of view, the query that you are building is visibly correct. + appendStringLiteralConn(&catalog_query, just_table, conn); + appendPQExpBuffer(&catalog_query, "::pg_catalog.regclass\n"); + + pg_free(just_table); + + cell = cell->next; + if (cell == NULL) + appendPQExpBuffer(&catalog_query, " )\n"); Hm. It seems to me that you are duplicating what processSQLNamePattern() does, so we ought to use it. And it is possible to control the generation of the different WHERE clauses with a single query based on the number of elements in the list. Perhaps I am missing something? It looks unnecessary to export split_table_columns_spec() as well. - qr/statement: ANALYZE;/, + qr/statement: ANALYZE pg_catalog\./, Or we could just use "ANALYZE \.;/", perhaps patching it first. Perhaps my regexp here is incorrect, but I just mean to check for an ANALYZE query which begins by "ANALYZE " and finishes with ";", without caring about what is in-between. This would make the tests more portable. > - 0003 includes additional documentation changes explaining the main > uses of --min-xid-age and --min-mxid-age and linking to the > existing wraparound documentation. +$node->issues_sql_like( + [ 'vacuumdb', '--table=pg_class', '--min-mxid-age=123456789', 'postgres'], + qr/AND GREATEST\(pg_catalog.mxid_age\(c.relminmxid\), pg_catalog.mxid_age\(t.relminmxid\)\) OPERATOR\(pg_catalog.>=\) 123456789/, + 'vacuumdb --table --min-mxid-age'); +$node->issues_sql_like( + [ 'vacuumdb', '--min-xid-age=987654321', 'postgres' ], + qr/AND GREATEST\(pg_catalog.age\(c.relfrozenxid\), pg_catalog.age\(t.relfrozenxid\)\) OPERATOR\(pg_catalog.>=\) 987654321/, + 'vacuumdb --min-xid-age'); It may be better to use numbers which make sure that no relations are actually fetched, so as if the surrounding tests are messed up we never make them longer than necessary just to check the shape of the query. -- Michael
Attachment
On 1/21/19, 10:08 PM, "Michael Paquier" <michael@paquier.xyz> wrote: > On Mon, Jan 21, 2019 at 10:09:09PM +0000, Bossart, Nathan wrote: >> Here's a new patch set that should address the feedback in this >> thread. The changes in this version include: >> >> - 0001 is a small fix to the 'vacuumdb --disable-page-skipping' >> documentation. My suggestion is to keep it short and simple like >> --full, --freeze, --skip-locked, --verbose, and --analyze. The >> DISABLE_PAGE_SKIPPING option is well-described in the VACUUM >> documentation, and IMO it is reasonably obvious that such vacuumdb >> options correspond to the VACUUM options. > > Okay, committed this one. Thanks. > Regarding the search_path business, there is actually similar business > in expand_table_name_patterns() for pg_dump if you look close by, so > as users calling --table don't have to specify the schema, and just > stick with patterns. I see. The "WHERE c.relkind..." clause looks a bit different than what I have, so I've altered vacuumdb's catalog query to match. This was changed in expand_table_name_patterns() as part of 582edc369cd for a security fix, so we should probably do something similar here. > + /* > + * Prepare the list of tables to process by querying the catalogs. > + * > + * Since we execute the constructed query with the default search_path > + * (which could be unsafe), it is important to fully qualify everything. > + */ > It is not only important, but also absolutely mandatory, so let's make > the comment insisting harder on that point. From this point of view, > the query that you are building is visibly correct. I've updated the comment as suggested. > + appendStringLiteralConn(&catalog_query, just_table, conn); > + appendPQExpBuffer(&catalog_query, "::pg_catalog.regclass\n"); > + > + pg_free(just_table); > + > + cell = cell->next; > + if (cell == NULL) > + appendPQExpBuffer(&catalog_query, " )\n"); > Hm. It seems to me that you are duplicating what > processSQLNamePattern() does, so we ought to use it. And it is > possible to control the generation of the different WHERE clauses with > a single query based on the number of elements in the list. Perhaps I > am missing something? It looks unnecessary to export > split_table_columns_spec() as well. I'm sorry, I don't quite follow this. AFAICT processSQLNamePattern() is for generating WHERE clauses based on a wildcard-pattern string for psql and pg_dump. This portion of the patch is generating a WHERE clause to filter only for the tables listed via --table, and I don't think the --table option currently accepts patterns. Since you can also provide a list of columns with the --table option, I am using split_table_columns_spec() to retrieve only the table name for the OID lookup. > - qr/statement: ANALYZE;/, > + qr/statement: ANALYZE pg_catalog\./, > Or we could just use "ANALYZE \.;/", perhaps patching it first. > Perhaps my regexp here is incorrect, but I just mean to check for an > ANALYZE query which begins by "ANALYZE " and finishes with ";", > without caring about what is in-between. This would make the tests > more portable. Sure, I've split this out into a separate patch. > +$node->issues_sql_like( > + [ 'vacuumdb', '--table=pg_class', '--min-mxid-age=123456789', > 'postgres'], > + qr/AND GREATEST\(pg_catalog.mxid_age\(c.relminmxid\), > pg_catalog.mxid_age\(t.relminmxid\)\) OPERATOR\(pg_catalog.>=\) 123456789/, > + 'vacuumdb --table --min-mxid-age'); > +$node->issues_sql_like( > + [ 'vacuumdb', '--min-xid-age=987654321', 'postgres' ], > + qr/AND GREATEST\(pg_catalog.age\(c.relfrozenxid\), > pg_catalog.age\(t.relfrozenxid\)\) OPERATOR\(pg_catalog.>=\) 987654321/, > + 'vacuumdb --min-xid-age'); > It may be better to use numbers which make sure that no relations are > actually fetched, so as if the surrounding tests are messed up we > never make them longer than necessary just to check the shape of the > query. I think it's already pretty unlikely that any matching relations would be found, but I've updated these values to be within the range where any matching database has already stopped accepting commands to prevent wraparound. Nathan
Attachment
On Tue, Jan 22, 2019 at 11:21:32PM +0000, Bossart, Nathan wrote: > On 1/21/19, 10:08 PM, "Michael Paquier" <michael@paquier.xyz> wrote: > > On Mon, Jan 21, 2019 at 10:09:09PM +0000, Bossart, Nathan wrote: > >> Here's a new patch set that should address the feedback in this > >> thread. The changes in this version include: > >> > >> - 0001 is a small fix to the 'vacuumdb --disable-page-skipping' > >> documentation. My suggestion is to keep it short and simple like > >> --full, --freeze, --skip-locked, --verbose, and --analyze. The > >> DISABLE_PAGE_SKIPPING option is well-described in the VACUUM > >> documentation, and IMO it is reasonably obvious that such vacuumdb > >> options correspond to the VACUUM options. > > > > Okay, committed this one. > > Thanks. > > > Regarding the search_path business, there is actually similar business > > in expand_table_name_patterns() for pg_dump if you look close by, so > > as users calling --table don't have to specify the schema, and just > > stick with patterns. > > I see. The "WHERE c.relkind..." clause looks a bit different than > what I have, so I've altered vacuumdb's catalog query to match. This > was changed in expand_table_name_patterns() as part of 582edc369cd for > a security fix, so we should probably do something similar here. > > > + /* > > + * Prepare the list of tables to process by querying the catalogs. > > + * > > + * Since we execute the constructed query with the default search_path > > + * (which could be unsafe), it is important to fully qualify everything. > > + */ > > It is not only important, but also absolutely mandatory, so let's make > > the comment insisting harder on that point. From this point of view, > > the query that you are building is visibly correct. > > I've updated the comment as suggested. > > > + appendStringLiteralConn(&catalog_query, just_table, conn); > > + appendPQExpBuffer(&catalog_query, "::pg_catalog.regclass\n"); > > + > > + pg_free(just_table); > > + > > + cell = cell->next; > > + if (cell == NULL) > > + appendPQExpBuffer(&catalog_query, " )\n"); > > Hm. It seems to me that you are duplicating what > > processSQLNamePattern() does, so we ought to use it. And it is > > possible to control the generation of the different WHERE clauses with > > a single query based on the number of elements in the list. Perhaps I > > am missing something? It looks unnecessary to export > > split_table_columns_spec() as well. > > I'm sorry, I don't quite follow this. AFAICT processSQLNamePattern() > is for generating WHERE clauses based on a wildcard-pattern string for > psql and pg_dump. This portion of the patch is generating a WHERE > clause to filter only for the tables listed via --table, and I don't > think the --table option currently accepts patterns. Since you can > also provide a list of columns with the --table option, I am using > split_table_columns_spec() to retrieve only the table name for the OID > lookup. Bah, of course you are right. vacuumdb does not support patterns but I thought it was able to handle that. With column names that would be sort of funny anyway. >> Or we could just use "ANALYZE \.;/", perhaps patching it first. >> Perhaps my regexp here is incorrect, but I just mean to check for an >> ANALYZE query which begins by "ANALYZE " and finishes with ";", >> without caring about what is in-between. This would make the tests >> more portable. > > Sure, I've split this out into a separate patch. Thanks, I have committed this part to ease the follow-up work. > I think it's already pretty unlikely that any matching relations would > be found, but I've updated these values to be within the range where > any matching database has already stopped accepting commands to > prevent wraparound. Thanks. I have been looking at 0002, and I found a problem with the way ANALYZE queries are generated. For example take this table: CREATE TABLE aa1 (a int); Then if I try to run ANALYZE with vacuumdb it just works: $ vacuumdb -z --table 'aa1(b)' vacuumdb: vacuuming database "ioltas" Note that this fails with HEAD, but passes with your patch. The problem is that the query generated misses the lists of columns when processing them through split_table_columns_spec(), as what is generated is that: VACUUM (ANALYZE) public.aa1; So the result is actually incorrect because all the columns get processed. This patch moves the check about the existence of the relation when querying the catalogs, perhaps we would want the same for columns for consistency? Or not. That's a bit harder for sure, not impossible visibly, still that would mean duplicating a bunch of logic that the backend is doing by itself, so we could live without it I think. Attached is a first patch to add more tests in this area, which passes on HEAD but fails with your patch. It looks like a good idea to tweak the tests first as there is no coverage yet for the queries generated with complete and partial column lists. At the same time, we may want to change split_table_columns_spec's name to be more consistent with the other APIs, like splitTableColumnspec. That's a nit though.. -- Michael
Attachment
On 1/22/19, 7:41 PM, "Michael Paquier" <michael@paquier.xyz> wrote: > I have been looking at 0002, and I found a problem with the way > ANALYZE queries are generated. For example take this table: > CREATE TABLE aa1 (a int); > > Then if I try to run ANALYZE with vacuumdb it just works: > $ vacuumdb -z --table 'aa1(b)' > vacuumdb: vacuuming database "ioltas" > > Note that this fails with HEAD, but passes with your patch. The > problem is that the query generated misses the lists of columns when > processing them through split_table_columns_spec(), as what is > generated is that: > VACUUM (ANALYZE) public.aa1; > > So the result is actually incorrect because all the columns get > processed. > > This patch moves the check about the existence of the relation when > querying the catalogs, perhaps we would want the same for columns for > consistency? Or not. That's a bit harder for sure, not impossible > visibly, still that would mean duplicating a bunch of logic that the > backend is doing by itself, so we could live without it I think. Oh, wow. Thanks for pointing this out. I should have caught this. With 0002, we are basically just throwing out the column lists entirely as we obtain the qualified identifiers from the catalog query. To fix this, I've added an optional CTE for tracking any provided column lists. v5-0001 is your test patch for this case, and v5-0002 splits out the work for split_table_columns_spec(). Nathan
Attachment
On Thu, Jan 24, 2019 at 12:49:28AM +0000, Bossart, Nathan wrote: > Oh, wow. Thanks for pointing this out. I should have caught this. > With 0002, we are basically just throwing out the column lists > entirely as we obtain the qualified identifiers from the catalog > query. To fix this, I've added an optional CTE for tracking any > provided column lists. v5-0001 is your test patch for this case, and > v5-0002 splits out the work for split_table_columns_spec(). Committed the test portion for now, still reviewing the rest.. -- Michael
Attachment
On Thu, Jan 24, 2019 at 12:49:28AM +0000, Bossart, Nathan wrote: > Oh, wow. Thanks for pointing this out. I should have caught this. > With 0002, we are basically just throwing out the column lists > entirely as we obtain the qualified identifiers from the catalog > query. To fix this, I've added an optional CTE for tracking any > provided column lists. v5-0001 is your test patch for this case, and > v5-0002 splits out the work for split_table_columns_spec(). I think that the query generation could be simplified by always using the CTE if column lists are present or not, by associating NULL if no column list is present, and by moving the regclass casting directly into the CTE. This way, for the following vacuumdb command with a set of tables wanted: vacuumdb --table aa --table 'bb(b)' --table 'cc(c)' Then the query generated looks like that: WITH column_lists (table_name, column_list) AS ( VALUES ('aa'::pg_catalog.regclass, NULL), ('bb'::pg_catalog.regclass, '(b)'), ('cc'::pg_catalog.regclass, '(c)') ) SELECT c.relname, ns.nspname, column_lists.column_list FROM pg_catalog.pg_class c JOIN pg_catalog.pg_namespace ns ON c.relnamespace OPERATOR(pg_catalog.=) ns.oid JOIN column_lists ON column_lists.table_name OPERATOR(pg_catalog.=) c.oid WHERE c.relkind OPERATOR(pg_catalog.=) ANY (array['r', 'm']) ORDER BY c.relpages DESC; So only the following parts are added: - The CTE with a table and its column list. - A join on pg_class.oid and column_lists.table_name. The latest version of the patch is doing a double amount of work by using a left join and an extra set of clauses in WHERE to fetch the matching column list from the table name entry. If no tables are listed, then we just finish with that: SELECT c.relname, ns.nspname FROM pg_catalog.pg_class c JOIN pg_catalog.pg_namespace ns ON c.relnamespace OPERATOR(pg_catalog.=) ns.oid WHERE c.relkind OPERATOR(pg_catalog.=) ANY (array['r', 'm']) ORDER BY c.relpages DESC; -- Michael
Attachment
On 1/27/19, 9:57 PM, "Michael Paquier" <michael@paquier.xyz> wrote: > I think that the query generation could be simplified by always using > the CTE if column lists are present or not, by associating NULL if no > column list is present, and by moving the regclass casting directly > into the CTE. Yes, this simplifies the code quite a bit. I did it this way in v6. Nathan
Attachment
On Mon, Jan 28, 2019 at 09:58:17PM +0000, Bossart, Nathan wrote: > Yes, this simplifies the code quite a bit. I did it this way in > v6. Thanks for the quick update. Things could have been made a bit more simple by just using a for loop instead of while, even if the table list can be NULL for database-wide operations (perhaps that's a matter of taste..). prepare_vacuum_command() could be simplified further by using the server version instead of the full connection string. The comments at the top of the routine were not properly updated either, and the last portion appending the relation name just needs one appendPQExpBuffer() call instead of three separate calls. PQclear() could have been moved closer to where all the query results have been consumed, to make the whole more consistent. Anyway, patches 1 and 2 have been merged, and committed after some cleanup and adjustments. Patch 3 gets much easier now. - " ON c.relnamespace OPERATOR(pg_catalog.=) ns.oid\n"); + " ON c.relnamespace OPERATOR(pg_catalog.=) ns.oid\n" + " LEFT JOIN pg_catalog.pg_class t" + " ON c.reltoastrelid OPERATOR(pg_catalog.=) t.oid\n"); Why do need this part? -- Michael
Attachment
On 1/28/19, 6:35 PM, "Michael Paquier" <michael@paquier.xyz> wrote: > Anyway, patches 1 and 2 have been merged, and committed after some > cleanup and adjustments. Patch 3 gets much easier now. Thanks! > - " ON c.relnamespace OPERATOR(pg_catalog.=) ns.oid\n"); > + " ON c.relnamespace OPERATOR(pg_catalog.=) ns.oid\n" > + " LEFT JOIN pg_catalog.pg_class t" > + " ON c.reltoastrelid OPERATOR(pg_catalog.=) t.oid\n"); > Why do need this part? This is modeled after the query provided in the docs for preventing transaction ID wraparound [0]. I think the idea is to combine the relation with its TOAST table so that it does not need to be considered separately. The VACUUM commands generated in vacuumdb will also process the corresponding TOAST table for the relation, anyway. I noticed a behavior change from the catalog query patch that we probably ought to fix. The "WHERE c.relkind IN ('r', 'm')" clause seems sufficient to collect all vacuumable relations (TOAST tables are handled when vacuuming the main relation, and partitioned tables are handled by vacuuming the partitions individually), but it is not sufficient to match the previous behavior when --table is used. Previously, we did not filter by relkind at all when --table is used. Instead, we let the server emit a WARNING when a relation that couldn't be processed was specified. Previous behavior: ~% vacuumdb -d postgres -t foreign_table vacuumdb: vacuuming database "postgres" WARNING: skipping "foreign_table" --- cannot vacuum non-tables or special system tables ~% vacuumdb -d postgres -t pg_toast.pg_toast_2600 --analyze-only vacuumdb: vacuuming database "postgres" WARNING: skipping "pg_toast_2600" --- cannot analyze non-tables or special system tables Current behavior: ~% vacuumdb -d postgres -t foreign_table vacuumdb: vacuuming database "postgres" ~% vacuumdb -d postgres -t pg_toast.pg_toast_2600 --analyze-only vacuumdb: vacuuming database "postgres" I think the simplest way to fix this is to remove the relkind clause altogether when --table is used and to let the server decide whether it should be processed. This effectively reinstates the previous behavior so that users can specify TOAST tables, partitioned tables, etc. Unfortunately, this complicates the --min-xid-age and --min-mxid-age patch a bit, as some of the relation types that can be vacuumed and/or analyzed do not really have a transaction ID age. AFAICT the simplest way to handle this case is to filter out relations with a relfrozenxid or relminmxid of 0. The v7 patch set implements these proposed approaches. Nathan [0] https://www.postgresql.org/docs/devel/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND
Attachment
On Tue, Jan 29, 2019 at 09:48:18PM +0000, Bossart, Nathan wrote: > On 1/28/19, 6:35 PM, "Michael Paquier" <michael@paquier.xyz> wrote: >> - " ON c.relnamespace OPERATOR(pg_catalog.=) ns.oid\n"); >> + " ON c.relnamespace OPERATOR(pg_catalog.=) ns.oid\n" >> + " LEFT JOIN pg_catalog.pg_class t" >> + " ON c.reltoastrelid OPERATOR(pg_catalog.=) t.oid\n"); >> Why do need this part? > > This is modeled after the query provided in the docs for preventing > transaction ID wraparound [0]. I think the idea is to combine the > relation with its TOAST table so that it does not need to be > considered separately. The VACUUM commands generated in vacuumdb will > also process the corresponding TOAST table for the relation, anyway. Oh, OK. This makes sense. It would be nice to add a comment in the patch and to document this calculation method in the docs of vacuumdb. > I noticed a behavior change from the catalog query patch that we > probably ought to fix. The "WHERE c.relkind IN ('r', 'm')" clause > seems sufficient to collect all vacuumable relations (TOAST tables are > handled when vacuuming the main relation, and partitioned tables are > handled by vacuuming the partitions individually), but it is not > sufficient to match the previous behavior when --table is used. > Previously, we did not filter by relkind at all when --table is used. > Instead, we let the server emit a WARNING when a relation that > couldn't be processed was specified. Indeed, the WARNING can be useful for some users when trying to work on an incorrect relation kind, especially when not using --verbose. Fixed after adding a test with command_checks_all. > Unfortunately, this complicates the --min-xid-age and --min-mxid-age > patch a bit, as some of the relation types that can be vacuumed and/or > analyzed do not really have a transaction ID age. AFAICT the simplest > way to handle this case is to filter out relations with a relfrozenxid > or relminmxid of 0. We should be able to live with that. -- Michael
Attachment
On 1/29/19, 4:47 PM, "Michael Paquier" <michael@paquier.xyz> wrote: > On Tue, Jan 29, 2019 at 09:48:18PM +0000, Bossart, Nathan wrote: >> On 1/28/19, 6:35 PM, "Michael Paquier" <michael@paquier.xyz> wrote: >>> - " ON c.relnamespace OPERATOR(pg_catalog.=) ns.oid\n"); >>> + " ON c.relnamespace OPERATOR(pg_catalog.=) ns.oid\n" >>> + " LEFT JOIN pg_catalog.pg_class t" >>> + " ON c.reltoastrelid OPERATOR(pg_catalog.=) t.oid\n"); >>> Why do need this part? >> >> This is modeled after the query provided in the docs for preventing >> transaction ID wraparound [0]. I think the idea is to combine the >> relation with its TOAST table so that it does not need to be >> considered separately. The VACUUM commands generated in vacuumdb will >> also process the corresponding TOAST table for the relation, anyway. > > Oh, OK. This makes sense. It would be nice to add a comment in the > patch and to document this calculation method in the docs of > vacuumdb. Sure, this is added in v8. >> I noticed a behavior change from the catalog query patch that we >> probably ought to fix. The "WHERE c.relkind IN ('r', 'm')" clause >> seems sufficient to collect all vacuumable relations (TOAST tables are >> handled when vacuuming the main relation, and partitioned tables are >> handled by vacuuming the partitions individually), but it is not >> sufficient to match the previous behavior when --table is used. >> Previously, we did not filter by relkind at all when --table is used. >> Instead, we let the server emit a WARNING when a relation that >> couldn't be processed was specified. > > Indeed, the WARNING can be useful for some users when trying to work > on an incorrect relation kind, especially when not using --verbose. > Fixed after adding a test with command_checks_all. Thanks. Something else I noticed is that we do not retrieve foreign tables and partitioned tables for --analyze and --analyze-only. However, that has long been the case for parallel mode, and this issue should probably get its own thread. Nathan
Attachment
On Wed, Jan 30, 2019 at 05:45:58PM +0000, Bossart, Nathan wrote: > On 1/29/19, 4:47 PM, "Michael Paquier" <michael@paquier.xyz> wrote: >> Oh, OK. This makes sense. It would be nice to add a comment in the >> patch and to document this calculation method in the docs of >> vacuumdb. > > Sure, this is added in v8. Thanks, Nathan. Something which was not correct in the patch is the compatibility of the query. xid <> xid has been added in 9.6, so the new options will not be able to work with older versions. The versions marked as compatible in the last patch came from the age-ing functions, but you added direct comparisons with relfrozenxid and relminmxid in the latest versions of the patch. This implementation goes down a couple of released versions, which is useful enough in my opinion, so I would keep it as-is. I have added as well some markups around "PostgreSQL" in the docs, and extra casts for the integer/xid values of the query. The test patterns are also simplified, and I added tests for incorrect values of --min-xid-age and --min-mxid-age. Does that look correct to you? > Thanks. Something else I noticed is that we do not retrieve foreign > tables and partitioned tables for --analyze and --analyze-only. > However, that has long been the case for parallel mode, and this issue > should probably get its own thread. Good point, this goes down a couple of releases, and statistics on both may be useful to compile for a system-wide operation. Spawning a separate thread looks adapted to me. -- Michael
Attachment
On 1/30/19, 6:04 PM, "Michael Paquier" <michael@paquier.xyz> wrote: > Something which was not correct in the patch is the compatibility of > the query. xid <> xid has been added in 9.6, so the new options will > not be able to work with older versions. The versions marked as > compatible in the last patch came from the age-ing functions, but you > added direct comparisons with relfrozenxid and relminmxid in the > latest versions of the patch. This implementation goes down a couple > of released versions, which is useful enough in my opinion, so I would > keep it as-is. Agreed. Thanks for catching this. > I have added as well some markups around "PostgreSQL" in the docs, and > extra casts for the integer/xid values of the query. The test > patterns are also simplified, and I added tests for incorrect values > of --min-xid-age and --min-mxid-age. Does that look correct to you? It looks good to me. The only thing I noticed is the use of relfrozenxid instead of relminmxid here: + appendPQExpBuffer(&catalog_query, + " %s GREATEST(pg_catalog.mxid_age(c.relminmxid)," + " pg_catalog.mxid_age(t.relminmxid)) OPERATOR(pg_catalog.>=)" + " '%d'::pg_catalog.int4\n" + " AND c.relfrozenxid OPERATOR(pg_catalog.!=)" + " '0'::pg_catalog.xid\n", + has_where ? "AND" : "WHERE", vacopts->min_mxid_age); However, that may still work as intended. Nathan
On Thu, Jan 31, 2019 at 02:28:05AM +0000, Bossart, Nathan wrote: > It looks good to me. The only thing I noticed is the use of > relfrozenxid instead of relminmxid here: Fixed and committed. So we are finally done here, except for the debate with the relation size. -- Michael
Attachment
On 1/30/19, 8:21 PM, "Michael Paquier" <michael@paquier.xyz> wrote: > Fixed and committed. So we are finally done here, except for the > debate with the relation size. Thanks for the diligent reviews, as always. I don't plan to pick up --min-relation-size right now, but I may attempt it again in a future commitfest. Nathan
On Thu, Jan 31, 2019 at 03:41:34PM +0000, Bossart, Nathan wrote: > Thanks for the diligent reviews, as always. I don't plan to pick up > --min-relation-size right now, but I may attempt it again in a future > commitfest. Sure, thanks for the patches! -- Michael