Thread: Thinking about EXPLAIN ALTER TABLE
I've been poking around with a feature I've wanted a number of times in the past, "EXPLAIN ALTER TABLE". The idea is that there are a bunch of optimizations in ALTER TABLE to minimize the amount of work and lock levels but it's really hard for users to tell whether they've written their ALTER TABLE commands carefully enough and properly to trigger the optimizations. As a result it's really easy for to accidentally take an exclusive lock and/or do a full table rewrite when you were expecting to just do a quick catalog update. The things I want to expose in ALTER TABLE are: 1. The lock level that's going to be taken 2. Whether a full table rewrite is going to happen 3. Whether a full table constraint validation is going to happen 4. Whether any indexes are going to be built or rebuilt 5. Whether the command is going to error out early due to syntax, permissions, or other inconsistencies Are there are other aspects of alter table that people would like to see exposed that I haven't thought of? For the most part ALTER TABLE is already structured such that this is pretty easy. It does a lot of preparatory work without doing catalog updates and I can just call that same preparatory work without calling the subsequent work phases. However there are a number of cases where decisions are made only during the actual work phase, phase 2, and flags are set and work enqueued for phase 3. In some cases the work that's enqueued would be hard to predict in advance, for example if a partition is added a new constraint is added for the partition but if that new constraint is merged with an existing constraint (which is handled by AddRelationNewConstraints()) then it doesn't need to be re-validated. I'm thinking I should try to move all these decisions to phase 1 as much as possible but I'm not sure how feasible it will be to get the results exactly correct. Of course the cases where it's hardest to predict are precisely where users would most like to know what's going to happen... If anyone has any ideas or tips on how to avoid these problems I'm all ears. Currently the output is a bit rough, it looks like: postgres=# explain alter table x2 add foreign key (i) references x1(i); ┌───────────────────────────────────┐ │ QUERY PLAN │ ├───────────────────────────────────┤ │ Lock Level: ShareRowExclusiveLock │ │ ALTER TABLE: x2 │ │ Relation: x2 │ │ Rewrite: none │ └───────────────────────────────────┘ postgres***=# explain alter table t add column j integer generated always as identity primary key; ┌─────────────────────────────────┐ │ QUERY PLAN │ ├─────────────────────────────────┤ │ Lock Level: AccessExclusiveLock │ │ CREATE SEQUENCE: t_j_seq │ │ ALTER TABLE: t │ │ Relation: t │ │ Rewrite: none │ │ ALTER SEQUENCE: t_j_seq │ └─────────────────────────────────┘ postgres***=# explain alter table t set unlogged; ┌─────────────────────────────────────┐ │ QUERY PLAN │ ├─────────────────────────────────────┤ │ Lock Level: AccessExclusiveLock │ │ ALTER TABLE: t │ │ Relation: t │ │ Rewrite: Due to ALTER PERSISTENCE │ └─────────────────────────────────────┘ postgres***=# explain alter table t alter column i set not null; ┌─────────────────────────────────┐ │ QUERY PLAN │ ├─────────────────────────────────┤ │ Lock Level: AccessExclusiveLock │ │ ALTER TABLE: t │ │ Relation: t │ │ Rewrite: none │ │ Relation: t2 │ │ Rewrite: none │ └─────────────────────────────────┘ -- greg
On Fri, Dec 7, 2018 at 1:18 PM Greg Stark <stark@mit.edu> wrote: > I've been poking around with a feature I've wanted a number of times > in the past, "EXPLAIN ALTER TABLE". The idea is that there are a bunch > of optimizations in ALTER TABLE to minimize the amount of work and > lock levels but it's really hard for users to tell whether they've > written their ALTER TABLE commands carefully enough and properly to > trigger the optimizations. As a result it's really easy for to > accidentally take an exclusive lock and/or do a full table rewrite > when you were expecting to just do a quick catalog update. Seems like an important project. > Are there are other aspects of alter table that people would like to > see exposed that I haven't thought of? How about: * Whether CLUSTER does a sort, or a full index scan. * How many workers will be used by CREATE INDEX, if any. (These are both examples of utility statements that make very limited use of the optimizer -- CREATE TABLE AS SELECT is covered by explain already, and is closer to a DML statement.) * Whether an external sort will be used by CREATE INDEX or CLUSTER, plus other details predicted by cost_sort(). These additional items are just nice-to-haves that I'm throwing out as suggestions. > I'm thinking I should try to move all these decisions to phase 1 as > much as possible but I'm not sure how feasible it will be to get the > results exactly correct. Of course the cases where it's hardest to > predict are precisely where users would most like to know what's going > to happen... I'm not sure either, but I suspect that some compromise will be required. For example, it's not at all clear whether or not a lock will be acquired on TOAST tables in many instances. Also, the planner itself can acquire ASLs on indexes in order to think about using them, without execution going on to make use of them, which can matter (e.g. there could be a REINDEX of an index that the planner isn't going to use, that nonetheless necessitates that the planner wait). What you're talking about here isn't really "EXPLAIN, but for DDL" so much as a way of making EXPLAIN predict relation lock acquisitions, including for DDL statements. I'm fine with calling that EXPLAIN, but it's still not quite EXPLAIN as we know it today. -- Peter Geoghegan
On 12/7/18, Greg Stark <stark@mit.edu> wrote: > I've been poking around with a feature I've wanted a number of times > in the past, "EXPLAIN ALTER TABLE". I believe I've seen your messages to that effect in the archives, so I've had it in the back of my mind as well. I think it could be very useful. > 3. Whether a full table constraint validation is going to happen Is it possible that this can occur via index-only scan and that this feature could know that? > For the most part ALTER TABLE is already structured such that this is > pretty easy. It does a lot of preparatory work without doing catalog > updates and I can just call that same preparatory work without calling > the subsequent work phases. One thing that came to mind: Since the input is not a plan tree, it seems it would be more difficult to keep EXPLAIN in sync with additional ALTER TABLE features. Do you have any thoughts about that? > postgres***=# explain alter table t alter column i set not null; > ┌─────────────────────────────────┐ > │ QUERY PLAN │ > ├─────────────────────────────────┤ > │ Lock Level: AccessExclusiveLock │ > │ ALTER TABLE: t │ > │ Relation: t │ > │ Rewrite: none │ > │ Relation: t2 │ > │ Rewrite: none │ > └─────────────────────────────────┘ In this example, I assume the two relations are partitions? With many partitions, this could get unwieldy pretty fast. I imagine there could be a shorthand notation. -John Naylor
Hi Greg On 2018-Dec-07, Greg Stark wrote: > I'm thinking I should try to move all these decisions to phase 1 as > much as possible but I'm not sure how feasible it will be to get the > results exactly correct. Of course the cases where it's hardest to > predict are precisely where users would most like to know what's going > to happen... Maybe you can move some of these decisions to phase 1, but I'm not sure it can be done for all of them. Another possible plan is to add a flag "dry run" so that phases 2/3 do whatever analysis they need to report for your EXPLAIN, but not actually carry out their tasks. (I see two options to implement this, one is a global flag and the other is a new argument to all those routines.) > postgres***=# explain alter table t set unlogged; > ┌─────────────────────────────────────┐ > │ QUERY PLAN │ > ├─────────────────────────────────────┤ > │ Lock Level: AccessExclusiveLock │ > │ ALTER TABLE: t │ > │ Relation: t │ > │ Rewrite: Due to ALTER PERSISTENCE │ > └─────────────────────────────────────┘ Note there's a relation scan that doesn't rewrite (to verify constraints IIRC). That's certainly worth reporting in some form. Maybe instead of "Rewrite:" use something like "Scan: read-only / rewrite due to ALTER PERSISTENCE". But ... not sure what you propose to print when a table rewrite is caused by two subcommands, say change persistence at the same time as a column datatype. And what if you add a new constraint together with those two? > postgres***=# explain alter table t alter column i set not null; > ┌─────────────────────────────────┐ > │ QUERY PLAN │ > ├─────────────────────────────────┤ > │ Lock Level: AccessExclusiveLock │ > │ ALTER TABLE: t │ > │ Relation: t │ > │ Rewrite: none │ > │ Relation: t2 │ > │ Rewrite: none │ > └─────────────────────────────────┘ I think putting the "Rewrite:" at the same indentation level as the relation that it qualifies is confusing. I'd do it this way: ┌─────────────────────────────────┐ │ QUERY PLAN │ ├─────────────────────────────────┤ │ Lock Level: AccessExclusiveLock │ │ ALTER TABLE: t │ │ Relation: t │ │ Rewrite: none │ │ Relation: t2 │ │ Rewrite: none │ └─────────────────────────────────┘ Maybe make the output some legible form of YAML or JSON? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, 10 Dec 2018 at 16:14, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
--
Hi Greg
On 2018-Dec-07, Greg Stark wrote:
> I'm thinking I should try to move all these decisions to phase 1 as
> much as possible but I'm not sure how feasible it will be to get the
> results exactly correct. Of course the cases where it's hardest to
> predict are precisely where users would most like to know what's going
> to happen...
Maybe you can move some of these decisions to phase 1, but I'm not sure
it can be done for all of them. Another possible plan is to add a flag
"dry run" so that phases 2/3 do whatever analysis they need to report
for your EXPLAIN, but not actually carry out their tasks. (I see two
options to implement this, one is a global flag and the other is a new
argument to all those routines.)
You need to take a table lock to find out things about the table.
EXPLAIN seems like the wrong place for this.
I suggest ALTER TABLE should respond to a parameter setting of ddl_dry_run = on, so the whole world doesn't need to rewrite its syntax to support the new option.
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Simon Riggs <simon@2ndquadrant.com> writes: > I suggest ALTER TABLE should respond to a parameter setting of ddl_dry_run > = on, so the whole world doesn't need to rewrite its syntax to support the > new option. We were just busy shooting down a different suggestion of behavior-changing GUCs. A GUC that turns all ALTERs into no-ops sure seems like a foot-gun to me. regards, tom lane
On Mon, 10 Dec 2018 at 16:32, Tom Lane <tgl@sss.pgh.pa.us> wrote:
--
Simon Riggs <simon@2ndquadrant.com> writes:
> I suggest ALTER TABLE should respond to a parameter setting of ddl_dry_run
> = on, so the whole world doesn't need to rewrite its syntax to support the
> new option.
We were just busy shooting down a different suggestion of
behavior-changing GUCs. A GUC that turns all ALTERs into no-ops
sure seems like a foot-gun to me.
How would you test a script? Manually edit each one with the new option? Manual editing is more of a foot gun.
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Simon Riggs <simon@2ndquadrant.com> writes: > On Mon, 10 Dec 2018 at 16:32, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> We were just busy shooting down a different suggestion of >> behavior-changing GUCs. A GUC that turns all ALTERs into no-ops >> sure seems like a foot-gun to me. > How would you test a script? Manually edit each one with the new option? > Manual editing is more of a foot gun. I don't see how EXPLAIN ALTER TABLE would meaningfully be something you use in a script. If you have a script with many steps including ALTER TABLEs, it's likely that each ALTER depends on the effects of prior steps, and it's even more likely that the steps after an ALTER depend on it having executed. regards, tom lane
On Mon, 10 Dec 2018 at 16:52, Tom Lane <tgl@sss.pgh.pa.us> wrote:
--
Simon Riggs <simon@2ndquadrant.com> writes:
> On Mon, 10 Dec 2018 at 16:32, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> We were just busy shooting down a different suggestion of
>> behavior-changing GUCs. A GUC that turns all ALTERs into no-ops
>> sure seems like a foot-gun to me.
> How would you test a script? Manually edit each one with the new option?
> Manual editing is more of a foot gun.
I don't see how EXPLAIN ALTER TABLE would meaningfully be something
you use in a script. If you have a script with many steps including
ALTER TABLEs, it's likely that each ALTER depends on the effects of
prior steps, and it's even more likely that the steps after an ALTER
depend on it having executed.
Agreed, but that's why I suggested an alternative.
Remembering, the best approach is one we have already taken.... but maybe we have forgotten.
An event trigger with a table_rewrite event, allows you to scan a whole script for objectionable activity on a test server before you put it into production.
Perhaps we just need a few extra events.
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, 10 Dec 2018 at 12:27, Simon Riggs <simon@2ndquadrant.com> wrote: > > An event trigger with a table_rewrite event, allows you to scan a whole script for objectionable activity on a test serverbefore you put it into production. > > Perhaps we just need a few extra events. That's not a bad idea at all. Offhand the missing events that come to mind are: 1. Validation passes 1. Table Rebuild pass 1. Index builds 1. Locks that block DDL (currently that's AccessExclusiveLock and ShareRowExclusiveLock) I see this as complementary to EXPLAIN ALTER TABLE rather than replacing it. EXPLAIN would help the developer understand what their commands were doing and why during development and the event triggers would serve largely as safety backstop in staging and production to ensure nothing slips through. -- greg
On Tue, Dec 11, 2018 at 1:32 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Simon Riggs <simon@2ndquadrant.com> writes: > > I suggest ALTER TABLE should respond to a parameter setting of ddl_dry_run > > = on, so the whole world doesn't need to rewrite its syntax to support the > > new option. > > We were just busy shooting down a different suggestion of > behavior-changing GUCs. A GUC that turns all ALTERs into no-ops > sure seems like a foot-gun to me. Yeah, I like EXPLAIN better. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Dec 11, 2018 at 11:25:12AM +0900, Robert Haas wrote: > Yeah, I like EXPLAIN better. +1. -- Michael
Attachment
On Tue, Dec 11, 2018 at 11:25:12AM +0900, Robert Haas wrote: > On Tue, Dec 11, 2018 at 1:32 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Simon Riggs <simon@2ndquadrant.com> writes: > > > I suggest ALTER TABLE should respond to a parameter setting of ddl_dry_run > > > = on, so the whole world doesn't need to rewrite its syntax to support the > > > new option. > > > > We were just busy shooting down a different suggestion of > > behavior-changing GUCs. A GUC that turns all ALTERs into no-ops > > sure seems like a foot-gun to me. > > Yeah, I like EXPLAIN better. +1 for EXPLAIN Best, David. -- David Fetter <david(at)fetter(dot)org> http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
>>> We were just busy shooting down a different suggestion of >>> behavior-changing GUCs. A GUC that turns all ALTERs into no-ops >>> sure seems like a foot-gun to me. >> Yeah, I like EXPLAIN better. > +1 for EXPLAIN IMVHO, and for "symmetry" with existing mechanisms: * EXPLAIN ALTER TABLE ==> "DDL dry run", but tell me what would be done (similar to what EXPLAIN SELECT does) * EXPLAIN PERFORM ALTER TABLE (EXPLAIN EXEC?) would explain + do ...and bonus points for explaining each step just before it is performed. This way, It'd be easy for users to verify that a particular step (i.e. table rewrite) is the one taking æons to run or hammering the storage. Of course, regular "ALTER TABLE" stays as it is. Just my .02€ :) I'm not familiar with this part of the code and clearly can't devote enough time to do it in the near future, but count on me to test it if accepted. Thanks, Jose
On 2018-Dec-11, Jose Luis Tallon wrote: > * EXPLAIN PERFORM ALTER TABLE (EXPLAIN EXEC?) would explain + do > > ...and bonus points for explaining each step just before it is > performed. This way, It'd be easy for users to verify that a particular step > (i.e. table rewrite) is the one taking æons to run or hammering the storage. We have a mechanism for reporting progress of DDL commands. Currently as far as I remember it's able to report VACUUM progress; there's a patch in commitfest to add support for CLUSTER. Rahila Syed in https://www.pgcon.org/2017/schedule/attachments/472_Progress%20Measurement%20PostgreSQL.pdf proposed ways to have it report for ALTER TABLE also, though I don't think anyone has worked on that. I think these two things are separate. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, 10 Dec 2018 at 11:14, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > > Hi Greg > > On 2018-Dec-07, Greg Stark wrote: > > > I'm thinking I should try to move all these decisions to phase 1 as > > much as possible but I'm not sure how feasible it will be to get the > > results exactly correct. Of course the cases where it's hardest to > > predict are precisely where users would most like to know what's going > > to happen... > > Maybe you can move some of these decisions to phase 1, but I'm not sure > it can be done for all of them. Indeed I've done some work on this and it will be impossible to get it 100% of the way there. At least not with a reasonable amount of work. Personally I think it can be close enough to be useful but others may have a different opinion... A typically example, and I think the main realistic source of problems, is the partitioning bound constraint checks. They're elided if the new bound can be proven from the existing constraints. But that requires recursing through all the subpartitions which currently only happens in the final phase. Worse, I think -- though I'm not 100% clear on this -- that these constraint proofs can't even be done earlier in the process because the constraints can refer to column names that don't necessarily exist until earlier catalog changes are done. Personally I'm fine saying here's the plan -- some steps may be optimized away at run-time but the locking and table rewrites described should be correct and most of the time the constraint validations should be correct too. > Another possible plan is to add a flag > "dry run" so that phases 2/3 do whatever analysis they need to report > for your EXPLAIN, but not actually carry out their tasks. I don't think this helps any. Anything that could be done in a dry run could just be duplicated work earlier. The bits that cannot are the bits that depend on catalog changes that are done in the same command. The ony way they could be done in a "dry run" would be doing the whole operation in a not-dry-run in a transaction and then rolling it back. But that would produce a lot of production impact that would defeat the whole purpose of having the command at all. -- greg