Home > mailing lists

Re: pg_dump and dependencies and --section ... it's a mess - Mailing list pgsql-hackers

From	Andrew Dunstan
Subject	Re: pg_dump and dependencies and --section ... it's a mess
Date	June 21, 2012 19:19:00
Msg-id	4FE39DB3.1010005@dunslane.net Whole thread Raw
In response to	pg_dump and dependencies and --section ... it's a mess (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: pg_dump and dependencies and --section ... it's a mess
List	pgsql-hackers

Tree view


On 06/21/2012 02:13 PM, Tom Lane wrote:
> Don't know if everybody on this list has been paying attention to the
> pgsql-bugs thread about bug #6699.  The shortest example of the problem
> is
>
> create table t1 (f1 int primary key, f2 text);
> create view v1 as select f1, f2 from t1 group by f1;
>
> The view's query is legal only because of the primary key on f1,
> else the reference to f2 would be considered inadequately grouped.
> So when we create the view we mark it as dependent on that primary key
> (or more accurately, the view's _RETURN rule is so marked).  This all
> works fine so far as the backend is concerned: you can't drop the PK
> without dropping or redefining the view.  But it gives pg_dump
> indigestion.  If you do "pg_dump -Fc | pg_restore -l -v" you see
> (relevant entries only):
>
> 5; 2615 2200 SCHEMA - public postgres            PRE_DATA
> 168; 1259 41968 TABLE public t1 postgres        PRE_DATA
> ;       depends on: 5
> 1921; 2606 41975 CONSTRAINT public t1_pkey postgres    POST_DATA
> ;       depends on: 168 168
> 169; 1259 41976 VIEW public v1 postgres            PRE_DATA
> ;       depends on: 1919 5
> 1922; 0 41968 TABLE DATA public t1 postgres        DATA
> ;       depends on: 168
>
> Now, there are two not-nice things about this.  First, the view is shown
> as depending on "object 1919", which is pg_dump's internal DumpId for
> the view's _RETURN rule --- but that's nowhere to be seen in the dump,
> so the fact that it's dependent on the t1_pkey constraint is not
> visible in the dump.  This puts parallel pg_restore at risk of doing
> the wrong thing.  Second, because view definitions are preferentially
> dumped in the PRE_DATA section, the t1_pkey constraint has been hoisted
> up to come before the table data.  That means that in an ordinary serial
> restore, the index will be created before the table's data is loaded,
> which is undesirable for efficiency reasons.
>
> It gets worse though.  I've labeled the above archive items with their
> "SECTION" codes (which for some reason pg_restore -l -v doesn't print)
> so that you can see that we've got a POST_DATA item that must come
> before a PRE_DATA item.  This wreaks havoc with the entire concept
> of splitting dump files into three sections.  When I first realized
> that, I was thinking that we would have to revert the --section flag
> we added to pg_dump/pg_restore in 9.2, for lack of any way to guarantee
> that the items can actually be restored if they are split according
> to sections.
>
> I think that we can fix it though.  There are provisions in pg_dump
> already for splitting a view apart from its _RETURN rule --- and if the
> rule comes out as a separate object, it's POST_DATA.  So at least in
> principle, we could fix things by dumping this scenario this way:
>
> SCHEMA public        PRE_DATA
> TABLE t1        PRE_DATA
> TABLE v1        PRE_DATA (at this point it's a table not a view)
> TABLE DATA t1        DATA
> CONSTRAINT t1_pkey    POST_DATA
> RULE for v1        POST_DATA (adding the rule turns v1 into a view)
>
> The problem is how to persuade pg_dump to do that; as the code stands,
> it will only break a view apart from its rule if that's necessary to
> break a circular dependency loop, and there is none in this example.
>
> Another point worth thinking about is that if --section is to be trusted
> at all, we need some way of guaranteeing that the dependency-sorting
> code won't happily create other similar cases.  There is nothing in
> there right now that would think anything is wrong with an ordering
> that breaks the section division.
>
> I believe the right fix for both of these issues is to add knowledge of
> the section concept to the topological sort logic, so that an ordering
> that puts POST_DATA before DATA or PRE_DATA after DATA is considered to
> be a dependency-ordering violation.  One way to do that is to add dummy
> "fencepost" objects to the sort, representing the start and end of the
> DATA section.  However, these objects would need explicit dependency
> links to every other DumpableObject, so that doesn't sound very good
> from a performance standpoint.  What I'm going to go look at is whether
> we can mark DumpableObjects with their SECTION codes at creation time
> (rather than adding that information at ArchiveEntry() time) and then
> have the topo sort logic take that marking into account in addition to
> the explicit dependency links.
>
> Assuming we can make that work, how far should it be back-patched?
> The --section issue is new in 9.2, but this would also take care of
> the restore problems for views with constraint dependencies, which
> are allowed as of 9.1, so I'm thinking we'd better put it into 9.1.
>
> The other problem is the bogus dependency IDs that pg_dump emits by
> virtue of not caring whether the dependency links to an object it's
> actually dumped.  I posted a patch for that in the pgsql-bugs thread
> but pointed out that it would not work before 9.2 without additional
> surgery.  If we fix the view vs constraint issue by adding section
> knowledge to the sort, I think we can maybe get away with not fixing
> the dependency IDs in back branches.  parallel pg_restore is definitely
> at risk without the fix, but in the absence of any other known problem
> cases I'm inclined to not change the back branches more than we have to.
>
> Thoughts?
>
>             


Wow, talk about subtle. When I first read this my face turned a bit red, 
but after reading it a couple more times I don't feel quite so guilty, 
especially since the constraint dependency didn't exist at the time we 
did parallel pg_restore.

If I'm understanding you correctly, fixing the bogus dependency thing is 
more an insurance policy than fixing a case (other than the constraint 
dependency) that is known to be broken.  If that is correct, and given 
that we've had 3 years of parallel pg_restore and not heard of such 
issues, it doesn't seem unreasonable not to backpatch.

(There's another bug to do with parallel pg_restore and clustering that 
Andrew Hammond raised back in January, that I want to fix when I get 
some time.)

cheers

andrew

pgsql-hackers by date:

From: Simon Riggs
Date: 21 June 2012, 18:54:18
Subject: Re: [v9.3] Extra Daemons (Re: elegant and effective way for running jobs inside a database)

From: Peter Geoghegan
Date: 21 June 2012, 19:24:24
Subject: Re: [PATCH 04/16] Add embedded list interface (header only)

Re: pg_dump and dependencies and --section ... it's a mess - Mailing list pgsql-hackers

Previous

Next