Thread: pg_dump behaves differently for different archive formats
Restoring a "plain format" dump and a "custom format" dump of the same database can lead to different results: pg_dump organizes the SQL statements it creates in "TOC entries". If a custom format dump is restored with pg_restore, all SQL statements in a TOC entry will be executed as a single command and thus in a single transaction. On the other hand, each SQL statement in a plain format dump is executed individually in its own transaction, and TOC entries are irrelevant (except as comments for documentation). E.g., if a table has ACL entries for several roles and one of them is not present in the destination database, a plain format dump will restore all privileges except the ones that pertain to the missing user, while a custom format dump will not restore any privileges even for existing users. This is because all ACL related statements are in one TOC entry. Another example is a table that you try to restore into a database where the original table owner does not exist. With a plain format dump, the table is created, but will belong to the user restoring the dump, while a custom format dump will not create the table at all. This is because CREATE TABLE and ALTER TABLE ... OWNER TO are in the same TOC entry. One can argue for or against each individual behaviour, but I am surprised by the difference. Is there a deeper reason why it should remain like this or should I consider it a bug that should get fixed? Yours, Laurenz Albe
Albe Laurenz <laurenz.albe@wien.gv.at> writes: > Restoring a "plain format" dump and a "custom format" dump of > the same database can lead to different results: > pg_dump organizes the SQL statements it creates in "TOC entries". > If a custom format dump is restored with pg_restore, all > SQL statements in a TOC entry will be executed as a single command > and thus in a single transaction. Yeah, this is a bug I think. pg_dump was designed around the idea that the output would be executed as a simple script, and in a number of places there's an expectation that one SQL statement can fail without affecting following ones. So if pg_restore can't provide that behavior it's not good. On the other hand, I'm not sure how much enthusiasm there'd be for complex or fragile changes to fix this. A lot of people invariably run restores in single-transaction mode and don't really care about fault-tolerant restores. Also, it's easy enough to dodge the problem if you must: just pipe the output into psql rather than direct-to-database. So to me the question is can we fix this without doing something like duplicating psql's lexer? If we have to parse out the statements contained in each text blob, it's probably going to be too painful. Some cautionary history about this sort of thing can be read at http://www.postgresql.org/message-id/flat/18006.1325700782@sss.pgh.pa.us regards, tom lane