Thread: UNION and pg_restore
Greetings, list,
I have a VIEW I use to combine both acquired and missing field observations. Because of they way these data will be used, missing values can't be NULL. So I make them some outrageous integer value, like -999. I put the full VIEW together using UNION.
As shown in the test case below, CREATE VIEW slightly changes the appearance of the query it stores as my VIEW definition. Now, "?column?" appears as a column reference in one of the SELECTs to be UNIONed. This did not present a problem until I tried to restore the database from a pg_dump custom-format file. Then I got the following kind of error (this one comes from trying to restore the test data given below):
pg_restore: [archiver (db)] Error from TOC entry 164; 1259 47618 VIEW view_1 test_userpg_restore: [archiver (db)] could not execute query: ERROR: column reference "?column?" is ambiguousLINE 2: ...2 FROM tab_1 UNION SELECT a.id_1, a.id_2, a.id_3, b."?column...^Command was: CREATE VIEW view_1 ASSELECT tab_1.id_1, tab_1.id_2, tab_1.id_3, tab_1.data_1, tab_1.data_2 FROM tab_1 UNION SELECT a.id...
My question is, then, how is it that the query embodied in "view_1" below executes fine, but cannot seem to be restored? Is this telling me my query is dumb? If so, any advice on how to easily derive "view_1" from "tab_1" and "tab_2" below, without baffling pg_restore, would be welcome.
Thanks,
Bryan
===================================
The following test case mirrors my own:
CREATE TABLE tab_1 ( -- Acquired observations.
id_1 int,
id_2 int,
id_3 int,
data_1 int,
data_2 int
);
INSERT INTO tab_1
VALUES (1,2,3,4,5);
CREATE TABLE tab_2 ( -- Missing observations.
id_1 int,
id_2 int,
id_3 int
);
INSERT INTO tab_2
VALUES (6,7,8), (7,8,9);
CREATE OR REPLACE VIEW view_1 AS (
SELECT * FROM tab_1
UNION
SELECT * FROM
(SELECT * FROM tab_2) a
CROSS JOIN
(SELECT -999,-999) b
);
================================================
testing=> SELECT * FROM view_1 ORDER BY id_1;
id_1 | id_2 | id_3 | data_1 | data_2------+------+------+--------+--------1 | 2 | 3 | 4 | 56 | 7 | 8 | -999 | -9997 | 8 | 9 | -999 | -999(3 rows)
testing=> \d+ view_1
View "test.view_1"Column | Type | Modifiers | Storage | Description--------+---------+-----------+---------+-------------id_1 | integer | | plain |id_2 | integer | | plain |id_3 | integer | | plain |data_1 | integer | | plain |data_2 | integer | | plain |View definition:SELECT tab_1.id_1, tab_1.id_2, tab_1.id_3, tab_1.data_1, tab_1.data_2FROM tab_1UNIONSELECT a.id_1, a.id_2, a.id_3, b."?column?" AS data_1, b."?column?" AS data_2FROM ( SELECT tab_2.id_1, tab_2.id_2, tab_2.id_3FROM tab_2) aCROSS JOIN ( SELECT (-999), (-999)) b;
testing=> SELECT version();
version
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------PostgreSQL 9.1.6 on x86_64-apple-darwin11.4.0, compiled by i686-apple-darwin11-llvm-gcc-4.2 (GCC) 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.11.00), 64-bit(1 row)
================================================
Bryan Lee Nuse <nuse@uga.edu> writes: > My question is, then, how is it that the query embodied in "view_1" below executes fine, but cannot seem to be restored? Is this telling me my query is dumb? If so, any advice on how to easily derive "view_1" from "tab_1" and "tab_2"below, without baffling pg_restore, would be welcome. The core issue is that you've got SELECT * FROM (SELECT (-999), (-999)) b; If you execute this on its own, you'll get ?column? | ?column? ----------+---------- -999 | -999 (1 row) that is, both columns have been assigned the arbitrary name "?column?". That's okay in itself, because the "*" is expanded positionally and doesn't care whether the columns have distinct names or not. However, when the view definition is dumped out, you have > View definition: > SELECT tab_1.id_1, tab_1.id_2, tab_1.id_3, tab_1.data_1, tab_1.data_2 > FROM tab_1 > UNION > SELECT a.id_1, a.id_2, a.id_3, b."?column?" AS data_1, b."?column?" AS data_2 > FROM ( SELECT tab_2.id_1, tab_2.id_2, tab_2.id_3 > FROM tab_2) a > CROSS JOIN ( SELECT (-999), (-999)) b; and now there are ambiguous references to b."?column?" in there. The short answer to this is that you ought to take some care that the columns of the sub-select have distinct names. You could do it in at least two ways: (SELECT (-999), (-999)) b(x,y) (SELECT (-999) AS x, (-999) AS y) b Now you're doubtless wondering why Postgres doesn't dodge this ambiguity for you. One way would be to dump out the view using the "*" notation, but it turns out that that is contrary to SQL standard: "*" must be expanded at parse time. (The place in the standard that mandates this is a little note under ALTER TABLE ADD COLUMN that says that adding a column doesn't affect the meaning of any previously defined views. Hence "SELECT * FROM foo" has to be interpreted as expanding the "*" immediately, lest its meaning change due to subsequent addition of a column to foo.) The other thing we might do is to generate distinct names for unnamed columns. That would probably be a good idea and maybe we'll do it sooner or later, but it's not required by (recent versions of) the SQL standard. It would in any case not eliminate the risk entirely, because this is perfectly legal SQL: SELECT * FROM (SELECT (-999) AS x, (-999) AS x) b but once the "*" has been expanded there is actually no spec-compliant representation of this query. So there's always going to be some "don't do that" aspect here. There are some other fun aspects of trying to ensure that dumped views can be reloaded. In particular, the possibility that columns of underlying tables can be renamed introduces all sorts of hazards :-(. There's been some discussion of how we might make that safer, but a bulletproof solution seems to require introducing nonstandard syntax for things like JOIN USING. People haven't been too excited about that. regards, tom lane
Thanks, Tom, for your helpful explanation and suggestions. > Now you're doubtless wondering why Postgres doesn't dodge this ambiguity > for you. This is exactly what I was wondering, of course. And I follow the reasoning behind why it cannot, at present. If Postgrescan't ensure that the view definition is valid SQL, though, what about the (seemingly more manageable) idea of providingsome kind of notice when that definition is not re-loadable? Perhaps pg_dump could do this? Maybe that sounds like hand-holding, and I suspect the response will be "always test your backup before you need it!" Andcertainly I've learned my lesson about that. Believe me, I don't mean to "request" anything here, merely raise the pointthat for users that rely heavily on interdependent VIEWs, a cascade of errors flowing back from pg_restore can makefor a pretty frightening moment. Thanks, Bryan
Bryan Lee Nuse <nuse@uga.edu> writes: >> Now you're doubtless wondering why Postgres doesn't dodge this ambiguity >> for you. > This is exactly what I was wondering, of course. And I follow the reasoning behind why it cannot, at present. If Postgrescan't ensure that the view definition is valid SQL, though, what about the (seemingly more manageable) idea of providingsome kind of notice when that definition is not re-loadable? Perhaps pg_dump could do this? I spent considerable time thinking about this last week, and have a draft patch that fixes this issue and some related ones: http://archives.postgresql.org/pgsql-hackers/2012-12/msg01694.php So hopefully this hazard will be gone in 9.3. I doubt we'd risk back-patching the change though. regards, tom lane