Thread: UNION and pg_restore

UNION and pg_restore

From

Bryan Lee Nuse

Date:

21 December 2012, 18:49:42

Greetings, list,

I have a VIEW I use to combine both acquired and missing field observations. Because of they way these data will be used, missing values can't be NULL. So I make them some outrageous integer value, like -999. I put the full VIEW together using UNION.

As shown in the test case below, CREATE VIEW slightly changes the appearance of the query it stores as my VIEW definition. Now, "?column?" appears as a column reference in one of the SELECTs to be UNIONed. This did not present a problem until I tried to restore the database from a pg_dump custom-format file. Then I got the following kind of error (this one comes from trying to restore the test data given below):

pg_restore: [archiver (db)] Error from TOC entry 164; 1259 47618 VIEW view_1 test_user
pg_restore: [archiver (db)] could not execute query: ERROR: column reference "?column?" is ambiguous
LINE 2: ...2 FROM tab_1 UNION SELECT a.id_1, a.id_2, a.id_3, b."?column...
^
Command was: CREATE VIEW view_1 AS
SELECT tab_1.id_1, tab_1.id_2, tab_1.id_3, tab_1.data_1, tab_1.data_2 FROM tab_1 UNION SELECT a.id...

My question is, then, how is it that the query embodied in "view_1" below executes fine, but cannot seem to be restored? Is this telling me my query is dumb? If so, any advice on how to easily derive "view_1" from "tab_1" and "tab_2" below, without baffling pg_restore, would be welcome.

Thanks,

Bryan

===================================

The following test case mirrors my own:

CREATE TABLE tab_1 ( -- Acquired observations.

id_1 int,

id_2 int,

id_3 int,

data_1 int,

data_2 int

);

INSERT INTO tab_1

VALUES (1,2,3,4,5);

CREATE TABLE tab_2 ( -- Missing observations.

id_1 int,

id_2 int,

id_3 int

);

INSERT INTO tab_2

VALUES (6,7,8), (7,8,9);

CREATE OR REPLACE VIEW view_1 AS (

SELECT * FROM tab_1

UNION

SELECT * FROM

(SELECT * FROM tab_2) a

CROSS JOIN

(SELECT -999,-999) b

);

================================================

testing=> SELECT * FROM view_1 ORDER BY id_1;

id_1 | id_2 | id_3 | data_1 | data_2
------+------+------+--------+--------
1 | 2 | 3 | 4 | 5
6 | 7 | 8 | -999 | -999
7 | 8 | 9 | -999 | -999
(3 rows)

testing=> \d+ view_1

View "test.view_1"
Column | Type | Modifiers | Storage | Description
--------+---------+-----------+---------+-------------
id_1 | integer | | plain |
id_2 | integer | | plain |
id_3 | integer | | plain |
data_1 | integer | | plain |
data_2 | integer | | plain |
View definition:
   SELECT tab_1.id_1, tab_1.id_2, tab_1.id_3, tab_1.data_1, tab_1.data_2
   FROM tab_1
UNION
   SELECT a.id_1, a.id_2, a.id_3, b."?column?" AS data_1, b."?column?" AS data_2
   FROM ( SELECT tab_2.id_1, tab_2.id_2, tab_2.id_3
   FROM tab_2) a
   CROSS JOIN ( SELECT (-999), (-999)) b;

testing=> SELECT version();

version

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 9.1.6 on x86_64-apple-darwin11.4.0, compiled by i686-apple-darwin11-llvm-gcc-4.2 (GCC) 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.11.00), 64-bit
(1 row)

================================================

Re: UNION and pg_restore

From

Tom Lane

Date:

21 December 2012, 22:37:06

Bryan Lee Nuse <nuse@uga.edu> writes:
> My question is, then, how is it that the query embodied in "view_1" below executes fine, but cannot seem to be
restored? Is this telling me my query is dumb?  If so, any advice on how to easily derive "view_1" from "tab_1" and
"tab_2"below, without baffling pg_restore, would be welcome. 

The core issue is that you've got

    SELECT * FROM (SELECT (-999), (-999)) b;

If you execute this on its own, you'll get

 ?column? | ?column?
----------+----------
     -999 |     -999
(1 row)

that is, both columns have been assigned the arbitrary name "?column?".
That's okay in itself, because the "*" is expanded positionally and
doesn't care whether the columns have distinct names or not.  However,
when the view definition is dumped out, you have

> View definition:
>          SELECT tab_1.id_1, tab_1.id_2, tab_1.id_3, tab_1.data_1, tab_1.data_2
>            FROM tab_1
> UNION
>          SELECT a.id_1, a.id_2, a.id_3, b."?column?" AS data_1, b."?column?" AS data_2
>            FROM ( SELECT tab_2.id_1, tab_2.id_2, tab_2.id_3
>                    FROM tab_2) a
>      CROSS JOIN ( SELECT (-999), (-999)) b;

and now there are ambiguous references to b."?column?" in there.

The short answer to this is that you ought to take some care that the
columns of the sub-select have distinct names.  You could do it in
at least two ways:

    (SELECT (-999), (-999)) b(x,y)
    (SELECT (-999) AS x, (-999) AS y) b

Now you're doubtless wondering why Postgres doesn't dodge this ambiguity
for you.  One way would be to dump out the view using the "*" notation,
but it turns out that that is contrary to SQL standard: "*" must be
expanded at parse time.  (The place in the standard that mandates this
is a little note under ALTER TABLE ADD COLUMN that says that adding a
column doesn't affect the meaning of any previously defined views.
Hence "SELECT * FROM foo" has to be interpreted as expanding the "*"
immediately, lest its meaning change due to subsequent addition of a
column to foo.)

The other thing we might do is to generate distinct names for unnamed
columns.  That would probably be a good idea and maybe we'll do it
sooner or later, but it's not required by (recent versions of) the SQL
standard.  It would in any case not eliminate the risk entirely, because
this is perfectly legal SQL:

    SELECT * FROM (SELECT (-999) AS x, (-999) AS x) b

but once the "*" has been expanded there is actually no spec-compliant
representation of this query.  So there's always going to be some "don't
do that" aspect here.

There are some other fun aspects of trying to ensure that dumped views
can be reloaded.  In particular, the possibility that columns of
underlying tables can be renamed introduces all sorts of hazards :-(.
There's been some discussion of how we might make that safer, but a
bulletproof solution seems to require introducing nonstandard syntax
for things like JOIN USING.  People haven't been too excited about that.

            regards, tom lane

Re: UNION and pg_restore

From

Bryan Lee Nuse

Date:

28 December 2012, 16:51:57

Thanks, Tom, for your helpful explanation and suggestions.

> Now you're doubtless wondering why Postgres doesn't dodge this ambiguity
> for you.

This is exactly what I was wondering, of course.  And I follow the reasoning behind why it cannot, at present.  If
Postgrescan't ensure that the view definition is valid SQL, though, what about the (seemingly more manageable) idea of
providingsome kind of notice when that definition is not re-loadable?  Perhaps pg_dump could do this?   

Maybe that sounds like hand-holding, and I suspect the response will be "always test your backup before you need it!"
Andcertainly I've learned my lesson about that.  Believe me, I don't mean to "request" anything here, merely raise the
pointthat for users that rely heavily on interdependent VIEWs, a cascade of errors flowing back from pg_restore can
makefor a pretty frightening moment. 

Thanks,
Bryan

Re: UNION and pg_restore

From

Tom Lane

Date:

31 December 2012, 16:36:37

Bryan Lee Nuse <nuse@uga.edu> writes:
>> Now you're doubtless wondering why Postgres doesn't dodge this ambiguity
>> for you.

> This is exactly what I was wondering, of course.  And I follow the reasoning behind why it cannot, at present.  If
Postgrescan't ensure that the view definition is valid SQL, though, what about the (seemingly more manageable) idea of
providingsome kind of notice when that definition is not re-loadable?  Perhaps pg_dump could do this?   

I spent considerable time thinking about this last week, and have a
draft patch that fixes this issue and some related ones:
http://archives.postgresql.org/pgsql-hackers/2012-12/msg01694.php
So hopefully this hazard will be gone in 9.3.  I doubt we'd risk
back-patching the change though.

            regards, tom lane