Thread: DROP COLUMN Progress

DROP COLUMN Progress

From

"Christopher Kings-Lynne"

Date:

05 July 2002, 03:57:18

OK,

This is the problem I'm having with the DROP COLUMN implementation.  Since
I've already incorporated all of Hiroshi's changes, I think this may have
been an issue in his trial implementation as well.

I have attached my current patch, which works fine and compiles properly.

Ok, if you drop a column 'b', then all these work properly:

select * from tab;
select tab.* from tab;
select b from tab;
update tab set b = 3;
select * from tab where b = 3;
insert into tab (b) values (3);

That's all good.  However, the issue is that one of the things that happens
when you drop a column is that the column is renamed to 'dropped_%attnum%'.
So, say the 'b' column is renamed to 'dropped_2', then you can do this:

select dropped_2 from tab;
select tab.dropped_2 from tab;
update tab set dropped_2 = 3;
select * from tab where dropped_2 = 3;

Where have I missed the COLUMN_IS_DROPPED checks???

Another thing:  I don't want to name dropped columns 'dropped_...' as I
think that's unfair on our non-English speaking users.  Should I just use
'xxxx' or something?

Thanks for any help,

Chris

Attachment

dropcolumn.txt.gz

Re: DROP COLUMN Progress

From

Tom Lane

Date:

05 July 2002, 12:10:43

"Christopher Kings-Lynne" <chriskl@familyhealth.com.au> writes:
> So, say the 'b' column is renamed to 'dropped_2', then you can do this:

> select dropped_2 from tab;
> select tab.dropped_2 from tab;
> update tab set dropped_2 = 3;
> select * from tab where dropped_2 = 3;

> Where have I missed the COLUMN_IS_DROPPED checks???

Sounds like you aren't checking in the part of the parser that resolves
simple variable references.

> Another thing:  I don't want to name dropped columns 'dropped_...' as I
> think that's unfair on our non-English speaking users.  Should I just use
> 'xxxx' or something?

Don't be silly --- the system catalogs are completely English-centric
already.  Do you want to change all the catalog and column names to
meaningless strings?  Since the dropped columns should be invisible to
anyone who's not poking at the catalogs, I don't see that we are adding
any cognitive load ...
        regards, tom lane

Re: DROP COLUMN Progress

From

Bruce Momjian

Date:

05 July 2002, 15:48:27

Christopher Kings-Lynne wrote:
> OK,
> 
> This is the problem I'm having with the DROP COLUMN implementation.  Since
> I've already incorporated all of Hiroshi's changes, I think this may have
> been an issue in his trial implementation as well.
> 
> I have attached my current patch, which works fine and compiles properly.
> 
> Ok, if you drop a column 'b', then all these work properly:
> 
> select * from tab;
> select tab.* from tab;
> select b from tab;
> update tab set b = 3;
> select * from tab where b = 3;
> insert into tab (b) values (3);
> 
> That's all good.  However, the issue is that one of the things that happens
> when you drop a column is that the column is renamed to 'dropped_%attnum%'.
> So, say the 'b' column is renamed to 'dropped_2', then you can do this:
> 
> select dropped_2 from tab;
> select tab.dropped_2 from tab;
> update tab set dropped_2 = 3;
> select * from tab where dropped_2 = 3;
> 
> Where have I missed the COLUMN_IS_DROPPED checks???

OK, my guess is that it is checks in parser/.  I would issue each of
these queries with a non-existant column name, find the error message in
the code, and add an isdropped check in those places.

> Another thing:  I don't want to name dropped columns 'dropped_...' as I
> think that's unfair on our non-English speaking users.  Should I just use
> 'xxxx' or something?

I think "dropped" is OK.  The SQL is still English, e.g. DROP.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026

Re: DROP COLUMN Progress

From

"Christopher Kings-Lynne"

Date:

08 July 2002, 00:10:09

> OK, my guess is that it is checks in parser/.  I would issue each of
> these queries with a non-existant column name, find the error message in
> the code, and add an isdropped check in those places.

OK, I think I've narrowed it down to this block of code in scanRTEForColumn
in parse_relation.c:
       /*        * Scan the user column names (or aliases) for a match. Complain if        * multiple matches.
*/      foreach(c, rte->eref->colnames)       {               /* @@ SKIP DROPPED HERE? @@ */               attnum++;
          if (strcmp(strVal(lfirst(c)), colname) == 0)               {                       if (result)
              elog(ERROR, "Column reference \"%s\" is
 
ambiguous", colname);                       result = (Node *) make_var(pstate, rte, attnum);
rte->checkForRead= true;               }       }
 


I'm thinking that I should put a 'SearchSysCacheCopy' where my @@ comment is
to retrieve the attribute by name, and then do a check to see if it's
dropped.  Is that the best/fastest way of doing things?  Seems unfortunate
to add a another cache lookup in every parsed query...

Comments?

Chris

Re: DROP COLUMN Progress

From

Bruce Momjian

Date:

08 July 2002, 00:17:01

Christopher Kings-Lynne wrote:
>         /*
>          * Scan the user column names (or aliases) for a match. Complain if
>          * multiple matches.
>          */
>         foreach(c, rte->eref->colnames)
>         {
>                 /* @@ SKIP DROPPED HERE? @@ */
>                 attnum++;
>                 if (strcmp(strVal(lfirst(c)), colname) == 0)
>                 {
>                         if (result)
>                                 elog(ERROR, "Column reference \"%s\" is
> ambiguous", colname);
>                         result = (Node *) make_var(pstate, rte, attnum);
>                         rte->checkForRead = true;
>                 }
>         }
> 
> 
> I'm thinking that I should put a 'SearchSysCacheCopy' where my @@ comment is
> to retrieve the attribute by name, and then do a check to see if it's
> dropped.  Is that the best/fastest way of doing things?  Seems unfortunate
> to add a another cache lookup in every parsed query...

I am still looking but perhaps you could supress dropped columns from
getting into eref in the first place.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026

Re: DROP COLUMN Progress

From

"Christopher Kings-Lynne"

Date:

08 July 2002, 01:16:24

> > I'm thinking that I should put a 'SearchSysCacheCopy' where my
> @@ comment is
> > to retrieve the attribute by name, and then do a check to see if it's
> > dropped.  Is that the best/fastest way of doing things?  Seems
> unfortunate
> > to add a another cache lookup in every parsed query...
>
> I am still looking but perhaps you could supress dropped columns from
> getting into eref in the first place.

OK, that's done.  I'm working on not allowing dropped columns in UPDATE
targets now.

Chris

Re: DROP COLUMN Progress

From

"Christopher Kings-Lynne"

Date:

08 July 2002, 03:51:16

> > I am still looking but perhaps you could supress dropped columns from
> > getting into eref in the first place.
>
> OK, that's done.  I'm working on not allowing dropped columns in UPDATE
> targets now.

OK, I've fixed it so that dropped columns cannot be targetted in an update
statement, however now I'm running into this problem:

If you issue an INSERT statement without qualifying any field names, ie:

INSERT INTO tab VALUES (3);

Although it will automatically insert NULL for the dropped columns, it still
does cache lookups for the type of the dropped columns, etc.  I noticed that
when I tried setting the atttypid of the dropped column to (Oid)-1.  Where
is the bit of code that figures out the list of columns to insert into in an
unqualified INSERT statement?

I'm thinking that it would be nice if the dropped columns never even make it
into the list of target attributes, for performance reasons...

Chris

Re: DROP COLUMN Progress

From

Tom Lane

Date:

08 July 2002, 13:03:12

Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Christopher Kings-Lynne wrote:
>> I'm thinking that I should put a 'SearchSysCacheCopy' where my @@ comment is
>> to retrieve the attribute by name, and then do a check to see if it's
>> dropped.  Is that the best/fastest way of doing things?  Seems unfortunate
>> to add a another cache lookup in every parsed query...

> I am still looking but perhaps you could supress dropped columns from
> getting into eref in the first place.

That was my first thought also, but then the wrong attnum would be used
in the "make_var".  Ugh.  I think what Chris needs to do is extend the
eref data structure so that there can be placeholders for dropped
attributes.  Perhaps NULLs could be included in the list, and then the
code would become like
attnum++;if (lfirst(c) && strcmp(strVal(lfirst(c)), colname) == 0)    ...

This would require changes in quite a number of places :-( but I'm not
sure we have much choice.  The eref structure really needs to line up
with attnums.

Another possibility is to enter the dropped attnames in the eref list
normally, and do the drop test *after* hitting a match not before.
This is still slow, but not as horrendously O(N^2) slow as Chris's
original pseudocode.  I'm not sure how much work it'd really save though;
you might find yourself hitting all the same places to add tests.
        regards, tom lane

Re: DROP COLUMN Progress

From

Bruce Momjian

Date:

08 July 2002, 23:13:31

Christopher Kings-Lynne wrote:
> > > I am still looking but perhaps you could supress dropped columns from
> > > getting into eref in the first place.
> >
> > OK, that's done.  I'm working on not allowing dropped columns in UPDATE
> > targets now.
> 
> OK, I've fixed it so that dropped columns cannot be targetted in an update
> statement, however now I'm running into this problem:
> 
> If you issue an INSERT statement without qualifying any field names, ie:
> 
> INSERT INTO tab VALUES (3);
> 
> Although it will automatically insert NULL for the dropped columns, it still
> does cache lookups for the type of the dropped columns, etc.  I noticed that
> when I tried setting the atttypid of the dropped column to (Oid)-1.  Where
> is the bit of code that figures out the list of columns to insert into in an
> unqualified INSERT statement?

parse_target.c::checkInsertTargets()

>
> I'm thinking that it would be nice if the dropped columns never even make it
> into the list of target attributes, for performance reasons...

Yea, just sloppy to have them there.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026

Re: DROP COLUMN Progress

From

"Christopher Kings-Lynne"

Date:

09 July 2002, 03:42:24

> That was my first thought also, but then the wrong attnum would be used
> in the "make_var".  Ugh.  I think what Chris needs to do is extend the
> eref data structure so that there can be placeholders for dropped
> attributes.  Perhaps NULLs could be included in the list, and then the
> code would become like

Hmmm...  I don't get it - at the moment I'm preventing them from even
getting into the eref and all regression tests pass and every test I try
works as well...

Chris

Re: DROP COLUMN Progress

From

Tom Lane

Date:

09 July 2002, 03:45:13

"Christopher Kings-Lynne" <chriskl@familyhealth.com.au> writes:
>> Are you checking access to columns that're to the right of the one
>> dropped?

> OK, interesting:

> test=# create table test (a int4, b int4, c int4, d int4);
> CREATE TABLE
> test=# insert into test values (1,2,3,4);
> INSERT 16588 1
> test=# alter table test drop b;
> ALTER TABLE
> test=# select * from test;
>  a | d | d
> ---+---+---
>  1 | 3 | 4
> (1 row)

What of
SELECT a,c,d FROM test

I'll bet that doesn't work at all...
        regards, tom lane

Re: DROP COLUMN Progress

From

Tom Lane

Date:

09 July 2002, 05:45:26

"Christopher Kings-Lynne" <chriskl@familyhealth.com.au> writes:
>> What of
>> SELECT a,c,d FROM test
>> I'll bet that doesn't work at all...

> Yeah, broken.  Damn.

Yup.  That loop we were just looking at needs to derive the correct
attnum when it matches a column name.  If you remove deleted columns
from the eref list altogether, you get the wrong answer.
        regards, tom lane

Re: DROP COLUMN Progress

From

"Christopher Kings-Lynne"

Date:

09 July 2002, 05:51:41

> > test=# create table test (a int4, b int4, c int4, d int4);
> > CREATE TABLE
> > test=# insert into test values (1,2,3,4);
> > INSERT 16588 1
> > test=# alter table test drop b;
> > ALTER TABLE
> > test=# select * from test;
> >  a | d | d
> > ---+---+---
> >  1 | 3 | 4
> > (1 row)
> 
> What of
> 
>     SELECT a,c,d FROM test
> 
> I'll bet that doesn't work at all...

Yeah, broken.  Damn.

test=# SELECT a,c,d FROM test;a | c | d
---+---+---1 | 2 | 3
(1 row)

test=# SELECT a,d FROM test;a | d
---+---1 | 3
(1 row)

test=# SELECT d,c,a FROM test;d | c | a
---+---+---3 | 2 | 1
(1 row)

Chris

Re: DROP COLUMN Progress

From

"Christopher Kings-Lynne"

Date:

09 July 2002, 06:25:40

> "Christopher Kings-Lynne" <chriskl@familyhealth.com.au> writes:
> >> That was my first thought also, but then the wrong attnum would be used
> >> in the "make_var".  Ugh.  I think what Chris needs to do is extend the
> >> eref data structure so that there can be placeholders for dropped
> >> attributes.  Perhaps NULLs could be included in the list, and then the
> >> code would become like
>
> > Hmmm...  I don't get it - at the moment I'm preventing them from even
> > getting into the eref and all regression tests pass and every test I try
> > works as well...
>
> Are you checking access to columns that're to the right of the one
> dropped?

OK, interesting:

test=# create table test (a int4, b int4, c int4, d int4);
CREATE TABLE
test=# insert into test values (1,2,3,4);
INSERT 16588 1
test=# alter table test drop b;
ALTER TABLE
test=# select * from test;a | d | d
---+---+---1 | 3 | 4
(1 row)

It half works, half doesn't.  Sigh - how come these things always turn out
harder than I think!?

pg_attribute:

test=# select attrelid,attname,attisdropped from pg_attribute where
attrelid=16586 and attnum > 0;attrelid |  attname  | attisdropped
----------+-----------+--------------   16586 | a         | f   16586 | dropped_2 | t   16586 | c         | f   16586 |
d        | f
 
(4 rows)

Chris

Re: DROP COLUMN Progress

From

Tom Lane

Date:

09 July 2002, 06:26:33

"Christopher Kings-Lynne" <chriskl@familyhealth.com.au> writes:
>> That was my first thought also, but then the wrong attnum would be used
>> in the "make_var".  Ugh.  I think what Chris needs to do is extend the
>> eref data structure so that there can be placeholders for dropped
>> attributes.  Perhaps NULLs could be included in the list, and then the
>> code would become like

> Hmmm...  I don't get it - at the moment I'm preventing them from even
> getting into the eref and all regression tests pass and every test I try
> works as well...

Are you checking access to columns that're to the right of the one
dropped?
        regards, tom lane