Home > mailing lists

Thread: Fast AT ADD COLUMN with DEFAULTs

Fast AT ADD COLUMN with DEFAULTs

From

Serge Rielau

Date:

05 October 2016, 18:59:27

Dear Hackers,

I’m working on a patch that expands PG’s ability to add columns to a table without a table rewrite (i.e. at O(1) cost) from the nullable-without-default to a more general case.

E.g.

CREATE TABLE T(pk INT NOT NULL PRIMARY KEY);

INSERT INTO T VALEUS (1), (2), (3);

ALTER TABLE T ADD COLUMN c1 INTEGER NOT NULL DEFAULT 5;

INSERT INTO T VALUES (4, DEFAULT);

ALTER TABLE T ALTER COLUMN SET DEFAULT 6;

INSERT INTO T VALUS (5, DEFAULT);

SELECT * FROM T ORDER BY pk;

=>

(1, 5),

(2, 5),

(3, 5),

(4, 5),

(5, 6);

Rows 1-3 have never been updated, yet they know that their values of c1 is 5.

The requirement is driven by large tables for which add column takes too much time and/or produces too large a transaction for comfort.

In simplified terms:

* a second “exist default” is computed and stored in the catalogs at time of

AT ADD COLUMN

* The exist default is cached in the tuple descriptor (e.g in attrdef)

* When one of the getAttr or copytuple related routines is invoked the exist default is filled in instead of simply NULL padding if the tuple is shorter the requested attribute number.

Is there an interest in principle in the community for this functionality?

Cheers

Serge Rielau

Salesforce.com

via Newton Mail

Re: Fast AT ADD COLUMN with DEFAULTs

From

Vitaly Burovoy

Date:

05 October 2016, 21:45:45

On 10/5/16, Serge Rielau <serge@rielau.com> wrote:
> Dear Hackers,
>
> I’m working on a patch that expands PG’s ability to add columns to a table
> without a table rewrite (i.e. at O(1) cost) from the
> nullable-without-default to a more general case. E.g.
>
> CREATE TABLE T(pk INT NOT NULL PRIMARY KEY);
> INSERT INTO T VALEUS (1), (2), (3);
> ALTER TABLE T ADD COLUMN c1 INTEGER NOT NULL DEFAULT 5;
> INSERT INTO T VALUES (4, DEFAULT);
> ALTER TABLE T ALTER COLUMN SET DEFAULT 6;
> INSERT INTO T VALUS (5, DEFAULT);
> SELECT * FROM T ORDER BY pk;
> =>
> (1, 5),
> (2, 5),
> (3, 5),
> (4, 5),
> (5, 6);
>
> Rows 1-3 have never been updated, yet they know that their values of c1 is
> 5.
>
> The requirement is driven by large tables for which add column takes too
> much time and/or produces too large a transaction for comfort.
>
> In simplified terms:
>
> * a second “exist default” is computed and stored in
> the catalogs at time of AT ADD COLUMN
>
> * The exist default is cached in the tuple descriptor (e.g in attrdef)
>
> * When one of the getAttr or copytuple related routines is invoked
> the exist default is filled in instead of simply NULL padding if the
> tuple is shorter the requested attribute number.
>
> Is there an interest in principle in the community for this functionality?

Wow! I think it would be great! It also solves huge vacuuming after
rewriting the table(s).
Just pay attention to corner cases like indexes, statistics and speed.

But I'd like to see solution for more important cases like:
CREATE TABLE t (pk INT NOT NULL PRIMARY KEY);
INSERT INTO t VALUES (1), (2), (3);
ALTER TABLE t ADD COLUMN c1 timestamptz NOT NULL DEFAULT 'now';
SELECT * FROM t ORDER BY pk;
ALTER TABLE t ADD COLUMN c2 serial;
SELECT * FROM t ORDER BY pk;
INSERT INTO t(pk) VALUES (4);
SELECT * FROM t ORDER BY pk;

P.S.: I really think it is a good idea, just some research is
necessary and covering corner cases...

--
Best regards,
Vitaly Burovoy

Re: Fast AT ADD COLUMN with DEFAULTs

From

Serge Rielau

Date:

05 October 2016, 21:57:23

<img class="cloudmagic-smart-beacon" height="0"
src="https://tr.cloudmagic.com/h/v6/emailtag/tag/1475704591/647b813270dffd4c4b2631f3c9eb3f46/9cf3801d03770ada01bb39dc8f52321d/eb0a7f22160b433d9b4d35634754e347/9efab2399c7c560b34de477b9aa0a465/ufo.gif"
style="border:0;width:0; height:0; overflow:hidden;" width="0" /><div dir="auto"><div
id="cm_replymail_content_wrap"><divclass="">On Wed, Oct 5, 2016 at 2:45 PM, Vitaly Burovoy
<vitaly.burovoy@gmail.com>wrote:<br /><div id="cm_replymail_content_1475704139" style="overflow:
visible;"><blockquotestyle="margin:0;border-left: #D6D6D6 1px solid;padding-left: 10px;">On 10/5/16, Serge Rielau
<serge@rielau.com>wrote:<br />> Dear Hackers,<br />><br />> I’m working on a patch that expands PG’s
abilityto add columns to a table<br />> without a table rewrite (i.e. at O(1) cost) from the<br />>
nullable-without-defaultto a more general case. E.g.<br />...<br />> Is there an interest in principle in the
communityfor this functionality?<br /><br />Wow! I think it would be great! It also solves huge vacuuming after<br
/>rewritingthe table(s).<br />Just pay attention to corner cases like indexes, statistics and speed.</blockquote><div
id="ID_1475704192987">Yes, Yes, and still analyzing speed</div><blockquote class="" style="margin: 0px;
border-left-color:rgb(214, 214, 214); border-left-width: 1px; border-left-style: solid; padding-left: 10px;"><br />But
I'dlike to see solution for more important cases like:<br />CREATE TABLE t (pk INT NOT NULL PRIMARY KEY);<br />INSERT
INTOt VALUES (1), (2), (3);<br />ALTER TABLE t ADD COLUMN c1 timestamptz NOT NULL DEFAULT 'now';<br />SELECT * FROM t
ORDERBY pk;<br />ALTER TABLE t ADD COLUMN c2 serial;<br />SELECT * FROM t ORDER BY pk;<br />INSERT INTO t(pk) VALUES
(4);<br/>SELECT * FROM t ORDER BY pk;</blockquote><div id="ID_1475704277131">By solution I think you mean a semantic
changefrom what it is doing today which is:</div><div id="ID_1475704277131">* “Now” is fixed to ALTER TABLE time for
allpre-existing rows</div><div id="ID_1475704277131">* serial will fill in the same value for all pre-existing
rows</div><divid="ID_1475704277131">Having different semantics for those would require a rewrite and probably different
syntaxin some form.</div><div id="ID_1475704277131"><br /></div><div id="ID_1475704277131">This is what my patch does
onour PG derivative today: </div><blockquote class="" style="margin: 0px; border-left-color: rgb(214, 214, 214);
border-left-width:1px; border-left-style: solid; padding-left: 10px;"></blockquote><div id="ID_1475704251970">CREATE
TABLEt (pk INT NOT NULL PRIMARY KEY);</div><div id="ID_1475704251970">CREATE TABLE</div><div
id="ID_1475704251970">postgres=#INSERT INTO t VALUES (1), (2), (3);</div><div id="ID_1475704251970">INSERT 0
3</div><divid="ID_1475704251970">postgres=# ALTER TABLE t ADD COLUMN c1 timestamptz NOT NULL DEFAULT 'now';</div><div
id="ID_1475704251970">ALTERTABLE</div><div id="ID_1475704251970">postgres=# SELECT * FROM t ORDER BY pk;</div><div
id="ID_1475704251970"> pk|              c1               </div><div
id="ID_1475704251970">----+-------------------------------</div><divid="ID_1475704251970">  1 | 2016-10-05
21:47:58.919194+00</div><divid="ID_1475704251970">  2 | 2016-10-05 21:47:58.919194+00</div><div id="ID_1475704251970"> 
3| 2016-10-05 21:47:58.919194+00</div><div id="ID_1475704251970">(3 rows)</div><div id="ID_1475704251970"><br
/></div><divid="ID_1475704251970">postgres=# </div><div id="ID_1475704251970">postgres=# ALTER TABLE t ADD COLUMN c2
serial;</div><divid="ID_1475704251970">SELECT * FROM t ORDER BY pk;</div><div id="ID_1475704251970">INSERT INTO t(pk)
VALUES(4);</div><div id="ID_1475704251970">SELECT * FROM t ORDER BY pk;</div><div id="ID_1475704251970"><br
/></div><divid="ID_1475704251970">ALTER TABLE t ADD COLUMN c2 serial;</div><div id="ID_1475704251970">ALTER
TABLE</div><divid="ID_1475704251970">postgres=# SELECT * FROM t ORDER BY pk;</div><div id="ID_1475704251970"> pk |    
        c1               | c2 </div><div id="ID_1475704251970">----+-------------------------------+----</div><div
id="ID_1475704251970"> 1 | 2016-10-05 21:47:58.919194+00 |  1</div><div id="ID_1475704251970">  2 | 2016-10-05
21:47:58.919194+00|  1</div><div id="ID_1475704251970">  3 | 2016-10-05 21:47:58.919194+00 |  1</div><div
id="ID_1475704251970">(3rows)</div><div id="ID_1475704251970"><br /></div><div id="ID_1475704251970">postgres=# INSERT
INTOt(pk) VALUES (4);</div><div id="ID_1475704251970">INSERT 0 1</div><div id="ID_1475704251970">postgres=# SELECT *
FROMt ORDER BY pk;</div><div id="ID_1475704251970"> pk |              c1               | c2 </div><div
id="ID_1475704251970">----+-------------------------------+----</div><divid="ID_1475704251970">  1 | 2016-10-05
21:47:58.919194+00|  1</div><div id="ID_1475704251970">  2 | 2016-10-05 21:47:58.919194+00 |  1</div><div
id="ID_1475704251970"> 3 | 2016-10-05 21:47:58.919194+00 |  1</div><div id="ID_1475704251970">  4 | 2016-10-05
21:47:58.919194+00|  2</div><div id="ID_1475704251970">(4 rows)</div><div id="ID_1475704251970"> </div><blockquote
class=""style="margin: 0px; border-left-color: rgb(214, 214, 214); border-left-width: 1px; border-left-style: solid;
padding-left:10px;">P.S.: I really think it is a good idea, just some research is<br />necessary and covering corner
cases...</blockquote><divid="ID_1475704460956">Thanks.</div><div id="ID_1475704460956">This would be my first
contribution. </div><divid="ID_1475704460956">I take it I would post a patch based on a recent PG 9.6 master for
review?</div><divid="ID_1475704460956">Or should I compose some sort of a design document? </div><div
id="ID_1475704460956"><br/></div><div id="ID_1475704460956">Cheers</div><div id="ID_1475704460956">Serge
Rielau</div><divid="ID_1475704460956">Salesforce.com</div></div></div></div></div>

Re: Fast AT ADD COLUMN with DEFAULTs

From

Andres Freund

Date:

05 October 2016, 22:15:49

On 2016-10-05 11:58:33 -0700, Serge Rielau wrote:
> Dear Hackers,
> I’m working on a patch that expands PG’s ability to add columns to a table without a table rewrite (i.e. at O(1)
cost)from the nullable-without-default to a more general case. E.g. CREATE TABLE T(pk INT NOT NULL PRIMARY KEY); INSERT
INTOT VALEUS (1), (2), (3); ALTER TABLE T ADD COLUMN c1 INTEGER NOT NULL DEFAULT 5; INSERT INTO T VALUES (4, DEFAULT);
ALTERTABLE T ALTER COLUMN SET DEFAULT 6; INSERT INTO T VALUS (5, DEFAULT); SELECT * FROM T ORDER BY pk; => (1, 5), (2,
5),(3, 5), (4, 5), (5, 6);
 
> Rows 1-3 have never been updated, yet they know that their values of c1 is 5.
> The requirement is driven by large tables for which add column takes too much time and/or produces too large a
transactionfor comfort.
 
> In simplified terms: * a second “exist default” is computed and stored in the catalogs at time of AT ADD COLUMN * The
existdefault is cached in the tuple descriptor (e.g in attrdef) * When one of the getAttr or copytuple related routines
isinvoked the exist default is filled in instead of simply NULL padding if the tuple is shorter the requested attribute
number.

If I understand this proposal correctly, altering a column default will
still have trigger a rewrite unless there's previous default?

Re: Fast AT ADD COLUMN with DEFAULTs

From

Vitaly Burovoy

Date:

05 October 2016, 22:23:13

On 10/5/16, Andres Freund <andres@anarazel.de> wrote:
> On 2016-10-05 11:58:33 -0700, Serge Rielau wrote:
>> Dear Hackers,
>> I’m working on a patch that expands PG’s ability to add columns to a table
>> without a table rewrite (i.e. at O(1) cost) from the
>> nullable-without-default to a more general case.
>
> If I understand this proposal correctly, altering a column default will
> still have trigger a rewrite unless there's previous default?

No, "a second “exist default"" was mentioned, i.e. it is an additional
column in a system table (pg_attribute) as default column values of
the "pre-alter" era. It solves changing of the default expression of
the same column later.

--
Best regards,
Vitaly Burovoy

Re: Fast AT ADD COLUMN with DEFAULTs

From

Andres Freund

Date:

05 October 2016, 22:29:53

On 2016-10-05 15:23:05 -0700, Vitaly Burovoy wrote:
> On 10/5/16, Andres Freund <andres@anarazel.de> wrote:
> > On 2016-10-05 11:58:33 -0700, Serge Rielau wrote:
> >> Dear Hackers,
> >> I’m working on a patch that expands PG’s ability to add columns to a table
> >> without a table rewrite (i.e. at O(1) cost) from the
> >> nullable-without-default to a more general case.
> >
> > If I understand this proposal correctly, altering a column default will
> > still have trigger a rewrite unless there's previous default?
>
> No, "a second “exist default"" was mentioned, i.e. it is an additional
> column in a system table (pg_attribute) as default column values of
> the "pre-alter" era. It solves changing of the default expression of
> the same column later.

Don't think that actually solves the issue. The default might be unset
for a while, for example. Essentially you'd need to be able to associate
arbitrary number of default values with an arbitrary set of rows.


ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
INSERT id = 1;
ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 1;
ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
INSERT id = 2;
ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 2;
ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
INSERT id = 3;
ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 3;

The result here would be that there's three rows with a default value
for foo that's the same as their id. None of them has that column
present in the row.

Re: Fast AT ADD COLUMN with DEFAULTs

From

Serge Rielau

Date:

05 October 2016, 22:34:45

<img class="cloudmagic-smart-beacon" height="0"
src="https://tr.cloudmagic.com/h/v6/emailtag/tag/1475706830/647b813270dffd4c4b2631f3c9eb3f46/9cf3801d03770ada01bb39dc8f52321d/c719df4bda2459f77b9ef0f2a251d073/9efab2399c7c560b34de477b9aa0a465/ufo.gif"
style="border:0;width:0; height:0; overflow:hidden;" width="0" /><div dir="auto"><div
id="cm_replymail_content_wrap"><divclass="">On Wed, Oct 5, 2016 at 3:23 PM, Vitaly Burovoy
<vitaly.burovoy@gmail.com>wrote:<br /><div id="cm_replymail_content_1475706734" style="overflow:
visible;"><blockquotestyle="margin:0;border-left: #D6D6D6 1px solid;padding-left: 10px;">On 10/5/16, Andres Freund
<andres@anarazel.de>wrote:<br />> On 2016-10-05 11:58:33 -0700, Serge Rielau wrote:<br />>> Dear
Hackers,<br/>>> I’m working on a patch that expands PG’s ability to add columns to a table<br />>> without
atable rewrite (i.e. at O(1) cost) from the<br />>> nullable-without-default to a more general case.<br />> If
Iunderstand this proposal correctly, altering a column default will<br />> still have trigger a rewrite unless
there'sprevious default?<br />No, "a second “exist default"" was mentioned, i.e. it is an additional<br />column in a
systemtable (pg_attribute) as default column values of<br />the "pre-alter" era. It solves changing of the default
expressionof<br />the same column later.</blockquote><div id="ID_1475706751433">Correct and good guess on
pg_attribute. </div><divid="ID_1475706751433">That’s where it’s living in my proposal. </div><div
id="ID_1475706751433"><br/></div><div id="ID_1475706751433">Cheers</div><div
id="ID_1475706751433">Serge</div></div></div></div></div>

Re: Fast AT ADD COLUMN with DEFAULTs

From

Vitaly Burovoy

Date:

05 October 2016, 22:37:02

On 10/5/16, Andres Freund <andres@anarazel.de> wrote:
> On 2016-10-05 15:23:05 -0700, Vitaly Burovoy wrote:
>> On 10/5/16, Andres Freund <andres@anarazel.de> wrote:
>> > On 2016-10-05 11:58:33 -0700, Serge Rielau wrote:
>> >> Dear Hackers,
>> >> I’m working on a patch that expands PG’s ability to add columns to a
>> >> table
>> >> without a table rewrite (i.e. at O(1) cost) from the
>> >> nullable-without-default to a more general case.
>> >
>> > If I understand this proposal correctly, altering a column default will
>> > still have trigger a rewrite unless there's previous default?
>>
>> No, "a second “exist default"" was mentioned, i.e. it is an additional
>> column in a system table (pg_attribute) as default column values of
>> the "pre-alter" era. It solves changing of the default expression of
>> the same column later.
>
> Don't think that actually solves the issue. The default might be unset
> for a while, for example. Essentially you'd need to be able to associate
> arbitrary number of default values with an arbitrary set of rows.
>
> ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
> INSERT id = 1;
> ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 1;
> ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
> INSERT id = 2;
> ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 2;
> ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
> INSERT id = 3;
> ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 3;
>
> The result here would be that there's three rows with a default value
> for foo that's the same as their id. None of them has that column
> present in the row.

I'm sorry, while I was writting "pre-alter" I meant
"pre-alter-add-column" era (not "pre-alter-set-default"), all later
default changes "current" default, whereas "pre-alter-add-column" adds
value if current column number < TupleDesc.natts.

All your DDL are in the "post-alter-add-column" era.

--
Best regards,
Vitaly Burovoy

Re: Fast AT ADD COLUMN with DEFAULTs

From

Vitaly Burovoy

Date:

05 October 2016, 22:39:52

On 10/5/16, Vitaly Burovoy <vitaly.burovoy@gmail.com> wrote:
> On 10/5/16, Andres Freund <andres@anarazel.de> wrote:
>> On 2016-10-05 15:23:05 -0700, Vitaly Burovoy wrote:
>>> On 10/5/16, Andres Freund <andres@anarazel.de> wrote:
>>> > On 2016-10-05 11:58:33 -0700, Serge Rielau wrote:
>>> >> Dear Hackers,
>>> >> I’m working on a patch that expands PG’s ability to add columns to a
>>> >> table
>>> >> without a table rewrite (i.e. at O(1) cost) from the
>>> >> nullable-without-default to a more general case.
>>> >
>>> > If I understand this proposal correctly, altering a column default
>>> > will
>>> > still have trigger a rewrite unless there's previous default?
>>>
>>> No, "a second “exist default"" was mentioned, i.e. it is an additional
>>> column in a system table (pg_attribute) as default column values of
>>> the "pre-alter" era. It solves changing of the default expression of
>>> the same column later.
>>
>> Don't think that actually solves the issue. The default might be unset
>> for a while, for example. Essentially you'd need to be able to associate
>> arbitrary number of default values with an arbitrary set of rows.
>>
>> ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
>> INSERT id = 1;
>> ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 1;
>> ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
>> INSERT id = 2;
>> ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 2;
>> ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
>> INSERT id = 3;
>> ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 3;
>>
>> The result here would be that there's three rows with a default value
>> for foo that's the same as their id. None of them has that column
>> present in the row.
>
> I'm sorry, while I was writting "pre-alter" I meant
> "pre-alter-add-column" era (not "pre-alter-set-default"), all later
> default changes "current" default, whereas "pre-alter-add-column" adds
> value if current column number < TupleDesc.natts.
>
> All your DDL are in the "post-alter-add-column" era.

I'm so sorry, I was in a hurry. Of course,
- if current column number < TupleDesc.natts.
+ if current column number > TupleDesc.natts.

--
Best regards,
Vitaly Burovoy

Re: Fast AT ADD COLUMN with DEFAULTs

From

Jeff Janes

Date:

05 October 2016, 22:45:01

On Wed, Oct 5, 2016 at 3:29 PM, Andres Freund <andres@anarazel.de> wrote:

On 2016-10-05 15:23:05 -0700, Vitaly Burovoy wrote:
> On 10/5/16, Andres Freund <andres@anarazel.de> wrote:
> > On 2016-10-05 11:58:33 -0700, Serge Rielau wrote:
> >> Dear Hackers,
> >> I’m working on a patch that expands PG’s ability to add columns to a table
> >> without a table rewrite (i.e. at O(1) cost) from the
> >> nullable-without-default to a more general case.
> >
> > If I understand this proposal correctly, altering a column default will
> > still have trigger a rewrite unless there's previous default?
>
> No, "a second “exist default"" was mentioned, i.e. it is an additional
> column in a system table (pg_attribute) as default column values of
> the "pre-alter" era. It solves changing of the default expression of
> the same column later.

Don't think that actually solves the issue. The default might be unset
for a while, for example. Essentially you'd need to be able to associate
arbitrary number of default values with an arbitrary set of rows.

ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
INSERT id = 1;
ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 1;
ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
INSERT id = 2;
ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 2;
ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
INSERT id = 3;
ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 3;

The result here would be that there's three rows with a default value
for foo that's the same as their id. None of them has that column
present in the row.

My understanding is that all of those would be materialized. The only default that isn't materialized is the one in effect in the same statement in which that column was added. Since a column can only be added once, the default in effect at the time the column was added can never change, no matter what you do to the default later on.

Cheers,

Jeff

Re: Fast AT ADD COLUMN with DEFAULTs

From

Andres Freund

Date:

05 October 2016, 22:48:32

On 2016-10-05 15:44:56 -0700, Jeff Janes wrote:
> On Wed, Oct 5, 2016 at 3:29 PM, Andres Freund <andres@anarazel.de> wrote:
> 
> > On 2016-10-05 15:23:05 -0700, Vitaly Burovoy wrote:
> > > On 10/5/16, Andres Freund <andres@anarazel.de> wrote:
> > > > On 2016-10-05 11:58:33 -0700, Serge Rielau wrote:
> > > >> Dear Hackers,
> > > >> I’m working on a patch that expands PG’s ability to add columns to a
> > table
> > > >> without a table rewrite (i.e. at O(1) cost) from the
> > > >> nullable-without-default to a more general case.
> > > >
> > > > If I understand this proposal correctly, altering a column default will
> > > > still have trigger a rewrite unless there's previous default?
> > >
> > > No, "a second “exist default"" was mentioned, i.e. it is an additional
> > > column in a system table (pg_attribute) as default column values of
> > > the "pre-alter" era. It solves changing of the default expression of
> > > the same column later.
> >
> > Don't think that actually solves the issue. The default might be unset
> > for a while, for example. Essentially you'd need to be able to associate
> > arbitrary number of default values with an arbitrary set of rows.
> >
> >
> > ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
> > INSERT id = 1;
> > ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 1;
> > ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
> > INSERT id = 2;
> > ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 2;
> > ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
> > INSERT id = 3;
> > ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 3;
> >
> > The result here would be that there's three rows with a default value
> > for foo that's the same as their id. None of them has that column
> > present in the row.
> >
> 
> My understanding is that all of those would be materialized.

But that'd require a table rewrite, as none of the above INSERTs were
done when a default was in place. But each has a different "applicable"
default value.


> The only
> default that isn't materialized is the one in effect in the same statement
> in which that column was added.  Since a column can only be added once, the
> default in effect at the time the column was added can never change, no
> matter what you do to the default later on.

DROP DEFAULT pretty much does that, because it allows multiple (set of)
rows with no value (or a NULL) for a specific column, but with differing
applicable default values.

Andres

Re: Fast AT ADD COLUMN with DEFAULTs

From

Tom Lane

Date:

05 October 2016, 22:58:54

Andres Freund <andres@anarazel.de> writes:
> On 2016-10-05 15:44:56 -0700, Jeff Janes wrote:
>>> No, "a second “exist default"" was mentioned, i.e. it is an additional
>>> column in a system table (pg_attribute) as default column values of
>>> the "pre-alter" era. It solves changing of the default expression of
>>> the same column later.

> Don't think that actually solves the issue. The default might be unset
> for a while, for example. Essentially you'd need to be able to associate
> arbitrary number of default values with an arbitrary set of rows.

I think it does work, as long as the "exists default" is immutable.
(For safety, personally, I'd restrict it to be a verbatim constant.)
The point is that you apply that when you are reading a row that has
so few columns that it must predate the original ALTER TABLE ADD COLUMN.
Subsequent ADD/DROP/SET DEFAULT affect the "active default" and hence
insertions that happen after them, but they don't affect the
interpretation of old rows.  And of course all rows inserted after the
ADD COLUMN contain explicit values of the column, so their meaning is
unaffected in any case.

You do need two defaults associated with a column to make this work.
The "exists default" never changes after the column is added.  But
in principle, the "exists default" just replaces the NULL value that
we implicitly insert now in such cases.

Need a better name for the concept, since evidently this name isn't
conveying the idea.
        regards, tom lane

Re: Fast AT ADD COLUMN with DEFAULTs

From

Pantelis Theodosiou

Date:

05 October 2016, 22:59:44

On Wed, Oct 5, 2016 at 11:44 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

On Wed, Oct 5, 2016 at 3:29 PM, Andres Freund <andres@anarazel.de> wrote:
On 2016-10-05 15:23:05 -0700, Vitaly Burovoy wrote:
> On 10/5/16, Andres Freund <andres@anarazel.de> wrote:
> > On 2016-10-05 11:58:33 -0700, Serge Rielau wrote:
> >> Dear Hackers,
> >> I’m working on a patch that expands PG’s ability to add columns to a table
> >> without a table rewrite (i.e. at O(1) cost) from the
> >> nullable-without-default to a more general case.
> >
> > If I understand this proposal correctly, altering a column default will
> > still have trigger a rewrite unless there's previous default?
>
> No, "a second “exist default"" was mentioned, i.e. it is an additional
> column in a system table (pg_attribute) as default column values of
> the "pre-alter" era. It solves changing of the default expression of
> the same column later.

Don't think that actually solves the issue. The default might be unset
for a while, for example. Essentially you'd need to be able to associate
arbitrary number of default values with an arbitrary set of rows.

ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
INSERT id = 1;
ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 1;
ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
INSERT id = 2;
ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 2;
ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
INSERT id = 3;
ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 3;

The result here would be that there's three rows with a default value
for foo that's the same as their id. None of them has that column
present in the row.

My understanding is that all of those would be materialized. The only default that isn't materialized is the one in effect in the same statement in which that column was added. Since a column can only be added once, the default in effect at the time the column was added can never change, no matter what you do to the default later on.

Cheers,

Jeff

I understood the same thing. That when the column is added the "DEFAULT constant" means 2 things:

-a- existing rows get a value of that constant (that is not actually written in the rows, but kept (hidden from the user) in the system tables and only written in the rows that are updated, vacuumed, etc) and

-b- new rows, inserted after the ADD COLUMN will get the DEFAULT constant, same way as a normal column definition would.

The b part can easily be changed later with an ALTER COLUMN that sets a new DEFAULT.

The a part is never changed - but possibly deleted from the system table when all rows existing before the ADD COLUMN have been updated.

Pantelis

Re: Fast AT ADD COLUMN with DEFAULTs

From

Vitaly Burovoy

Date:

05 October 2016, 23:00:52

On 10/5/16, Andres Freund <andres@anarazel.de> wrote:
> On 2016-10-05 15:44:56 -0700, Jeff Janes wrote:
>> On Wed, Oct 5, 2016 at 3:29 PM, Andres Freund <andres@anarazel.de> wrote:
>> > ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
>> > INSERT id = 1;
>> > ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 1;
>> > ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
>> > INSERT id = 2;
>> > ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 2;
>> > ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
>> > INSERT id = 3;
>> > ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 3;
>> >
>> > The result here would be that there's three rows with a default value
>> > for foo that's the same as their id. None of them has that column
>> > present in the row.
>> >
>>
>> My understanding is that all of those would be materialized.
>
> But that'd require a table rewrite, as none of the above INSERTs were
> done when a default was in place.

Since they did not have the default value, that tuples are written
with actual TupleDesc.natts where att_isnull for "withdefault" column
is set (actually the column does not have default for inserted tuples
in your case).

> But each has a different "applicable" default value.

No, their values are constructed "from scratch", not fetched from a
heap, so "pre-alter-add-column" default is not applicable for them.

>> The only
>> default that isn't materialized is the one in effect in the same
>> statement
>> in which that column was added.  Since a column can only be added once,
>> the
>> default in effect at the time the column was added can never change, no
>> matter what you do to the default later on.
>
> DROP DEFAULT pretty much does that, because it allows multiple (set of)
> rows with no value (or a NULL) for a specific column, but with differing
> applicable default values.

DROP DEFAULT is for "post-alter-add-column" tuples, it does not
affects "pre-alter-add-column" ones.

-- 
Best regards,
Vitaly Burovoy

Re: Fast AT ADD COLUMN with DEFAULTs

From

Andres Freund

Date:

05 October 2016, 23:01:28

On 2016-10-05 18:58:47 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2016-10-05 15:44:56 -0700, Jeff Janes wrote:
> >>> No, "a second “exist default"" was mentioned, i.e. it is an additional
> >>> column in a system table (pg_attribute) as default column values of
> >>> the "pre-alter" era. It solves changing of the default expression of
> >>> the same column later.
> 
> > Don't think that actually solves the issue. The default might be unset
> > for a while, for example. Essentially you'd need to be able to associate
> > arbitrary number of default values with an arbitrary set of rows.
> 
> I think it does work, as long as the "exists default" is immutable.
> (For safety, personally, I'd restrict it to be a verbatim constant.)
> The point is that you apply that when you are reading a row that has
> so few columns that it must predate the original ALTER TABLE ADD COLUMN.
> Subsequent ADD/DROP/SET DEFAULT affect the "active default" and hence
> insertions that happen after them, but they don't affect the
> interpretation of old rows.  And of course all rows inserted after the
> ADD COLUMN contain explicit values of the column, so their meaning is
> unaffected in any case.

Err, yes. I forgot that altering the default of an existing column
doesn't set the default for existing values. Sorry for the noise.

Greetings,

Andres Freund

Re: Fast AT ADD COLUMN with DEFAULTs

From

Tom Lane

Date:

05 October 2016, 23:05:18

I wrote:
> Need a better name for the concept, since evidently this name isn't
> conveying the idea.

Maybe "creation default" would work better?  Point being it's the
default value at the time of column creation.
        regards, tom lane

Re: Fast AT ADD COLUMN with DEFAULTs

From

Serge Rielau

Date:

05 October 2016, 23:06:18

<img class="cloudmagic-smart-beacon" height="0"
src="https://tr.cloudmagic.com/h/v6/emailtag/tag/1475708722/647b813270dffd4c4b2631f3c9eb3f46/9cf3801d03770ada01bb39dc8f52321d/c143e7051e30d22af27905a24303de11/9efab2399c7c560b34de477b9aa0a465/ufo.gif"
style="border:0;width:0; height:0; overflow:hidden;" width="0" /><div dir="auto"><br /><div class="cm_footer"
id="cm_footer"><divid="cm_sent_from">via <a
href="https://cloudmagic.com/k/d/mailapp?ct=dx&cv=9.0.74&pv=10.11.6&source=email_footer_2">Newton
Mail</a></div></div><span><br/></span><div id="cm_replymail_content_wrap"><div class="">On Wed, Oct 5, 2016 at 3:58 PM,
TomLane <tgl@sss.pgh.pa.us> wrote:<br /><div id="cm_replymail_content_1475708366" style="overflow:
visible;"><blockquotestyle="margin:0;border-left: #D6D6D6 1px solid;padding-left: 10px;">Andres Freund
<andres@anarazel.de>writes:<br />> On 2016-10-05 15:44:56 -0700, Jeff Janes wrote:<br />>>> No, "a
second“exist default"" was mentioned, i.e. it is an additional<br />>>> column in a system table
(pg_attribute)as default column values of<br />>>> the "pre-alter" era. It solves changing of the default
expressionof<br />>>> the same column later.<br /><br />> Don't think that actually solves the issue. The
defaultmight be unset<br />> for a while, for example. Essentially you'd need to be able to associate<br />>
arbitrarynumber of default values with an arbitrary set of rows.<br /><br />I think it does work, as long as the
"existsdefault" is immutable.<br />(For safety, personally, I'd restrict it to be a verbatim constant.)<br />The point
isthat you apply that when you are reading a row that has<br />so few columns that it must predate the original ALTER
TABLEADD COLUMN.<br />Subsequent ADD/DROP/SET DEFAULT affect the "active default" and hence<br />insertions that happen
afterthem, but they don't affect the<br />interpretation of old rows. And of course all rows inserted after the<br
/>ADDCOLUMN contain explicit values of the column, so their meaning is<br />unaffected in any
case.</blockquote><blockquoteclass="" style="margin: 0px; border-left-color: rgb(214, 214, 214); border-left-width:
1px;border-left-style: solid; padding-left: 10px;"><br />You do need two defaults associated with a column to make this
work.<br/>The "exists default" never changes after the column is added. But<br />in principle, the "exists default"
justreplaces the NULL value that<br />we implicitly insert now in such cases.</blockquote><div
id="ID_1475708435309">Explainedso much better than I could do it :-)</div><div id="ID_1475708435309"><br /></div><div
id="ID_1475708435309">Iwant to point out as a minor “extension” that there is no need for the default to be immutable.
Itis merely required that the default is evaluate at time of ADD COLUMN</div><div id="ID_1475708435309">and then we
rememberthe actual value for the exist default, rather than the parsed expression as we do for the “current”
default.</div><divid="ID_1475708435309"> </div><blockquote class="" style="margin: 0px; border-left-color: rgb(214,
214,214); border-left-width: 1px; border-left-style: solid; padding-left: 10px;">Need a better name for the concept,
sinceevidently this name isn't<br />conveying the idea.</blockquote><div id="ID_1475708551097">By all means. Got
anythingin mind?</div><div id="ID_1475708551097"><br /></div><div id="ID_1475708551097">Cheers</div><div
id="ID_1475708551097">SergeRielau</div><div id="ID_1475708551097"><br /></div></div></div></div></div>

Re: Fast AT ADD COLUMN with DEFAULTs

From

Pantelis Theodosiou

Date:

05 October 2016, 23:17:36

On Thu, Oct 6, 2016 at 12:05 AM, Serge Rielau <serge@rielau.com> wrote:

via Newton Mail

On Wed, Oct 5, 2016 at 3:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Andres Freund <andres@anarazel.de> writes:
> On 2016-10-05 15:44:56 -0700, Jeff Janes wrote:
>>> No, "a second “exist default"" was mentioned, i.e. it is an additional
>>> column in a system table (pg_attribute) as default column values of
>>> the "pre-alter" era. It solves changing of the default expression of
>>> the same column later.

> Don't think that actually solves the issue. The default might be unset
> for a while, for example. Essentially you'd need to be able to associate
> arbitrary number of default values with an arbitrary set of rows.

I think it does work, as long as the "exists default" is immutable.
(For safety, personally, I'd restrict it to be a verbatim constant.)
The point is that you apply that when you are reading a row that has
so few columns that it must predate the original ALTER TABLE ADD COLUMN.
Subsequent ADD/DROP/SET DEFAULT affect the "active default" and hence
insertions that happen after them, but they don't affect the
interpretation of old rows. And of course all rows inserted after the
ADD COLUMN contain explicit values of the column, so their meaning is
unaffected in any case.

You do need two defaults associated with a column to make this work.
The "exists default" never changes after the column is added. But
in principle, the "exists default" just replaces the NULL value that
we implicitly insert now in such cases.
Explained so much better than I could do it :-)

I want to point out as a minor “extension” that there is no need for the default to be immutable. It is merely required that the default is evaluate at time of ADD COLUMN
and then we remember the actual value for the exist default, rather than the parsed expression as we do for the “current” default.

Need a better name for the concept, since evidently this name isn't
conveying the idea.
By all means. Got anything in mind?

For comparison, SQL Server's implementation. They have a similar feature (in their Enterprise only edition).
From https://msdn.microsoft.com/en-us/library/ms190273.aspx :

Adding NOT NULL Columns as an Online Operation

Starting with SQL Server 2012 Enterprise Edition, adding a NOT NULL column with a default value is an online operation when the default value is a runtime constant. This means that the operation is completed almost instantaneously regardless of the number of rows in the table. This is because the existing rows in the table are not updated during the operation; instead, the default value is stored only in the metadata of the table and the value is looked up as needed in queries that access these rows. This behavior is automatic; no additional syntax is required to implement the online operation beyond the ADD COLUMN syntax. A runtime constant is an expression that produces the same value at runtime for each row in the table regardless of its determinism. For example, the constant expression "My temporary data", or the system function GETUTCDATETIME() are runtime constants. In contrast, the functions NEWID() or NEWSEQUENTIALID() are not runtime constants because a unique value is produced for each row in the table. Adding a NOT NULL column with a default value that is not a runtime constant is always performed offline and an exclusive (SCH-M) lock is acquired for the duration of the operation.

While the existing rows reference the value stored in metadata, the default value is stored on the row for any new rows that are inserted and do not specify another value for the column. The default value stored in metadata is moved to an existing row when the row is updated (even if the actual column is not specified in the UPDATE statement), or if the table or clustered index is rebuilt.

Re: Fast AT ADD COLUMN with DEFAULTs

From

Vitaly Burovoy

Date:

05 October 2016, 23:19:37

On 10/5/16, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> Need a better name for the concept, since evidently this name isn't
>> conveying the idea.
>
> Maybe "creation default" would work better?  Point being it's the
> default value at the time of column creation.

Hmm... Personaly for me the original topic name is good enough.


But what I discover for myself is that we have pg_attrdef separately
from the pg_attribute. Why?
Is it time to join them? For not presented defaults it would be only
one bit per row(if we avoid "adsrc" as it is recommended), but for a
separate table it is 11 columns with two indexes now...
-- 
Best regards,
Vitaly Burovoy

Re: Fast AT ADD COLUMN with DEFAULTs

From

Serge Rielau

Date:

05 October 2016, 23:39:04

<img class="cloudmagic-smart-beacon" height="0"
src="https://tr.cloudmagic.com/h/v6/emailtag/tag/1475710688/647b813270dffd4c4b2631f3c9eb3f46/9cf3801d03770ada01bb39dc8f52321d/d3d73fe96985e5e4f5ec045c1bb0ba95/9efab2399c7c560b34de477b9aa0a465/ufo.gif"
style="border:0;width:0; height:0; overflow:hidden;" width="0" /><div dir="auto"><span><br /></span><br /><div
class="cm_footer"id="cm_footer"><div id="cm_sent_from">via <a
href="https://cloudmagic.com/k/d/mailapp?ct=dx&cv=9.0.74&pv=10.11.6&source=email_footer_2">Newton
Mail</a></div></div><span><br/></span><div id="cm_replymail_content_wrap"><div class="">On Wed, Oct 5, 2016 at 4:19 PM,
VitalyBurovoy <vitaly.burovoy@gmail.com> wrote:<br /><div id="cm_replymail_content_1475709656" style="overflow:
visible;"><blockquotestyle="margin:0;border-left: #D6D6D6 1px solid;padding-left: 10px;">On 10/5/16, Tom Lane
<tgl@sss.pgh.pa.us>wrote:<br />> I wrote:<br />>> Need a better name for the concept, since evidently
thisname isn't<br />>> conveying the idea.<br />><br />> Maybe "creation default" would work better? Point
beingit's the<br />> default value at the time of column creation.<br />Hmm... Personaly for me the original topic
nameis good enough.</blockquote><div id="ID_1475709663350">I think at issue is with the term “exist default” rather
thanthe feature/topic name (?)</div><div id="ID_1475709663350"> </div><blockquote class="" style="margin: 0px;
border-left-color:rgb(214, 214, 214); border-left-width: 1px; border-left-style: solid; padding-left: 10px;">But what I
discoverfor myself is that we have pg_attrdef separately<br />from the pg_attribute. Why?<br />Is it time to join them?
Fornot presented defaults it would be only<br />one bit per row(if we avoid "adsrc" as it is recommended), but for a<br
/>separatetable it is 11 columns with two indexes now...</blockquote><div id="ID_1475709743037">In terms of footprint
wemay be able to remove pg_attrdef.</div><div id="ID_1475709743037">I would consider that orthogonal to the proposed
featurethough.</div><div id="ID_1475709743037">The internal representation of defaults in the tuple descriptor still
needsto be a map of sorts.</div><div id="ID_1475709743037"><br /></div><div id="ID_1475709743037">To comment on
PantelisSQL Server Reference:</div><div id="ID_1475709743037">Other vendors such as Oracle and DB2 also support this
feature.</div><divid="ID_1475709743037"><br /></div><div id="ID_1475709743037">The listed restriction made me loop back
toVitaly’s original serial example:</div><div id="ID_1475709743037">ALTER TABLE t ADD COLUMN c2 serial;</div><div
id="ID_1475709743037">andrethink Tom’s struct restriction to constants.</div><div id="ID_1475709743037"><br
/></div><divid="ID_1475709743037">In PG the proposed feature would also have to be limited to immutable(?) default
expressionsto comply with existing behavior, which matches SQL Servers.</div><div id="ID_1475709743037"><br
/></div><divid="ID_1475709743037">My current patch does not restrict that and thusly falsely "fills in" the same value
forall rows. </div><div id="ID_1475709743037"><br /></div><div id="ID_1475709743037">Cheers</div><div
id="ID_1475709743037">SergeRielau</div><div id="ID_1475709743037">Salesforce.com</div><div id="ID_1475709743037"><br
/></div></div></div></div></div>

Re: Fast AT ADD COLUMN with DEFAULTs

From

Vitaly Burovoy

Date:

06 October 2016, 00:52:52

On 10/5/16, Serge Rielau <serge@rielau.com> wrote:
>On Wed, Oct 5, 2016 at 4:19 PM, Vitaly Burovoy <vitaly.burovoy@gmail.com> wrote:
>> But what I discover for myself is that we have pg_attrdef separately
>> from the pg_attribute. Why?
>> Is it time to join them? For not presented defaults it would be only
>> one bit per row(if we avoid "adsrc" as it is recommended), but for a
>> separate table it is 11 columns with two indexes now...
>
> In terms of footprint we may be able to remove pg_attrdef.
> I would consider that orthogonal to the proposed feature though.

It was a question mostly to the community rather than to you.

> The internal representation of defaults in the tuple descriptor still needs to be a map of sorts.
>
> To comment on Pantelis SQL Server Reference:
> Other vendors such as Oracle and DB2 also support this feature.
>
> The listed restriction made me loop back to Vitaly’s original serial example:
> ALTER TABLE t ADD COLUMN c2 serial;
> and rethink Tom’s struct restriction to constants.

I'm sorry, the correct example with "now" should be:
ALTER TABLE t ADD COLUMN c1 timestamptz NOT NULL DEFAULT
'now'::text::timestamptz;

I wanted to show that for non-immutable (even stable) expression as a
default one each time selected old tuples gives different result.
But your implementation evaluates it before saving as a
"pre-add-column" default and it breakes the current behavior in a
different way.

> In PG the proposed feature would also have to be limited to immutable(?)
> default expressions to comply with existing behavior, which matches SQL Servers.
>
> My current patch does not restrict that and thusly falsely "fills in" the same value for all rows.

If you change the current behavior (just making it "faster") you
should save the current behavior.
The best case is to determine whether it is possible to do a "fast"
change and fill the "pre-add-column" default parameter or leave the
current "long" behavior.

I assure it is often enough to add columns with a default value which
fills the current rows by non-static values.
One of them is adding a serial column (which is a macros for "CREATE
SEQUENCE+SET NULL+SET DEFAULT
nextval(just_created_sequence_relation)"), others are
"uuid_generate_vX(...)" and "random()"

On 10/5/16, Serge Rielau <serge@rielau.com> wrote:
> I want to point out as a minor "extension" that there is no need for the
> default to be immutable. It is merely required that the default is evaluate
> at time of ADD COLUMN and then we remember the actual value for the exist
> default, rather than the parsed expression as we do for the "current"
> default.

I don't think it will be accepted.

On 10/5/16, Serge Rielau <serge@rielau.com> wrote:
> Thanks.
> This would be my first contribution.
> I take it I would post a patch based on a recent PG 9.6 master for review?

Your patch must be based on a just "master" branch.
If you haven't seen wiki pages [1], [2] and [3], it is the time to do
it (and related ones).

> Or should I compose some sort of a design document?

Since Tom Lane, Andres Freund and other people agreed "it does work",
you may post a patch to a new thread and write a short (but clean
enough) description with a link to the current thread. Examples can be
seen by links from the CF[4].
Nevertheless it is important to honor all thoughts mentioned here
because if your patch breaks the current behavior (with no significant
reason) it is senseless (even if it matches the behavior of other
RDBMS) and will not be committed.

Of cource, feel free to ask.

[1] https://wiki.postgresql.org/wiki/So,_you_want_to_be_a_developer%3F
[2] https://wiki.postgresql.org/wiki/Developer_FAQ
[3] https://wiki.postgresql.org/wiki/Submitting_a_Patch
[4] https://commitfest.postgresql.org/11/

--
Best regards,
Vitaly Burovoy

Re: Fast AT ADD COLUMN with DEFAULTs

From

Serge Rielau

Date:

06 October 2016, 03:44:25

> On Oct 5, 2016, at 5:52 PM, Vitaly Burovoy <vitaly.burovoy@gmail.com> wrote:
>
> On 10/5/16, Serge Rielau <serge@rielau.com> wrote:
>> I want to point out as a minor "extension" that there is no need for the
>> default to be immutable. It is merely required that the default is evaluate
>> at time of ADD COLUMN and then we remember the actual value for the exist
>> default, rather than the parsed expression as we do for the "current"
>> default.
>
> I don't think it will be accepted.
And I wouldn’t expect it to. I had a misunderstanding on what PG did.
Clearly the enhancement must be semantically neutral and be limited to the
cases where that can be asserted.
So my patch will detect that situation and fall back to the original behavior as needed.
> Your patch must be based on a just "master" branch.
> If you haven't seen wiki pages [1], [2] and [3], it is the time to do
> it (and related ones).
>
>> Or should I compose some sort of a design document?
>
> Since Tom Lane, Andres Freund and other people agreed "it does work",
> you may post a patch to a new thread and write a short (but clean
> enough) description with a link to the current thread. Examples can be
> seen by links from the CF[4].
Thanks of rte guidance.
It will take a bit of time to port to community code and complete QA.
I shall return….
Serge Rielau
Salesforce.com

Re: Fast AT ADD COLUMN with DEFAULTs

From

Simon Riggs

Date:

06 October 2016, 11:19:53

On 6 October 2016 at 04:43, Serge Rielau <serge@rielau.com> wrote:

>>> Or should I compose some sort of a design document?

Having read this thread, I'm a little unclear as to what you're
writing now, though there's definitely good ideas here.

I think it would be beneficial to write up a single coherent
description of this, including behaviour and a small sketch of
implementation, just so everyone knows what this is. No design doc,
but a summary.


It would be very useful to be able to do this...
ALTER TABLE foo ADD last_updated_timestamp timestamp default current_timestamp
so that it generates a constant value and stores that for all prior
rows, but then generates a new value for future rows.

Which makes me think we should call this missing_value or absent_value
so its clear that it is not a "default" it is the value we use for
rows that do not have any value stored for them.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Fast AT ADD COLUMN with DEFAULTs

From

Vitaly Burovoy

Date:

06 October 2016, 12:25:33

On 10/6/16, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 6 October 2016 at 04:43, Serge Rielau <serge@rielau.com> wrote:
>>>> Or should I compose some sort of a design document?
>
> Having read this thread, I'm a little unclear as to what you're
> writing now, though there's definitely good ideas here.
>
> I think it would be beneficial to write up a single coherent
> description of this, including behaviour and a small sketch of
> implementation, just so everyone knows what this is. No design doc,
> but a summary.

At the moment I think it can also be a good idea to post the current
patch as a Proposal or a WIP to get initial feedback.

> It would be very useful to be able to do this...
> ALTER TABLE foo ADD last_updated_timestamp timestamp default
> current_timestamp
> so that it generates a constant value and stores that for all prior
> rows, but then generates a new value for future rows.

Yes, it works for stable "now()" but does not work for volatile
functions like "random()", "uuid_generate_v4()" or default for serial
columns. The only possible way I can see is to check an expression has
only "T_Const"s, static and stable functions. In such case the
expression can be evaluated and the result be saved as a value for
absented attributes of a tuple. In the other case save NULL there and
rewrite the table.

> Which makes me think we should call this missing_value or absent_value
> so its clear that it is not a "default" it is the value we use for
> rows that do not have any value stored for them.

It is definitely a default for a user, it is not a regular default internally.
I'm not a native speaker, "absent_value" can be mixed up with a NULL.
As for me the best phrase is "pre-add-column-default", but it is
impossible to use it as a column name. :-(
It is still an open question.

(I remember funny versions in a discussion[1] when people tried to
choose a name for a function reversed to pg_size_pretty...)

[1] https://www.postgresql.org/message-id/flat/CAFj8pRD-tGoDKnxdYgECzA4On01_uRqPrwF-8LdkSE-6bDHp0w@mail.gmail.com

--
Best regards,
Vitaly Burovoy

Re: Fast AT ADD COLUMN with DEFAULTs

From

Serge Rielau

Date:

06 October 2016, 15:18:46

> On Oct 6, 2016, at 5:25 AM, Vitaly Burovoy <vitaly.burovoy@gmail.com> wrote:
>
> On 10/6/16, Simon Riggs <simon@2ndquadrant.com> wrote:
>> On 6 October 2016 at 04:43, Serge Rielau <serge@rielau.com> wrote:
>>>>> Or should I compose some sort of a design document?
>>
>> Having read this thread, I'm a little unclear as to what you're
>> writing now, though there's definitely good ideas here.
>>
>> I think it would be beneficial to write up a single coherent
>> description of this, including behaviour and a small sketch of
>> implementation, just so everyone knows what this is. No design doc,
>> but a summary.
>
> At the moment I think it can also be a good idea to post the current
> patch as a Proposal or a WIP to get initial feedback.
I can do that -  Accompanied by a posting sized overview.

>
> Yes, it works for stable "now()" but does not work for volatile
> functions like "random()", "uuid_generate_v4()" or default for serial
> columns. The only possible way I can see is to check an expression has
> only "T_Const"s, static and stable functions. In such case the
> expression can be evaluated and the result be saved as a value for
> absented attributes of a tuple. In the other case save NULL there and
> rewrite the table.
Agreed. I think DEFAULT as-is does the job nicely function wise.
One can always decompose the ADD COLUMN into two steps within the same transaction
if the initial column value for pre-existing rows does not match the default for new or updated rows.
AT Just needs a performance boost for large tables where that’s reasonably possible.

>> Which makes me think we should call this missing_value or absent_value
>> so its clear that it is not a "default" it is the value we use for
>> rows that do not have any value stored for them.
>
> It is definitely a default for a user, it is not a regular default internally.
> I'm not a native speaker, "absent_value" can be mixed up with a NULL.
> As for me the best phrase is "pre-add-column-default", but it is
> impossible to use it as a column name. :-(
> It is still an open question.
I like Tom’s “creation default”. Another one could be “initial default”.
But that, too, can be misread.

Cheers
Serge Rielau
Salesforce.com

Re: Fast AT ADD COLUMN with DEFAULTs

From

Tom Lane

Date:

06 October 2016, 16:01:22

Serge Rielau <serge@rielau.com> writes:
>> On Oct 6, 2016, at 5:25 AM, Vitaly Burovoy <vitaly.burovoy@gmail.com> wrote:
>>> Which makes me think we should call this missing_value or absent_value
>>> so its clear that it is not a "default" it is the value we use for
>>> rows that do not have any value stored for them.

> I like Tom’s “creation default”. Another one could be “initial default”.
> But that, too, can be misread.

Something based on missing_value/absent_value could work for me too.
If we name it something involving "default", that definitely increases
the possibility for confusion with the regular user-settable default.

Also worth thinking about here is that the regular default expression
affects what will be put into future inserted rows, whereas this thing
affects the interpretation of past rows.  So it's really quite a different
animal.  That's kind of leading me away from calling it creation_default.

BTW, it also occurs to me that there are going to be good implementation
reasons for restricting it to be a hard constant, not any sort of
expression.  We are likely to need to be able to insert the value in
low-level code where general expression evaluation is impractical.
        regards, tom lane

Re: Fast AT ADD COLUMN with DEFAULTs

From

Serge Rielau

Date:

06 October 2016, 16:13:00

> On Oct 6, 2016, at 9:01 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> BTW, it also occurs to me that there are going to be good implementation
> reasons for restricting it to be a hard constant, not any sort of
> expression.  We are likely to need to be able to insert the value in
> low-level code where general expression evaluation is impractical.
>
Yes, the padding must happen primarily in the getAttr() routines.
Clearly we do not want to evaluate an expression there.
But what speaks against evaluating the expression before we store it?
After all we seem to all agree that this only works if the expression computes to the same constant all the time.

If we do not want to store an “untyped” datum straight in pg_attribute as a BYTEA (my current approach) we could store
thepretty printed version of the constant 
and evaluate that when we build the tuple descriptor.
This happens when we load the relation into the relcache.

Anyway, I’m jumping ahead and it’s perhaps best to let the code speak for itself once I have the WIP patch ready so we
havesomething concrete to discuss 

Cheers
Serge

Re: Fast AT ADD COLUMN with DEFAULTs

From

Vitaly Burovoy

Date:

06 October 2016, 16:16:11

On 10/6/16, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Serge Rielau <serge@rielau.com> writes:
>>> On Oct 6, 2016, at 5:25 AM, Vitaly Burovoy <vitaly.burovoy@gmail.com>
>>> wrote:
>>>> Which makes me think we should call this missing_value or absent_value

Be honest Simon Rigg's wrote that words.

>>>> so its clear that it is not a "default" it is the value we use for
>>>> rows that do not have any value stored for them.
>
>> I like Tom’s “creation default”. Another one could be “initial default”.
>> But that, too, can be misread.
>
> Something based on missing_value/absent_value could work for me too.
>
> If we name it something involving "default", that definitely increases
> the possibility for confusion with the regular user-settable default.
>
> Also worth thinking about here is that the regular default expression
> affects what will be put into future inserted rows, whereas this thing
> affects the interpretation of past rows.  So it's really quite a different
> animal.  That's kind of leading me away from calling it creation_default.
>
> BTW, it also occurs to me that there are going to be good implementation
> reasons for restricting it to be a hard constant, not any sort of
> expression.  We are likely to need to be able to insert the value in
> low-level code where general expression evaluation is impractical.

Yes, I mentioned that it should be evaluated and stored as a value
because user functions can be changed (besides the speed reason),
that's why I like the "value" in its name. The "default" is usually
identified with expressions, not values (which are particular cases of
expressions).

Serge mentioned the phrase "pre-existing rows", which makes me think
about something like "pre_existing_value"....

--
Best regards,
Vitaly Burovoy

Re: Fast AT ADD COLUMN with DEFAULTs

From

Tom Lane

Date:

06 October 2016, 16:20:42

Vitaly Burovoy <vitaly.burovoy@gmail.com> writes:
> But what I discover for myself is that we have pg_attrdef separately
> from the pg_attribute. Why?

The core reason for that is that the default expression needs to be
a separate object from the column for purposes of dependency analysis.
For example, if you have a column whose default is "foo()", then the
default expression depends on the function foo(), but the column should
not: if you drop the function, only the default expression ought to
be dropped, not the column.

Because of this, the default expression needs to have its own OID
(to be stored in pg_depend) and it's convenient to store it in a
separate catalog so that the classoid can identify it as being a
default expression rather than some other kind of object.

If we were going to allow these missing_values or creation_defaults
or whatever they're called to be general expressions, then they would need
to have their own OIDs for dependency purposes.  That would lead me to
think that the best representation is to put them in their own rows in
pg_attrdef, perhaps adding a boolean column to pg_attrdef to distinguish
regular defaults from these things.  Or maybe they even need their own
catalog, depending on whether you think dependency analysis would want
to distinguish them from regular defaults using just the classoid.

Now, as I just pointed out in another mail, realistically we're probably
going to restrict the feature to simple constants, which'd mean they will
depend only on the column's type and can never need any dependencies of
their own.  So we could take the shortcut of just storing them in a new
column in pg_attribute.  But maybe that's shortsighted and we'll
eventually wish we'd done them as full-fledged separate objects.

But on the third hand ... once one of these is in place, how could you
drop it separately from the column?  That would amount to a change in the
column's stored data, which is not what one would expect from dropping
a separate object.  So maybe it's senseless to think that these things
could ever be distinct objects.  But that definitely leads to the
conclusion that they're constants and nothing else.
        regards, tom lane

Re: Fast AT ADD COLUMN with DEFAULTs

From

Vitaly Burovoy

Date:

06 October 2016, 16:26:14

On 10/6/16, Serge Rielau <serge@rielau.com> wrote:
>> On Oct 6, 2016, at 9:01 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> BTW, it also occurs to me that there are going to be good implementation
>> reasons for restricting it to be a hard constant, not any sort of
>> expression.  We are likely to need to be able to insert the value in
>> low-level code where general expression evaluation is impractical.
>>
> Yes, the padding must happen primarily in the getAttr() routines.
> Clearly we do not want to evaluate an expression there.
> But what speaks against evaluating the expression before we store it?
> After all we seem to all agree that this only works if the expression
> computes to the same constant all the time.
>
> If we do not want to store an “untyped” datum straight in pg_attribute as a
> BYTEA (my current approach)

Ough. I made a mistake about pg_attribute because I forgot about the pg_attrdef.
If we do not merge these tables, the pg_attrdef is the best place to
store evaluated expression as a constant the same way defaults are
stored in adbin.

> we could store the pretty printed version of the constant

It is a wrong way. It ruins commands like "ALTER COLUMN ... TYPE ... USING"

> and evaluate that when we build the tuple descriptor.
> This happens when we load the relation into the relcache.
>
> Anyway, I’m jumping ahead and it’s perhaps best to let the code speak for
> itself once I have the WIP patch ready so we have something concrete to
> discuss

--
Best regards,
Vitaly Burovoy

Re: Fast AT ADD COLUMN with DEFAULTs

From

Vitaly Burovoy

Date:

06 October 2016, 16:38:55

On 10/6/16, Vitaly Burovoy <vitaly.burovoy@gmail.com> wrote:
> Ough. I made a mistake about pg_attribute because I forgot about the
> pg_attrdef.
> If we do not merge these tables, the pg_attrdef is the best place to
> store evaluated expression as a constant the same way defaults are
> stored in adbin.

Oops. While I was writing the previous email, Tom explained necessity
of the pg_attrdef.
With that explanation it is obvious that I was wrong and a value for
pre-existing rows should be in a new column in the pg_attribute.
All the other thoughts from my previous email stand good.

-- 
Best regards,
Vitaly Burovoy

Re: Fast AT ADD COLUMN with DEFAULTs

From

Serge Rielau

Date:

06 October 2016, 16:39:42

> On Oct 6, 2016, at 9:20 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Vitaly Burovoy <vitaly.burovoy@gmail.com> writes:
>> But what I discover for myself is that we have pg_attrdef separately
>> from the pg_attribute. Why?
>
> The core reason for that is that the default expression needs to be
> a separate object from the column for purposes of dependency analysis.
> For example, if you have a column whose default is "foo()", then the
> default expression depends on the function foo(), but the column should
> not: if you drop the function, only the default expression ought to
> be dropped, not the column.
>
> Because of this, the default expression needs to have its own OID
> (to be stored in pg_depend) and it's convenient to store it in a
> separate catalog so that the classoid can identify it as being a
> default expression rather than some other kind of object.
Good to know.
>
> If we were going to allow these missing_values or creation_defaults
> or whatever they're called to be general expressions, then they would need
> to have their own OIDs for dependency purposes.  That would lead me to
> think that the best representation is to put them in their own rows in
> pg_attrdef, perhaps adding a boolean column to pg_attrdef to distinguish
> regular defaults from these things.  Or maybe they even need their own
> catalog, depending on whether you think dependency analysis would want
> to distinguish them from regular defaults using just the classed.
>
> Now, as I just pointed out in another mail, realistically we're probably
> going to restrict the feature to simple constants, which'd mean they will
> depend only on the column's type and can never need any dependencies of
> their own.  So we could take the shortcut of just storing them in a new
> column in pg_attribute.  But maybe that's shortsighted and we'll
> eventually wish we'd done them as full-fledged separate objects.
>
> But on the third hand ... once one of these is in place, how could you
> drop it separately from the column?  That would amount to a change in the
> column's stored data, which is not what one would expect from dropping
> a separate object.  So maybe it's senseless to think that these things
> could ever be distinct objects.  But that definitely leads to the
> conclusion that they're constants and nothing else.
I cannot follow this reasoning.
Let’s look past what PG does today:
For each row (whether that’s necessary or not) we evaluate the expression, compute the value and
store it in the rewritten table.
We do not record dependencies on the “pedigree” of the value.
It happened to originate from the DEFAULT expression provided with the ADD COLUMN,
but that is not remembered anywhere.
All we remember is the value - in each row.

So the only change that is proposed here - when it comes right down to it - is to
remember the value once only (IFF it is provably the same for each row) and thus
avoid the need to rewrite the table.
So I see no reason to impose any restriction other than “evaluated value is provably the same for every row”.

Regarding the location of storage.
I did start of using pg_attrdef, but ran into some snags.
My approach was to add the value as an extra column (rather than an extra row).
That caused trouble since a SET DEFAULT operation is decomposed into a DROP and a SET and
preserving the value across such operations did not come naturally.

If we were to use extra rows instead that issue would be solved, assuming we ad a “default kind” sort of column.
It would dictate the storage format though which may be considered overkill for a a constant.

Cheers
Serge

Re: Fast AT ADD COLUMN with DEFAULTs

From

Vitaly Burovoy

Date:

06 October 2016, 17:03:32

On 10/6/16, Serge Rielau <serge@rielau.com> wrote:
>> On Oct 6, 2016, at 9:20 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Vitaly Burovoy <vitaly.burovoy@gmail.com> writes:
>>> But what I discover for myself is that we have pg_attrdef separately
>>> from the pg_attribute. Why?
>>
>> The core reason for that is that the default expression needs to be
>> a separate object from the column for purposes of dependency analysis.
>> For example, if you have a column whose default is "foo()", then the
>> default expression depends on the function foo(), but the column should
>> not: if you drop the function, only the default expression ought to
>> be dropped, not the column.
>>
>> Because of this, the default expression needs to have its own OID
>> (to be stored in pg_depend) and it's convenient to store it in a
>> separate catalog so that the classoid can identify it as being a
>> default expression rather than some other kind of object.
>
> Good to know.
>
>> If we were going to allow these missing_values or creation_defaults
>> or whatever they're called to be general expressions, then they would
>> need
>> to have their own OIDs for dependency purposes.  That would lead me to
>> think that the best representation is to put them in their own rows in
>> pg_attrdef, perhaps adding a boolean column to pg_attrdef to distinguish
>> regular defaults from these things.  Or maybe they even need their own
>> catalog, depending on whether you think dependency analysis would want
>> to distinguish them from regular defaults using just the classed.
>>
>> Now, as I just pointed out in another mail, realistically we're probably
>> going to restrict the feature to simple constants, which'd mean they will
>> depend only on the column's type and can never need any dependencies of
>> their own.  So we could take the shortcut of just storing them in a new
>> column in pg_attribute.

I agree with you.

>> But maybe that's shortsighted and we'll
>> eventually wish we'd done them as full-fledged separate objects.

I don't think so. If we try to implement non-blocking adding columns
with volatile defaults (and for instance update old rows in the
background), we can end up with the next situation:

CREATE TABLE a(i bigint PRIMARY KEY);
INSERT INTO a SELECT generate_series(1,10000000000);

ALTER TABLE a ADD COLUMN b bigserial CHECK (b BETWEEN 1 AND 100);

For indexes (even unique) created concurrently similar troubles are
solved with a "not valid" mark, but what to do with a heap if we try
to do it in the background?

>> But on the third hand ... once one of these is in place, how could you
>> drop it separately from the column?  That would amount to a change in the
>> column's stored data, which is not what one would expect from dropping
>> a separate object.  So maybe it's senseless to think that these things
>> could ever be distinct objects.  But that definitely leads to the
>> conclusion that they're constants and nothing else.

> I cannot follow this reasoning.
> Let’s look past what PG does today:
> For each row (whether that’s necessary or not) we evaluate the expression,
> compute the value and
> store it in the rewritten table.
> We do not record dependencies on the “pedigree” of the value.
> It happened to originate from the DEFAULT expression provided with the ADD
> COLUMN,
> but that is not remembered anywhere.
> All we remember is the value - in each row.
> So the only change that is proposed here - when it comes right down to it -
> is to remember the value once only (IFF it is provably the same for each row)
> and thus avoid the need to rewrite the table.
> So I see no reason to impose any restriction other than “evaluated value is
> provably the same for every row”.

Tom says the same thing. The expression at the end should be a value
if it allows to avoid rewriting table.

> Regarding the location of storage.
> I did start of using pg_attrdef, but ran into some snags.
> My approach was to add the value as an extra column (rather than an extra
> row).
> That caused trouble since a SET DEFAULT operation is decomposed into a DROP
> and a SET and
> preserving the value across such operations did not come naturally.

I'm sorry for making you be confused. The best way is to use an extra
column in the pg_attribute to store serialized value.

> If we were to use extra rows instead that issue would be solved, assuming we
> add a “default kind” sort of column.
> It would dictate the storage format though which may be considered overkill
> for a a constant.

--
Best regards,
Vitaly Burovoy

Re: Fast AT ADD COLUMN with DEFAULTs

From

Jim Nasby

Date:

10 October 2016, 03:24:45

On 10/6/16 11:01 AM, Tom Lane wrote:
> Something based on missing_value/absent_value could work for me too.
> If we name it something involving "default", that definitely increases
> the possibility for confusion with the regular user-settable default.
>
> Also worth thinking about here is that the regular default expression
> affects what will be put into future inserted rows, whereas this thing
> affects the interpretation of past rows.  So it's really quite a different
> animal.  That's kind of leading me away from calling it creation_default.

There's actually another use case here that's potentially extremely 
valuable for warehousing and other "big data": compact representation of 
a default value.

The idea here is that if you have a specific value for a field that 
makes up a very large portion of your data, you'd really like to be able 
to represent that value *in each row* with something like a bit (such as 
we currently do for NULLs).

What I'd expect to see in the real world (once users figure this hack 
out) would be:

CREATE TABLE ...(  ...  -- skip field_a so we can handle all it's common values
);

INSERT INTO  ...  SELECT    ...    WHERE field_a IS NOT DISTINCT FROM 'really common value'
;

ALTER TABLE  ADD field_a ... NOT NULL DEFAULT 'really common value'
;

-- load rest of the data

That would have the effect of storing all those really common values 
with a single bit.

What we'd ultimately want is some kind of catalog versioning so that we 
knew what was in place when each tuple was created; that would allow for 
changing these things over time without forcing a full rewrite.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461

Re: Fast AT ADD COLUMN with DEFAULTs

From

Corey Huinker

Date:

10 October 2016, 04:02:43

There's actually another use case here that's potentially extremely valuable for warehousing and other "big data": compact representation of a default value.

I too would benefit from tables having either a default value in the event of a NOT-NULL column being flagged null, or a flat-out constant.

This would be a big win in partitioned tables where the partition can only hold one value of the partitioning column.

I guess a constant would be a pg_type where the sole value is encoded, and the column itself is stored like an empty string.

Re: Fast AT ADD COLUMN with DEFAULTs

From

Jim Nasby

Date:

15 October 2016, 16:10:15

On 10/9/16 11:02 PM, Corey Huinker wrote:
>
>     There's actually another use case here that's potentially extremely
>     valuable for warehousing and other "big data": compact
>     representation of a default value.
>
>
> I too would benefit from tables having either a default value in the
> event of a NOT-NULL column being flagged null, or a flat-out constant.
>
> This would be a big win in partitioned tables where the partition can
> only hold one value of the partitioning column.

I hadn't thought of that use case... with rowcounts in the billions 
becoming pretty common even the cost of a 4 byte enum starts to add up.

> I guess a constant would be a pg_type where the sole value is encoded,
> and the column itself is stored like an empty string.

Not empty string; the storage would look like NULL does today; the 
difference being that we'd know that attribute wasn't NULL-able so if 
it's marked as being "NULL" it actually means it has the default value. 
Though obviously this would only work if the default was a Const, and 
you wouldn't be able to change the default without ensuring no rows in 
the table were using this trick. But I suspect there's still plenty of 
scenarios where the advantage is worth it.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461

Re: Fast AT ADD COLUMN with DEFAULTs

From

Serge Rielau

Date:

15 October 2016, 17:52:04

This feature was added in DB2 year ago.

AFAIK it was not very successful.

Regular compression techniques proved serve a broader and purpose and save more space.

http://www.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.dbobj.doc/doc/c0056482.html

via Newton Mail

On Sat, Oct 15, 2016 at 09:10, Jim Nasby <Jim.Nasby@BlueTreble.com> wrote:

On 10/9/16 11:02 PM, Corey Huinker wrote:
>
> There's actually another use case here that's potentially extremely
> valuable for warehousing and other "big data": compact
> representation of a default value.
>
>
> I too would benefit from tables having either a default value in the
> event of a NOT-NULL column being flagged null, or a flat-out constant.
>
> This would be a big win in partitioned tables where the partition can
> only hold one value of the partitioning column.

I hadn't thought of that use case... with rowcounts in the billions
becoming pretty common even the cost of a 4 byte enum starts to add up.

> I guess a constant would be a pg_type where the sole value is encoded,
> and the column itself is stored like an empty string.

Not empty string; the storage would look like NULL does today; the
difference being that we'd know that attribute wasn't NULL-able so if
it's marked as being "NULL" it actually means it has the default value.
Though obviously this would only work if the default was a Const, and
you wouldn't be able to change the default without ensuring no rows in
the table were using this trick. But I suspect there's still plenty of
scenarios where the advantage is worth it.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532) mobile: 512-569-9461