Thread: Fast AT ADD COLUMN with DEFAULTs

Is there an interest in principle in the community for this functionality?
On 10/5/16, Serge Rielau <serge@rielau.com> wrote: > Dear Hackers, > > I’m working on a patch that expands PG’s ability to add columns to a table > without a table rewrite (i.e. at O(1) cost) from the > nullable-without-default to a more general case. E.g. > > CREATE TABLE T(pk INT NOT NULL PRIMARY KEY); > INSERT INTO T VALEUS (1), (2), (3); > ALTER TABLE T ADD COLUMN c1 INTEGER NOT NULL DEFAULT 5; > INSERT INTO T VALUES (4, DEFAULT); > ALTER TABLE T ALTER COLUMN SET DEFAULT 6; > INSERT INTO T VALUS (5, DEFAULT); > SELECT * FROM T ORDER BY pk; > => > (1, 5), > (2, 5), > (3, 5), > (4, 5), > (5, 6); > > Rows 1-3 have never been updated, yet they know that their values of c1 is > 5. > > The requirement is driven by large tables for which add column takes too > much time and/or produces too large a transaction for comfort. > > In simplified terms: > > * a second “exist default” is computed and stored in > the catalogs at time of AT ADD COLUMN > > * The exist default is cached in the tuple descriptor (e.g in attrdef) > > * When one of the getAttr or copytuple related routines is invoked > the exist default is filled in instead of simply NULL padding if the > tuple is shorter the requested attribute number. > > Is there an interest in principle in the community for this functionality? Wow! I think it would be great! It also solves huge vacuuming after rewriting the table(s). Just pay attention to corner cases like indexes, statistics and speed. But I'd like to see solution for more important cases like: CREATE TABLE t (pk INT NOT NULL PRIMARY KEY); INSERT INTO t VALUES (1), (2), (3); ALTER TABLE t ADD COLUMN c1 timestamptz NOT NULL DEFAULT 'now'; SELECT * FROM t ORDER BY pk; ALTER TABLE t ADD COLUMN c2 serial; SELECT * FROM t ORDER BY pk; INSERT INTO t(pk) VALUES (4); SELECT * FROM t ORDER BY pk; P.S.: I really think it is a good idea, just some research is necessary and covering corner cases... -- Best regards, Vitaly Burovoy
<img class="cloudmagic-smart-beacon" height="0" src="https://tr.cloudmagic.com/h/v6/emailtag/tag/1475704591/647b813270dffd4c4b2631f3c9eb3f46/9cf3801d03770ada01bb39dc8f52321d/eb0a7f22160b433d9b4d35634754e347/9efab2399c7c560b34de477b9aa0a465/ufo.gif" style="border:0;width:0; height:0; overflow:hidden;" width="0" /><div dir="auto"><div id="cm_replymail_content_wrap"><divclass="">On Wed, Oct 5, 2016 at 2:45 PM, Vitaly Burovoy <vitaly.burovoy@gmail.com>wrote:<br /><div id="cm_replymail_content_1475704139" style="overflow: visible;"><blockquotestyle="margin:0;border-left: #D6D6D6 1px solid;padding-left: 10px;">On 10/5/16, Serge Rielau <serge@rielau.com>wrote:<br />> Dear Hackers,<br />><br />> I’m working on a patch that expands PG’s abilityto add columns to a table<br />> without a table rewrite (i.e. at O(1) cost) from the<br />> nullable-without-defaultto a more general case. E.g.<br />...<br />> Is there an interest in principle in the communityfor this functionality?<br /><br />Wow! I think it would be great! It also solves huge vacuuming after<br />rewritingthe table(s).<br />Just pay attention to corner cases like indexes, statistics and speed.</blockquote><div id="ID_1475704192987">Yes, Yes, and still analyzing speed</div><blockquote class="" style="margin: 0px; border-left-color:rgb(214, 214, 214); border-left-width: 1px; border-left-style: solid; padding-left: 10px;"><br />But I'dlike to see solution for more important cases like:<br />CREATE TABLE t (pk INT NOT NULL PRIMARY KEY);<br />INSERT INTOt VALUES (1), (2), (3);<br />ALTER TABLE t ADD COLUMN c1 timestamptz NOT NULL DEFAULT 'now';<br />SELECT * FROM t ORDERBY pk;<br />ALTER TABLE t ADD COLUMN c2 serial;<br />SELECT * FROM t ORDER BY pk;<br />INSERT INTO t(pk) VALUES (4);<br/>SELECT * FROM t ORDER BY pk;</blockquote><div id="ID_1475704277131">By solution I think you mean a semantic changefrom what it is doing today which is:</div><div id="ID_1475704277131">* “Now” is fixed to ALTER TABLE time for allpre-existing rows</div><div id="ID_1475704277131">* serial will fill in the same value for all pre-existing rows</div><divid="ID_1475704277131">Having different semantics for those would require a rewrite and probably different syntaxin some form.</div><div id="ID_1475704277131"><br /></div><div id="ID_1475704277131">This is what my patch does onour PG derivative today: </div><blockquote class="" style="margin: 0px; border-left-color: rgb(214, 214, 214); border-left-width:1px; border-left-style: solid; padding-left: 10px;"></blockquote><div id="ID_1475704251970">CREATE TABLEt (pk INT NOT NULL PRIMARY KEY);</div><div id="ID_1475704251970">CREATE TABLE</div><div id="ID_1475704251970">postgres=#INSERT INTO t VALUES (1), (2), (3);</div><div id="ID_1475704251970">INSERT 0 3</div><divid="ID_1475704251970">postgres=# ALTER TABLE t ADD COLUMN c1 timestamptz NOT NULL DEFAULT 'now';</div><div id="ID_1475704251970">ALTERTABLE</div><div id="ID_1475704251970">postgres=# SELECT * FROM t ORDER BY pk;</div><div id="ID_1475704251970"> pk| c1 </div><div id="ID_1475704251970">----+-------------------------------</div><divid="ID_1475704251970"> 1 | 2016-10-05 21:47:58.919194+00</div><divid="ID_1475704251970"> 2 | 2016-10-05 21:47:58.919194+00</div><div id="ID_1475704251970"> 3| 2016-10-05 21:47:58.919194+00</div><div id="ID_1475704251970">(3 rows)</div><div id="ID_1475704251970"><br /></div><divid="ID_1475704251970">postgres=# </div><div id="ID_1475704251970">postgres=# ALTER TABLE t ADD COLUMN c2 serial;</div><divid="ID_1475704251970">SELECT * FROM t ORDER BY pk;</div><div id="ID_1475704251970">INSERT INTO t(pk) VALUES(4);</div><div id="ID_1475704251970">SELECT * FROM t ORDER BY pk;</div><div id="ID_1475704251970"><br /></div><divid="ID_1475704251970">ALTER TABLE t ADD COLUMN c2 serial;</div><div id="ID_1475704251970">ALTER TABLE</div><divid="ID_1475704251970">postgres=# SELECT * FROM t ORDER BY pk;</div><div id="ID_1475704251970"> pk | c1 | c2 </div><div id="ID_1475704251970">----+-------------------------------+----</div><div id="ID_1475704251970"> 1 | 2016-10-05 21:47:58.919194+00 | 1</div><div id="ID_1475704251970"> 2 | 2016-10-05 21:47:58.919194+00| 1</div><div id="ID_1475704251970"> 3 | 2016-10-05 21:47:58.919194+00 | 1</div><div id="ID_1475704251970">(3rows)</div><div id="ID_1475704251970"><br /></div><div id="ID_1475704251970">postgres=# INSERT INTOt(pk) VALUES (4);</div><div id="ID_1475704251970">INSERT 0 1</div><div id="ID_1475704251970">postgres=# SELECT * FROMt ORDER BY pk;</div><div id="ID_1475704251970"> pk | c1 | c2 </div><div id="ID_1475704251970">----+-------------------------------+----</div><divid="ID_1475704251970"> 1 | 2016-10-05 21:47:58.919194+00| 1</div><div id="ID_1475704251970"> 2 | 2016-10-05 21:47:58.919194+00 | 1</div><div id="ID_1475704251970"> 3 | 2016-10-05 21:47:58.919194+00 | 1</div><div id="ID_1475704251970"> 4 | 2016-10-05 21:47:58.919194+00| 2</div><div id="ID_1475704251970">(4 rows)</div><div id="ID_1475704251970"> </div><blockquote class=""style="margin: 0px; border-left-color: rgb(214, 214, 214); border-left-width: 1px; border-left-style: solid; padding-left:10px;">P.S.: I really think it is a good idea, just some research is<br />necessary and covering corner cases...</blockquote><divid="ID_1475704460956">Thanks.</div><div id="ID_1475704460956">This would be my first contribution. </div><divid="ID_1475704460956">I take it I would post a patch based on a recent PG 9.6 master for review?</div><divid="ID_1475704460956">Or should I compose some sort of a design document? </div><div id="ID_1475704460956"><br/></div><div id="ID_1475704460956">Cheers</div><div id="ID_1475704460956">Serge Rielau</div><divid="ID_1475704460956">Salesforce.com</div></div></div></div></div>
On 2016-10-05 11:58:33 -0700, Serge Rielau wrote: > Dear Hackers, > I’m working on a patch that expands PG’s ability to add columns to a table without a table rewrite (i.e. at O(1) cost)from the nullable-without-default to a more general case. E.g. CREATE TABLE T(pk INT NOT NULL PRIMARY KEY); INSERT INTOT VALEUS (1), (2), (3); ALTER TABLE T ADD COLUMN c1 INTEGER NOT NULL DEFAULT 5; INSERT INTO T VALUES (4, DEFAULT); ALTERTABLE T ALTER COLUMN SET DEFAULT 6; INSERT INTO T VALUS (5, DEFAULT); SELECT * FROM T ORDER BY pk; => (1, 5), (2, 5),(3, 5), (4, 5), (5, 6); > Rows 1-3 have never been updated, yet they know that their values of c1 is 5. > The requirement is driven by large tables for which add column takes too much time and/or produces too large a transactionfor comfort. > In simplified terms: * a second “exist default” is computed and stored in the catalogs at time of AT ADD COLUMN * The existdefault is cached in the tuple descriptor (e.g in attrdef) * When one of the getAttr or copytuple related routines isinvoked the exist default is filled in instead of simply NULL padding if the tuple is shorter the requested attribute number. If I understand this proposal correctly, altering a column default will still have trigger a rewrite unless there's previous default?
On 10/5/16, Andres Freund <andres@anarazel.de> wrote: > On 2016-10-05 11:58:33 -0700, Serge Rielau wrote: >> Dear Hackers, >> I’m working on a patch that expands PG’s ability to add columns to a table >> without a table rewrite (i.e. at O(1) cost) from the >> nullable-without-default to a more general case. > > If I understand this proposal correctly, altering a column default will > still have trigger a rewrite unless there's previous default? No, "a second “exist default"" was mentioned, i.e. it is an additional column in a system table (pg_attribute) as default column values of the "pre-alter" era. It solves changing of the default expression of the same column later. -- Best regards, Vitaly Burovoy
On 2016-10-05 15:23:05 -0700, Vitaly Burovoy wrote: > On 10/5/16, Andres Freund <andres@anarazel.de> wrote: > > On 2016-10-05 11:58:33 -0700, Serge Rielau wrote: > >> Dear Hackers, > >> I’m working on a patch that expands PG’s ability to add columns to a table > >> without a table rewrite (i.e. at O(1) cost) from the > >> nullable-without-default to a more general case. > > > > If I understand this proposal correctly, altering a column default will > > still have trigger a rewrite unless there's previous default? > > No, "a second “exist default"" was mentioned, i.e. it is an additional > column in a system table (pg_attribute) as default column values of > the "pre-alter" era. It solves changing of the default expression of > the same column later. Don't think that actually solves the issue. The default might be unset for a while, for example. Essentially you'd need to be able to associate arbitrary number of default values with an arbitrary set of rows. ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT; INSERT id = 1; ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 1; ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT; INSERT id = 2; ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 2; ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT; INSERT id = 3; ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 3; The result here would be that there's three rows with a default value for foo that's the same as their id. None of them has that column present in the row.
<img class="cloudmagic-smart-beacon" height="0" src="https://tr.cloudmagic.com/h/v6/emailtag/tag/1475706830/647b813270dffd4c4b2631f3c9eb3f46/9cf3801d03770ada01bb39dc8f52321d/c719df4bda2459f77b9ef0f2a251d073/9efab2399c7c560b34de477b9aa0a465/ufo.gif" style="border:0;width:0; height:0; overflow:hidden;" width="0" /><div dir="auto"><div id="cm_replymail_content_wrap"><divclass="">On Wed, Oct 5, 2016 at 3:23 PM, Vitaly Burovoy <vitaly.burovoy@gmail.com>wrote:<br /><div id="cm_replymail_content_1475706734" style="overflow: visible;"><blockquotestyle="margin:0;border-left: #D6D6D6 1px solid;padding-left: 10px;">On 10/5/16, Andres Freund <andres@anarazel.de>wrote:<br />> On 2016-10-05 11:58:33 -0700, Serge Rielau wrote:<br />>> Dear Hackers,<br/>>> I’m working on a patch that expands PG’s ability to add columns to a table<br />>> without atable rewrite (i.e. at O(1) cost) from the<br />>> nullable-without-default to a more general case.<br />> If Iunderstand this proposal correctly, altering a column default will<br />> still have trigger a rewrite unless there'sprevious default?<br />No, "a second “exist default"" was mentioned, i.e. it is an additional<br />column in a systemtable (pg_attribute) as default column values of<br />the "pre-alter" era. It solves changing of the default expressionof<br />the same column later.</blockquote><div id="ID_1475706751433">Correct and good guess on pg_attribute. </div><divid="ID_1475706751433">That’s where it’s living in my proposal. </div><div id="ID_1475706751433"><br/></div><div id="ID_1475706751433">Cheers</div><div id="ID_1475706751433">Serge</div></div></div></div></div>
On 10/5/16, Andres Freund <andres@anarazel.de> wrote: > On 2016-10-05 15:23:05 -0700, Vitaly Burovoy wrote: >> On 10/5/16, Andres Freund <andres@anarazel.de> wrote: >> > On 2016-10-05 11:58:33 -0700, Serge Rielau wrote: >> >> Dear Hackers, >> >> I’m working on a patch that expands PG’s ability to add columns to a >> >> table >> >> without a table rewrite (i.e. at O(1) cost) from the >> >> nullable-without-default to a more general case. >> > >> > If I understand this proposal correctly, altering a column default will >> > still have trigger a rewrite unless there's previous default? >> >> No, "a second “exist default"" was mentioned, i.e. it is an additional >> column in a system table (pg_attribute) as default column values of >> the "pre-alter" era. It solves changing of the default expression of >> the same column later. > > Don't think that actually solves the issue. The default might be unset > for a while, for example. Essentially you'd need to be able to associate > arbitrary number of default values with an arbitrary set of rows. > > ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT; > INSERT id = 1; > ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 1; > ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT; > INSERT id = 2; > ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 2; > ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT; > INSERT id = 3; > ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 3; > > The result here would be that there's three rows with a default value > for foo that's the same as their id. None of them has that column > present in the row. I'm sorry, while I was writting "pre-alter" I meant "pre-alter-add-column" era (not "pre-alter-set-default"), all later default changes "current" default, whereas "pre-alter-add-column" adds value if current column number < TupleDesc.natts. All your DDL are in the "post-alter-add-column" era. -- Best regards, Vitaly Burovoy
On 10/5/16, Vitaly Burovoy <vitaly.burovoy@gmail.com> wrote: > On 10/5/16, Andres Freund <andres@anarazel.de> wrote: >> On 2016-10-05 15:23:05 -0700, Vitaly Burovoy wrote: >>> On 10/5/16, Andres Freund <andres@anarazel.de> wrote: >>> > On 2016-10-05 11:58:33 -0700, Serge Rielau wrote: >>> >> Dear Hackers, >>> >> I’m working on a patch that expands PG’s ability to add columns to a >>> >> table >>> >> without a table rewrite (i.e. at O(1) cost) from the >>> >> nullable-without-default to a more general case. >>> > >>> > If I understand this proposal correctly, altering a column default >>> > will >>> > still have trigger a rewrite unless there's previous default? >>> >>> No, "a second “exist default"" was mentioned, i.e. it is an additional >>> column in a system table (pg_attribute) as default column values of >>> the "pre-alter" era. It solves changing of the default expression of >>> the same column later. >> >> Don't think that actually solves the issue. The default might be unset >> for a while, for example. Essentially you'd need to be able to associate >> arbitrary number of default values with an arbitrary set of rows. >> >> ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT; >> INSERT id = 1; >> ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 1; >> ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT; >> INSERT id = 2; >> ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 2; >> ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT; >> INSERT id = 3; >> ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 3; >> >> The result here would be that there's three rows with a default value >> for foo that's the same as their id. None of them has that column >> present in the row. > > I'm sorry, while I was writting "pre-alter" I meant > "pre-alter-add-column" era (not "pre-alter-set-default"), all later > default changes "current" default, whereas "pre-alter-add-column" adds > value if current column number < TupleDesc.natts. > > All your DDL are in the "post-alter-add-column" era. I'm so sorry, I was in a hurry. Of course, - if current column number < TupleDesc.natts. + if current column number > TupleDesc.natts. -- Best regards, Vitaly Burovoy
On 2016-10-05 15:23:05 -0700, Vitaly Burovoy wrote:
> On 10/5/16, Andres Freund <andres@anarazel.de> wrote:
> > On 2016-10-05 11:58:33 -0700, Serge Rielau wrote:
> >> Dear Hackers,
> >> I’m working on a patch that expands PG’s ability to add columns to a table
> >> without a table rewrite (i.e. at O(1) cost) from the
> >> nullable-without-default to a more general case.
> >
> > If I understand this proposal correctly, altering a column default will
> > still have trigger a rewrite unless there's previous default?
>
> No, "a second “exist default"" was mentioned, i.e. it is an additional
> column in a system table (pg_attribute) as default column values of
> the "pre-alter" era. It solves changing of the default expression of
> the same column later.
Don't think that actually solves the issue. The default might be unset
for a while, for example. Essentially you'd need to be able to associate
arbitrary number of default values with an arbitrary set of rows.
ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
INSERT id = 1;
ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 1;
ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
INSERT id = 2;
ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 2;
ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
INSERT id = 3;
ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 3;
The result here would be that there's three rows with a default value
for foo that's the same as their id. None of them has that column
present in the row.
On 2016-10-05 15:44:56 -0700, Jeff Janes wrote: > On Wed, Oct 5, 2016 at 3:29 PM, Andres Freund <andres@anarazel.de> wrote: > > > On 2016-10-05 15:23:05 -0700, Vitaly Burovoy wrote: > > > On 10/5/16, Andres Freund <andres@anarazel.de> wrote: > > > > On 2016-10-05 11:58:33 -0700, Serge Rielau wrote: > > > >> Dear Hackers, > > > >> I’m working on a patch that expands PG’s ability to add columns to a > > table > > > >> without a table rewrite (i.e. at O(1) cost) from the > > > >> nullable-without-default to a more general case. > > > > > > > > If I understand this proposal correctly, altering a column default will > > > > still have trigger a rewrite unless there's previous default? > > > > > > No, "a second “exist default"" was mentioned, i.e. it is an additional > > > column in a system table (pg_attribute) as default column values of > > > the "pre-alter" era. It solves changing of the default expression of > > > the same column later. > > > > Don't think that actually solves the issue. The default might be unset > > for a while, for example. Essentially you'd need to be able to associate > > arbitrary number of default values with an arbitrary set of rows. > > > > > > ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT; > > INSERT id = 1; > > ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 1; > > ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT; > > INSERT id = 2; > > ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 2; > > ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT; > > INSERT id = 3; > > ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 3; > > > > The result here would be that there's three rows with a default value > > for foo that's the same as their id. None of them has that column > > present in the row. > > > > My understanding is that all of those would be materialized. But that'd require a table rewrite, as none of the above INSERTs were done when a default was in place. But each has a different "applicable" default value. > The only > default that isn't materialized is the one in effect in the same statement > in which that column was added. Since a column can only be added once, the > default in effect at the time the column was added can never change, no > matter what you do to the default later on. DROP DEFAULT pretty much does that, because it allows multiple (set of) rows with no value (or a NULL) for a specific column, but with differing applicable default values. Andres
Andres Freund <andres@anarazel.de> writes: > On 2016-10-05 15:44:56 -0700, Jeff Janes wrote: >>> No, "a second “exist default"" was mentioned, i.e. it is an additional >>> column in a system table (pg_attribute) as default column values of >>> the "pre-alter" era. It solves changing of the default expression of >>> the same column later. > Don't think that actually solves the issue. The default might be unset > for a while, for example. Essentially you'd need to be able to associate > arbitrary number of default values with an arbitrary set of rows. I think it does work, as long as the "exists default" is immutable. (For safety, personally, I'd restrict it to be a verbatim constant.) The point is that you apply that when you are reading a row that has so few columns that it must predate the original ALTER TABLE ADD COLUMN. Subsequent ADD/DROP/SET DEFAULT affect the "active default" and hence insertions that happen after them, but they don't affect the interpretation of old rows. And of course all rows inserted after the ADD COLUMN contain explicit values of the column, so their meaning is unaffected in any case. You do need two defaults associated with a column to make this work. The "exists default" never changes after the column is added. But in principle, the "exists default" just replaces the NULL value that we implicitly insert now in such cases. Need a better name for the concept, since evidently this name isn't conveying the idea. regards, tom lane
On Wed, Oct 5, 2016 at 3:29 PM, Andres Freund <andres@anarazel.de> wrote:On 2016-10-05 15:23:05 -0700, Vitaly Burovoy wrote:
> On 10/5/16, Andres Freund <andres@anarazel.de> wrote:
> > On 2016-10-05 11:58:33 -0700, Serge Rielau wrote:
> >> Dear Hackers,
> >> I’m working on a patch that expands PG’s ability to add columns to a table
> >> without a table rewrite (i.e. at O(1) cost) from the
> >> nullable-without-default to a more general case.
> >
> > If I understand this proposal correctly, altering a column default will
> > still have trigger a rewrite unless there's previous default?
>
> No, "a second “exist default"" was mentioned, i.e. it is an additional
> column in a system table (pg_attribute) as default column values of
> the "pre-alter" era. It solves changing of the default expression of
> the same column later.
Don't think that actually solves the issue. The default might be unset
for a while, for example. Essentially you'd need to be able to associate
arbitrary number of default values with an arbitrary set of rows.
ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
INSERT id = 1;
ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 1;
ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
INSERT id = 2;
ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 2;
ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
INSERT id = 3;
ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 3;
The result here would be that there's three rows with a default value
for foo that's the same as their id. None of them has that column
present in the row.My understanding is that all of those would be materialized. The only default that isn't materialized is the one in effect in the same statement in which that column was added. Since a column can only be added once, the default in effect at the time the column was added can never change, no matter what you do to the default later on.Cheers,Jeff
On 10/5/16, Andres Freund <andres@anarazel.de> wrote: > On 2016-10-05 15:44:56 -0700, Jeff Janes wrote: >> On Wed, Oct 5, 2016 at 3:29 PM, Andres Freund <andres@anarazel.de> wrote: >> > ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT; >> > INSERT id = 1; >> > ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 1; >> > ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT; >> > INSERT id = 2; >> > ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 2; >> > ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT; >> > INSERT id = 3; >> > ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 3; >> > >> > The result here would be that there's three rows with a default value >> > for foo that's the same as their id. None of them has that column >> > present in the row. >> > >> >> My understanding is that all of those would be materialized. > > But that'd require a table rewrite, as none of the above INSERTs were > done when a default was in place. Since they did not have the default value, that tuples are written with actual TupleDesc.natts where att_isnull for "withdefault" column is set (actually the column does not have default for inserted tuples in your case). > But each has a different "applicable" default value. No, their values are constructed "from scratch", not fetched from a heap, so "pre-alter-add-column" default is not applicable for them. >> The only >> default that isn't materialized is the one in effect in the same >> statement >> in which that column was added. Since a column can only be added once, >> the >> default in effect at the time the column was added can never change, no >> matter what you do to the default later on. > > DROP DEFAULT pretty much does that, because it allows multiple (set of) > rows with no value (or a NULL) for a specific column, but with differing > applicable default values. DROP DEFAULT is for "post-alter-add-column" tuples, it does not affects "pre-alter-add-column" ones. -- Best regards, Vitaly Burovoy
On 2016-10-05 18:58:47 -0400, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > On 2016-10-05 15:44:56 -0700, Jeff Janes wrote: > >>> No, "a second “exist default"" was mentioned, i.e. it is an additional > >>> column in a system table (pg_attribute) as default column values of > >>> the "pre-alter" era. It solves changing of the default expression of > >>> the same column later. > > > Don't think that actually solves the issue. The default might be unset > > for a while, for example. Essentially you'd need to be able to associate > > arbitrary number of default values with an arbitrary set of rows. > > I think it does work, as long as the "exists default" is immutable. > (For safety, personally, I'd restrict it to be a verbatim constant.) > The point is that you apply that when you are reading a row that has > so few columns that it must predate the original ALTER TABLE ADD COLUMN. > Subsequent ADD/DROP/SET DEFAULT affect the "active default" and hence > insertions that happen after them, but they don't affect the > interpretation of old rows. And of course all rows inserted after the > ADD COLUMN contain explicit values of the column, so their meaning is > unaffected in any case. Err, yes. I forgot that altering the default of an existing column doesn't set the default for existing values. Sorry for the noise. Greetings, Andres Freund
I wrote: > Need a better name for the concept, since evidently this name isn't > conveying the idea. Maybe "creation default" would work better? Point being it's the default value at the time of column creation. regards, tom lane
<img class="cloudmagic-smart-beacon" height="0" src="https://tr.cloudmagic.com/h/v6/emailtag/tag/1475708722/647b813270dffd4c4b2631f3c9eb3f46/9cf3801d03770ada01bb39dc8f52321d/c143e7051e30d22af27905a24303de11/9efab2399c7c560b34de477b9aa0a465/ufo.gif" style="border:0;width:0; height:0; overflow:hidden;" width="0" /><div dir="auto"><br /><div class="cm_footer" id="cm_footer"><divid="cm_sent_from">via <a href="https://cloudmagic.com/k/d/mailapp?ct=dx&cv=9.0.74&pv=10.11.6&source=email_footer_2">Newton Mail</a></div></div><span><br/></span><div id="cm_replymail_content_wrap"><div class="">On Wed, Oct 5, 2016 at 3:58 PM, TomLane <tgl@sss.pgh.pa.us> wrote:<br /><div id="cm_replymail_content_1475708366" style="overflow: visible;"><blockquotestyle="margin:0;border-left: #D6D6D6 1px solid;padding-left: 10px;">Andres Freund <andres@anarazel.de>writes:<br />> On 2016-10-05 15:44:56 -0700, Jeff Janes wrote:<br />>>> No, "a second“exist default"" was mentioned, i.e. it is an additional<br />>>> column in a system table (pg_attribute)as default column values of<br />>>> the "pre-alter" era. It solves changing of the default expressionof<br />>>> the same column later.<br /><br />> Don't think that actually solves the issue. The defaultmight be unset<br />> for a while, for example. Essentially you'd need to be able to associate<br />> arbitrarynumber of default values with an arbitrary set of rows.<br /><br />I think it does work, as long as the "existsdefault" is immutable.<br />(For safety, personally, I'd restrict it to be a verbatim constant.)<br />The point isthat you apply that when you are reading a row that has<br />so few columns that it must predate the original ALTER TABLEADD COLUMN.<br />Subsequent ADD/DROP/SET DEFAULT affect the "active default" and hence<br />insertions that happen afterthem, but they don't affect the<br />interpretation of old rows. And of course all rows inserted after the<br />ADDCOLUMN contain explicit values of the column, so their meaning is<br />unaffected in any case.</blockquote><blockquoteclass="" style="margin: 0px; border-left-color: rgb(214, 214, 214); border-left-width: 1px;border-left-style: solid; padding-left: 10px;"><br />You do need two defaults associated with a column to make this work.<br/>The "exists default" never changes after the column is added. But<br />in principle, the "exists default" justreplaces the NULL value that<br />we implicitly insert now in such cases.</blockquote><div id="ID_1475708435309">Explainedso much better than I could do it :-)</div><div id="ID_1475708435309"><br /></div><div id="ID_1475708435309">Iwant to point out as a minor “extension” that there is no need for the default to be immutable. Itis merely required that the default is evaluate at time of ADD COLUMN</div><div id="ID_1475708435309">and then we rememberthe actual value for the exist default, rather than the parsed expression as we do for the “current” default.</div><divid="ID_1475708435309"> </div><blockquote class="" style="margin: 0px; border-left-color: rgb(214, 214,214); border-left-width: 1px; border-left-style: solid; padding-left: 10px;">Need a better name for the concept, sinceevidently this name isn't<br />conveying the idea.</blockquote><div id="ID_1475708551097">By all means. Got anythingin mind?</div><div id="ID_1475708551097"><br /></div><div id="ID_1475708551097">Cheers</div><div id="ID_1475708551097">SergeRielau</div><div id="ID_1475708551097"><br /></div></div></div></div></div>
via Newton MailOn Wed, Oct 5, 2016 at 3:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:Andres Freund <andres@anarazel.de> writes:
> On 2016-10-05 15:44:56 -0700, Jeff Janes wrote:
>>> No, "a second “exist default"" was mentioned, i.e. it is an additional
>>> column in a system table (pg_attribute) as default column values of
>>> the "pre-alter" era. It solves changing of the default expression of
>>> the same column later.
> Don't think that actually solves the issue. The default might be unset
> for a while, for example. Essentially you'd need to be able to associate
> arbitrary number of default values with an arbitrary set of rows.
I think it does work, as long as the "exists default" is immutable.
(For safety, personally, I'd restrict it to be a verbatim constant.)
The point is that you apply that when you are reading a row that has
so few columns that it must predate the original ALTER TABLE ADD COLUMN.
Subsequent ADD/DROP/SET DEFAULT affect the "active default" and hence
insertions that happen after them, but they don't affect the
interpretation of old rows. And of course all rows inserted after the
ADD COLUMN contain explicit values of the column, so their meaning is
unaffected in any case.
You do need two defaults associated with a column to make this work.
The "exists default" never changes after the column is added. But
in principle, the "exists default" just replaces the NULL value that
we implicitly insert now in such cases.Explained so much better than I could do it :-)I want to point out as a minor “extension” that there is no need for the default to be immutable. It is merely required that the default is evaluate at time of ADD COLUMNand then we remember the actual value for the exist default, rather than the parsed expression as we do for the “current” default.Need a better name for the concept, since evidently this name isn't
conveying the idea.By all means. Got anything in mind?
From https://msdn.microsoft.com/en-us/library/ms190273.aspx :
Adding NOT NULL Columns as an Online Operation
Starting with SQL Server 2012 Enterprise Edition, adding a NOT NULL column with a default value is an online operation when the default value is a runtime constant. This means that the operation is completed almost instantaneously regardless of the number of rows in the table. This is because the existing rows in the table are not updated during the operation; instead, the default value is stored only in the metadata of the table and the value is looked up as needed in queries that access these rows. This behavior is automatic; no additional syntax is required to implement the online operation beyond the ADD COLUMN syntax. A runtime constant is an expression that produces the same value at runtime for each row in the table regardless of its determinism. For example, the constant expression "My temporary data", or the system function GETUTCDATETIME() are runtime constants. In contrast, the functions NEWID() or NEWSEQUENTIALID() are not runtime constants because a unique value is produced for each row in the table. Adding a NOT NULL column with a default value that is not a runtime constant is always performed offline and an exclusive (SCH-M) lock is acquired for the duration of the operation.
While the existing rows reference the value stored in metadata, the default value is stored on the row for any new rows that are inserted and do not specify another value for the column. The default value stored in metadata is moved to an existing row when the row is updated (even if the actual column is not specified in the UPDATE statement), or if the table or clustered index is rebuilt.
On 10/5/16, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I wrote: >> Need a better name for the concept, since evidently this name isn't >> conveying the idea. > > Maybe "creation default" would work better? Point being it's the > default value at the time of column creation. Hmm... Personaly for me the original topic name is good enough. But what I discover for myself is that we have pg_attrdef separately from the pg_attribute. Why? Is it time to join them? For not presented defaults it would be only one bit per row(if we avoid "adsrc" as it is recommended), but for a separate table it is 11 columns with two indexes now... -- Best regards, Vitaly Burovoy
<img class="cloudmagic-smart-beacon" height="0" src="https://tr.cloudmagic.com/h/v6/emailtag/tag/1475710688/647b813270dffd4c4b2631f3c9eb3f46/9cf3801d03770ada01bb39dc8f52321d/d3d73fe96985e5e4f5ec045c1bb0ba95/9efab2399c7c560b34de477b9aa0a465/ufo.gif" style="border:0;width:0; height:0; overflow:hidden;" width="0" /><div dir="auto"><span><br /></span><br /><div class="cm_footer"id="cm_footer"><div id="cm_sent_from">via <a href="https://cloudmagic.com/k/d/mailapp?ct=dx&cv=9.0.74&pv=10.11.6&source=email_footer_2">Newton Mail</a></div></div><span><br/></span><div id="cm_replymail_content_wrap"><div class="">On Wed, Oct 5, 2016 at 4:19 PM, VitalyBurovoy <vitaly.burovoy@gmail.com> wrote:<br /><div id="cm_replymail_content_1475709656" style="overflow: visible;"><blockquotestyle="margin:0;border-left: #D6D6D6 1px solid;padding-left: 10px;">On 10/5/16, Tom Lane <tgl@sss.pgh.pa.us>wrote:<br />> I wrote:<br />>> Need a better name for the concept, since evidently thisname isn't<br />>> conveying the idea.<br />><br />> Maybe "creation default" would work better? Point beingit's the<br />> default value at the time of column creation.<br />Hmm... Personaly for me the original topic nameis good enough.</blockquote><div id="ID_1475709663350">I think at issue is with the term “exist default” rather thanthe feature/topic name (?)</div><div id="ID_1475709663350"> </div><blockquote class="" style="margin: 0px; border-left-color:rgb(214, 214, 214); border-left-width: 1px; border-left-style: solid; padding-left: 10px;">But what I discoverfor myself is that we have pg_attrdef separately<br />from the pg_attribute. Why?<br />Is it time to join them? Fornot presented defaults it would be only<br />one bit per row(if we avoid "adsrc" as it is recommended), but for a<br />separatetable it is 11 columns with two indexes now...</blockquote><div id="ID_1475709743037">In terms of footprint wemay be able to remove pg_attrdef.</div><div id="ID_1475709743037">I would consider that orthogonal to the proposed featurethough.</div><div id="ID_1475709743037">The internal representation of defaults in the tuple descriptor still needsto be a map of sorts.</div><div id="ID_1475709743037"><br /></div><div id="ID_1475709743037">To comment on PantelisSQL Server Reference:</div><div id="ID_1475709743037">Other vendors such as Oracle and DB2 also support this feature.</div><divid="ID_1475709743037"><br /></div><div id="ID_1475709743037">The listed restriction made me loop back toVitaly’s original serial example:</div><div id="ID_1475709743037">ALTER TABLE t ADD COLUMN c2 serial;</div><div id="ID_1475709743037">andrethink Tom’s struct restriction to constants.</div><div id="ID_1475709743037"><br /></div><divid="ID_1475709743037">In PG the proposed feature would also have to be limited to immutable(?) default expressionsto comply with existing behavior, which matches SQL Servers.</div><div id="ID_1475709743037"><br /></div><divid="ID_1475709743037">My current patch does not restrict that and thusly falsely "fills in" the same value forall rows. </div><div id="ID_1475709743037"><br /></div><div id="ID_1475709743037">Cheers</div><div id="ID_1475709743037">SergeRielau</div><div id="ID_1475709743037">Salesforce.com</div><div id="ID_1475709743037"><br /></div></div></div></div></div>
On 10/5/16, Serge Rielau <serge@rielau.com> wrote: >On Wed, Oct 5, 2016 at 4:19 PM, Vitaly Burovoy <vitaly.burovoy@gmail.com> wrote: >> But what I discover for myself is that we have pg_attrdef separately >> from the pg_attribute. Why? >> Is it time to join them? For not presented defaults it would be only >> one bit per row(if we avoid "adsrc" as it is recommended), but for a >> separate table it is 11 columns with two indexes now... > > In terms of footprint we may be able to remove pg_attrdef. > I would consider that orthogonal to the proposed feature though. It was a question mostly to the community rather than to you. > The internal representation of defaults in the tuple descriptor still needs to be a map of sorts. > > To comment on Pantelis SQL Server Reference: > Other vendors such as Oracle and DB2 also support this feature. > > The listed restriction made me loop back to Vitaly’s original serial example: > ALTER TABLE t ADD COLUMN c2 serial; > and rethink Tom’s struct restriction to constants. I'm sorry, the correct example with "now" should be: ALTER TABLE t ADD COLUMN c1 timestamptz NOT NULL DEFAULT 'now'::text::timestamptz; I wanted to show that for non-immutable (even stable) expression as a default one each time selected old tuples gives different result. But your implementation evaluates it before saving as a "pre-add-column" default and it breakes the current behavior in a different way. > In PG the proposed feature would also have to be limited to immutable(?) > default expressions to comply with existing behavior, which matches SQL Servers. > > My current patch does not restrict that and thusly falsely "fills in" the same value for all rows. If you change the current behavior (just making it "faster") you should save the current behavior. The best case is to determine whether it is possible to do a "fast" change and fill the "pre-add-column" default parameter or leave the current "long" behavior. I assure it is often enough to add columns with a default value which fills the current rows by non-static values. One of them is adding a serial column (which is a macros for "CREATE SEQUENCE+SET NULL+SET DEFAULT nextval(just_created_sequence_relation)"), others are "uuid_generate_vX(...)" and "random()" On 10/5/16, Serge Rielau <serge@rielau.com> wrote: > I want to point out as a minor "extension" that there is no need for the > default to be immutable. It is merely required that the default is evaluate > at time of ADD COLUMN and then we remember the actual value for the exist > default, rather than the parsed expression as we do for the "current" > default. I don't think it will be accepted. On 10/5/16, Serge Rielau <serge@rielau.com> wrote: > Thanks. > This would be my first contribution. > I take it I would post a patch based on a recent PG 9.6 master for review? Your patch must be based on a just "master" branch. If you haven't seen wiki pages [1], [2] and [3], it is the time to do it (and related ones). > Or should I compose some sort of a design document? Since Tom Lane, Andres Freund and other people agreed "it does work", you may post a patch to a new thread and write a short (but clean enough) description with a link to the current thread. Examples can be seen by links from the CF[4]. Nevertheless it is important to honor all thoughts mentioned here because if your patch breaks the current behavior (with no significant reason) it is senseless (even if it matches the behavior of other RDBMS) and will not be committed. Of cource, feel free to ask. [1] https://wiki.postgresql.org/wiki/So,_you_want_to_be_a_developer%3F [2] https://wiki.postgresql.org/wiki/Developer_FAQ [3] https://wiki.postgresql.org/wiki/Submitting_a_Patch [4] https://commitfest.postgresql.org/11/ -- Best regards, Vitaly Burovoy
> On Oct 5, 2016, at 5:52 PM, Vitaly Burovoy <vitaly.burovoy@gmail.com> wrote: > > On 10/5/16, Serge Rielau <serge@rielau.com> wrote: >> I want to point out as a minor "extension" that there is no need for the >> default to be immutable. It is merely required that the default is evaluate >> at time of ADD COLUMN and then we remember the actual value for the exist >> default, rather than the parsed expression as we do for the "current" >> default. > > I don't think it will be accepted. And I wouldn’t expect it to. I had a misunderstanding on what PG did. Clearly the enhancement must be semantically neutral and be limited to the cases where that can be asserted. So my patch will detect that situation and fall back to the original behavior as needed. > Your patch must be based on a just "master" branch. > If you haven't seen wiki pages [1], [2] and [3], it is the time to do > it (and related ones). > >> Or should I compose some sort of a design document? > > Since Tom Lane, Andres Freund and other people agreed "it does work", > you may post a patch to a new thread and write a short (but clean > enough) description with a link to the current thread. Examples can be > seen by links from the CF[4]. Thanks of rte guidance. It will take a bit of time to port to community code and complete QA. I shall return…. Serge Rielau Salesforce.com
On 6 October 2016 at 04:43, Serge Rielau <serge@rielau.com> wrote: >>> Or should I compose some sort of a design document? Having read this thread, I'm a little unclear as to what you're writing now, though there's definitely good ideas here. I think it would be beneficial to write up a single coherent description of this, including behaviour and a small sketch of implementation, just so everyone knows what this is. No design doc, but a summary. It would be very useful to be able to do this... ALTER TABLE foo ADD last_updated_timestamp timestamp default current_timestamp so that it generates a constant value and stores that for all prior rows, but then generates a new value for future rows. Which makes me think we should call this missing_value or absent_value so its clear that it is not a "default" it is the value we use for rows that do not have any value stored for them. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 10/6/16, Simon Riggs <simon@2ndquadrant.com> wrote: > On 6 October 2016 at 04:43, Serge Rielau <serge@rielau.com> wrote: >>>> Or should I compose some sort of a design document? > > Having read this thread, I'm a little unclear as to what you're > writing now, though there's definitely good ideas here. > > I think it would be beneficial to write up a single coherent > description of this, including behaviour and a small sketch of > implementation, just so everyone knows what this is. No design doc, > but a summary. At the moment I think it can also be a good idea to post the current patch as a Proposal or a WIP to get initial feedback. > It would be very useful to be able to do this... > ALTER TABLE foo ADD last_updated_timestamp timestamp default > current_timestamp > so that it generates a constant value and stores that for all prior > rows, but then generates a new value for future rows. Yes, it works for stable "now()" but does not work for volatile functions like "random()", "uuid_generate_v4()" or default for serial columns. The only possible way I can see is to check an expression has only "T_Const"s, static and stable functions. In such case the expression can be evaluated and the result be saved as a value for absented attributes of a tuple. In the other case save NULL there and rewrite the table. > Which makes me think we should call this missing_value or absent_value > so its clear that it is not a "default" it is the value we use for > rows that do not have any value stored for them. It is definitely a default for a user, it is not a regular default internally. I'm not a native speaker, "absent_value" can be mixed up with a NULL. As for me the best phrase is "pre-add-column-default", but it is impossible to use it as a column name. :-( It is still an open question. (I remember funny versions in a discussion[1] when people tried to choose a name for a function reversed to pg_size_pretty...) [1] https://www.postgresql.org/message-id/flat/CAFj8pRD-tGoDKnxdYgECzA4On01_uRqPrwF-8LdkSE-6bDHp0w@mail.gmail.com -- Best regards, Vitaly Burovoy
> On Oct 6, 2016, at 5:25 AM, Vitaly Burovoy <vitaly.burovoy@gmail.com> wrote: > > On 10/6/16, Simon Riggs <simon@2ndquadrant.com> wrote: >> On 6 October 2016 at 04:43, Serge Rielau <serge@rielau.com> wrote: >>>>> Or should I compose some sort of a design document? >> >> Having read this thread, I'm a little unclear as to what you're >> writing now, though there's definitely good ideas here. >> >> I think it would be beneficial to write up a single coherent >> description of this, including behaviour and a small sketch of >> implementation, just so everyone knows what this is. No design doc, >> but a summary. > > At the moment I think it can also be a good idea to post the current > patch as a Proposal or a WIP to get initial feedback. I can do that - Accompanied by a posting sized overview. > > Yes, it works for stable "now()" but does not work for volatile > functions like "random()", "uuid_generate_v4()" or default for serial > columns. The only possible way I can see is to check an expression has > only "T_Const"s, static and stable functions. In such case the > expression can be evaluated and the result be saved as a value for > absented attributes of a tuple. In the other case save NULL there and > rewrite the table. Agreed. I think DEFAULT as-is does the job nicely function wise. One can always decompose the ADD COLUMN into two steps within the same transaction if the initial column value for pre-existing rows does not match the default for new or updated rows. AT Just needs a performance boost for large tables where that’s reasonably possible. >> Which makes me think we should call this missing_value or absent_value >> so its clear that it is not a "default" it is the value we use for >> rows that do not have any value stored for them. > > It is definitely a default for a user, it is not a regular default internally. > I'm not a native speaker, "absent_value" can be mixed up with a NULL. > As for me the best phrase is "pre-add-column-default", but it is > impossible to use it as a column name. :-( > It is still an open question. I like Tom’s “creation default”. Another one could be “initial default”. But that, too, can be misread. Cheers Serge Rielau Salesforce.com
Serge Rielau <serge@rielau.com> writes: >> On Oct 6, 2016, at 5:25 AM, Vitaly Burovoy <vitaly.burovoy@gmail.com> wrote: >>> Which makes me think we should call this missing_value or absent_value >>> so its clear that it is not a "default" it is the value we use for >>> rows that do not have any value stored for them. > I like Tom’s “creation default”. Another one could be “initial default”. > But that, too, can be misread. Something based on missing_value/absent_value could work for me too. If we name it something involving "default", that definitely increases the possibility for confusion with the regular user-settable default. Also worth thinking about here is that the regular default expression affects what will be put into future inserted rows, whereas this thing affects the interpretation of past rows. So it's really quite a different animal. That's kind of leading me away from calling it creation_default. BTW, it also occurs to me that there are going to be good implementation reasons for restricting it to be a hard constant, not any sort of expression. We are likely to need to be able to insert the value in low-level code where general expression evaluation is impractical. regards, tom lane
> On Oct 6, 2016, at 9:01 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > BTW, it also occurs to me that there are going to be good implementation > reasons for restricting it to be a hard constant, not any sort of > expression. We are likely to need to be able to insert the value in > low-level code where general expression evaluation is impractical. > Yes, the padding must happen primarily in the getAttr() routines. Clearly we do not want to evaluate an expression there. But what speaks against evaluating the expression before we store it? After all we seem to all agree that this only works if the expression computes to the same constant all the time. If we do not want to store an “untyped” datum straight in pg_attribute as a BYTEA (my current approach) we could store thepretty printed version of the constant and evaluate that when we build the tuple descriptor. This happens when we load the relation into the relcache. Anyway, I’m jumping ahead and it’s perhaps best to let the code speak for itself once I have the WIP patch ready so we havesomething concrete to discuss Cheers Serge
On 10/6/16, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Serge Rielau <serge@rielau.com> writes: >>> On Oct 6, 2016, at 5:25 AM, Vitaly Burovoy <vitaly.burovoy@gmail.com> >>> wrote: >>>> Which makes me think we should call this missing_value or absent_value Be honest Simon Rigg's wrote that words. >>>> so its clear that it is not a "default" it is the value we use for >>>> rows that do not have any value stored for them. > >> I like Tom’s “creation default”. Another one could be “initial default”. >> But that, too, can be misread. > > Something based on missing_value/absent_value could work for me too. > > If we name it something involving "default", that definitely increases > the possibility for confusion with the regular user-settable default. > > Also worth thinking about here is that the regular default expression > affects what will be put into future inserted rows, whereas this thing > affects the interpretation of past rows. So it's really quite a different > animal. That's kind of leading me away from calling it creation_default. > > BTW, it also occurs to me that there are going to be good implementation > reasons for restricting it to be a hard constant, not any sort of > expression. We are likely to need to be able to insert the value in > low-level code where general expression evaluation is impractical. Yes, I mentioned that it should be evaluated and stored as a value because user functions can be changed (besides the speed reason), that's why I like the "value" in its name. The "default" is usually identified with expressions, not values (which are particular cases of expressions). Serge mentioned the phrase "pre-existing rows", which makes me think about something like "pre_existing_value".... -- Best regards, Vitaly Burovoy
Vitaly Burovoy <vitaly.burovoy@gmail.com> writes: > But what I discover for myself is that we have pg_attrdef separately > from the pg_attribute. Why? The core reason for that is that the default expression needs to be a separate object from the column for purposes of dependency analysis. For example, if you have a column whose default is "foo()", then the default expression depends on the function foo(), but the column should not: if you drop the function, only the default expression ought to be dropped, not the column. Because of this, the default expression needs to have its own OID (to be stored in pg_depend) and it's convenient to store it in a separate catalog so that the classoid can identify it as being a default expression rather than some other kind of object. If we were going to allow these missing_values or creation_defaults or whatever they're called to be general expressions, then they would need to have their own OIDs for dependency purposes. That would lead me to think that the best representation is to put them in their own rows in pg_attrdef, perhaps adding a boolean column to pg_attrdef to distinguish regular defaults from these things. Or maybe they even need their own catalog, depending on whether you think dependency analysis would want to distinguish them from regular defaults using just the classoid. Now, as I just pointed out in another mail, realistically we're probably going to restrict the feature to simple constants, which'd mean they will depend only on the column's type and can never need any dependencies of their own. So we could take the shortcut of just storing them in a new column in pg_attribute. But maybe that's shortsighted and we'll eventually wish we'd done them as full-fledged separate objects. But on the third hand ... once one of these is in place, how could you drop it separately from the column? That would amount to a change in the column's stored data, which is not what one would expect from dropping a separate object. So maybe it's senseless to think that these things could ever be distinct objects. But that definitely leads to the conclusion that they're constants and nothing else. regards, tom lane
On 10/6/16, Serge Rielau <serge@rielau.com> wrote: >> On Oct 6, 2016, at 9:01 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> BTW, it also occurs to me that there are going to be good implementation >> reasons for restricting it to be a hard constant, not any sort of >> expression. We are likely to need to be able to insert the value in >> low-level code where general expression evaluation is impractical. >> > Yes, the padding must happen primarily in the getAttr() routines. > Clearly we do not want to evaluate an expression there. > But what speaks against evaluating the expression before we store it? > After all we seem to all agree that this only works if the expression > computes to the same constant all the time. > > If we do not want to store an “untyped” datum straight in pg_attribute as a > BYTEA (my current approach) Ough. I made a mistake about pg_attribute because I forgot about the pg_attrdef. If we do not merge these tables, the pg_attrdef is the best place to store evaluated expression as a constant the same way defaults are stored in adbin. > we could store the pretty printed version of the constant It is a wrong way. It ruins commands like "ALTER COLUMN ... TYPE ... USING" > and evaluate that when we build the tuple descriptor. > This happens when we load the relation into the relcache. > > Anyway, I’m jumping ahead and it’s perhaps best to let the code speak for > itself once I have the WIP patch ready so we have something concrete to > discuss -- Best regards, Vitaly Burovoy
On 10/6/16, Vitaly Burovoy <vitaly.burovoy@gmail.com> wrote: > Ough. I made a mistake about pg_attribute because I forgot about the > pg_attrdef. > If we do not merge these tables, the pg_attrdef is the best place to > store evaluated expression as a constant the same way defaults are > stored in adbin. Oops. While I was writing the previous email, Tom explained necessity of the pg_attrdef. With that explanation it is obvious that I was wrong and a value for pre-existing rows should be in a new column in the pg_attribute. All the other thoughts from my previous email stand good. -- Best regards, Vitaly Burovoy
> On Oct 6, 2016, at 9:20 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Vitaly Burovoy <vitaly.burovoy@gmail.com> writes: >> But what I discover for myself is that we have pg_attrdef separately >> from the pg_attribute. Why? > > The core reason for that is that the default expression needs to be > a separate object from the column for purposes of dependency analysis. > For example, if you have a column whose default is "foo()", then the > default expression depends on the function foo(), but the column should > not: if you drop the function, only the default expression ought to > be dropped, not the column. > > Because of this, the default expression needs to have its own OID > (to be stored in pg_depend) and it's convenient to store it in a > separate catalog so that the classoid can identify it as being a > default expression rather than some other kind of object. Good to know. > > If we were going to allow these missing_values or creation_defaults > or whatever they're called to be general expressions, then they would need > to have their own OIDs for dependency purposes. That would lead me to > think that the best representation is to put them in their own rows in > pg_attrdef, perhaps adding a boolean column to pg_attrdef to distinguish > regular defaults from these things. Or maybe they even need their own > catalog, depending on whether you think dependency analysis would want > to distinguish them from regular defaults using just the classed. > > Now, as I just pointed out in another mail, realistically we're probably > going to restrict the feature to simple constants, which'd mean they will > depend only on the column's type and can never need any dependencies of > their own. So we could take the shortcut of just storing them in a new > column in pg_attribute. But maybe that's shortsighted and we'll > eventually wish we'd done them as full-fledged separate objects. > > But on the third hand ... once one of these is in place, how could you > drop it separately from the column? That would amount to a change in the > column's stored data, which is not what one would expect from dropping > a separate object. So maybe it's senseless to think that these things > could ever be distinct objects. But that definitely leads to the > conclusion that they're constants and nothing else. I cannot follow this reasoning. Let’s look past what PG does today: For each row (whether that’s necessary or not) we evaluate the expression, compute the value and store it in the rewritten table. We do not record dependencies on the “pedigree” of the value. It happened to originate from the DEFAULT expression provided with the ADD COLUMN, but that is not remembered anywhere. All we remember is the value - in each row. So the only change that is proposed here - when it comes right down to it - is to remember the value once only (IFF it is provably the same for each row) and thus avoid the need to rewrite the table. So I see no reason to impose any restriction other than “evaluated value is provably the same for every row”. Regarding the location of storage. I did start of using pg_attrdef, but ran into some snags. My approach was to add the value as an extra column (rather than an extra row). That caused trouble since a SET DEFAULT operation is decomposed into a DROP and a SET and preserving the value across such operations did not come naturally. If we were to use extra rows instead that issue would be solved, assuming we ad a “default kind” sort of column. It would dictate the storage format though which may be considered overkill for a a constant. Cheers Serge
On 10/6/16, Serge Rielau <serge@rielau.com> wrote: >> On Oct 6, 2016, at 9:20 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Vitaly Burovoy <vitaly.burovoy@gmail.com> writes: >>> But what I discover for myself is that we have pg_attrdef separately >>> from the pg_attribute. Why? >> >> The core reason for that is that the default expression needs to be >> a separate object from the column for purposes of dependency analysis. >> For example, if you have a column whose default is "foo()", then the >> default expression depends on the function foo(), but the column should >> not: if you drop the function, only the default expression ought to >> be dropped, not the column. >> >> Because of this, the default expression needs to have its own OID >> (to be stored in pg_depend) and it's convenient to store it in a >> separate catalog so that the classoid can identify it as being a >> default expression rather than some other kind of object. > > Good to know. > >> If we were going to allow these missing_values or creation_defaults >> or whatever they're called to be general expressions, then they would >> need >> to have their own OIDs for dependency purposes. That would lead me to >> think that the best representation is to put them in their own rows in >> pg_attrdef, perhaps adding a boolean column to pg_attrdef to distinguish >> regular defaults from these things. Or maybe they even need their own >> catalog, depending on whether you think dependency analysis would want >> to distinguish them from regular defaults using just the classed. >> >> Now, as I just pointed out in another mail, realistically we're probably >> going to restrict the feature to simple constants, which'd mean they will >> depend only on the column's type and can never need any dependencies of >> their own. So we could take the shortcut of just storing them in a new >> column in pg_attribute. I agree with you. >> But maybe that's shortsighted and we'll >> eventually wish we'd done them as full-fledged separate objects. I don't think so. If we try to implement non-blocking adding columns with volatile defaults (and for instance update old rows in the background), we can end up with the next situation: CREATE TABLE a(i bigint PRIMARY KEY); INSERT INTO a SELECT generate_series(1,10000000000); ALTER TABLE a ADD COLUMN b bigserial CHECK (b BETWEEN 1 AND 100); For indexes (even unique) created concurrently similar troubles are solved with a "not valid" mark, but what to do with a heap if we try to do it in the background? >> But on the third hand ... once one of these is in place, how could you >> drop it separately from the column? That would amount to a change in the >> column's stored data, which is not what one would expect from dropping >> a separate object. So maybe it's senseless to think that these things >> could ever be distinct objects. But that definitely leads to the >> conclusion that they're constants and nothing else. > I cannot follow this reasoning. > Let’s look past what PG does today: > For each row (whether that’s necessary or not) we evaluate the expression, > compute the value and > store it in the rewritten table. > We do not record dependencies on the “pedigree” of the value. > It happened to originate from the DEFAULT expression provided with the ADD > COLUMN, > but that is not remembered anywhere. > All we remember is the value - in each row. > So the only change that is proposed here - when it comes right down to it - > is to remember the value once only (IFF it is provably the same for each row) > and thus avoid the need to rewrite the table. > So I see no reason to impose any restriction other than “evaluated value is > provably the same for every row”. Tom says the same thing. The expression at the end should be a value if it allows to avoid rewriting table. > Regarding the location of storage. > I did start of using pg_attrdef, but ran into some snags. > My approach was to add the value as an extra column (rather than an extra > row). > That caused trouble since a SET DEFAULT operation is decomposed into a DROP > and a SET and > preserving the value across such operations did not come naturally. I'm sorry for making you be confused. The best way is to use an extra column in the pg_attribute to store serialized value. > If we were to use extra rows instead that issue would be solved, assuming we > add a “default kind” sort of column. > It would dictate the storage format though which may be considered overkill > for a a constant. -- Best regards, Vitaly Burovoy
On 10/6/16 11:01 AM, Tom Lane wrote: > Something based on missing_value/absent_value could work for me too. > If we name it something involving "default", that definitely increases > the possibility for confusion with the regular user-settable default. > > Also worth thinking about here is that the regular default expression > affects what will be put into future inserted rows, whereas this thing > affects the interpretation of past rows. So it's really quite a different > animal. That's kind of leading me away from calling it creation_default. There's actually another use case here that's potentially extremely valuable for warehousing and other "big data": compact representation of a default value. The idea here is that if you have a specific value for a field that makes up a very large portion of your data, you'd really like to be able to represent that value *in each row* with something like a bit (such as we currently do for NULLs). What I'd expect to see in the real world (once users figure this hack out) would be: CREATE TABLE ...( ... -- skip field_a so we can handle all it's common values ); INSERT INTO ... SELECT ... WHERE field_a IS NOT DISTINCT FROM 'really common value' ; ALTER TABLE ADD field_a ... NOT NULL DEFAULT 'really common value' ; -- load rest of the data That would have the effect of storing all those really common values with a single bit. What we'd ultimately want is some kind of catalog versioning so that we knew what was in place when each tuple was created; that would allow for changing these things over time without forcing a full rewrite. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532) mobile: 512-569-9461
There's actually another use case here that's potentially extremely valuable for warehousing and other "big data": compact representation of a default value.
On 10/9/16 11:02 PM, Corey Huinker wrote: > > There's actually another use case here that's potentially extremely > valuable for warehousing and other "big data": compact > representation of a default value. > > > I too would benefit from tables having either a default value in the > event of a NOT-NULL column being flagged null, or a flat-out constant. > > This would be a big win in partitioned tables where the partition can > only hold one value of the partitioning column. I hadn't thought of that use case... with rowcounts in the billions becoming pretty common even the cost of a 4 byte enum starts to add up. > I guess a constant would be a pg_type where the sole value is encoded, > and the column itself is stored like an empty string. Not empty string; the storage would look like NULL does today; the difference being that we'd know that attribute wasn't NULL-able so if it's marked as being "NULL" it actually means it has the default value. Though obviously this would only work if the default was a Const, and you wouldn't be able to change the default without ensuring no rows in the table were using this trick. But I suspect there's still plenty of scenarios where the advantage is worth it. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532) mobile: 512-569-9461

http://www.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.dbobj.doc/doc/c0056482.html
On 10/9/16 11:02 PM, Corey Huinker wrote:
>
> There's actually another use case here that's potentially extremely
> valuable for warehousing and other "big data": compact
> representation of a default value.
>
>
> I too would benefit from tables having either a default value in the
> event of a NOT-NULL column being flagged null, or a flat-out constant.
>
> This would be a big win in partitioned tables where the partition can
> only hold one value of the partitioning column.
I hadn't thought of that use case... with rowcounts in the billions
becoming pretty common even the cost of a 4 byte enum starts to add up.
> I guess a constant would be a pg_type where the sole value is encoded,
> and the column itself is stored like an empty string.
Not empty string; the storage would look like NULL does today; the
difference being that we'd know that attribute wasn't NULL-able so if
it's marked as being "NULL" it actually means it has the default value.
Though obviously this would only work if the default was a Const, and
you wouldn't be able to change the default without ensuring no rows in
the table were using this trick. But I suspect there's still plenty of
scenarios where the advantage is worth it.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532) mobile: 512-569-9461