Re: ALTER TABLE ADD COLUMN fast default - Mailing list pgsql-hackers
From | Andrew Dunstan |
---|---|
Subject | Re: ALTER TABLE ADD COLUMN fast default |
Date | |
Msg-id | CAA8=A782mqgxq1u6fbcUYH+bv+Yjf=JjoH57+8-VdjbZLfRzwg@mail.gmail.com Whole thread Raw |
In response to | Re: ALTER TABLE ADD COLUMN fast default (Andres Freund <andres@anarazel.de>) |
Responses |
Re: ALTER TABLE ADD COLUMN fast default
|
List | pgsql-hackers |
On Wed, Feb 28, 2018 at 5:39 AM, Andres Freund <andres@anarazel.de> wrote: > Hi, > > On 2018-02-27 14:29:44 +1030, Andrew Dunstan wrote: >> This was debated quite some time ago. See for example >> <https://www.postgresql.org/message-id/22999.1475770835%40sss.pgh.pa.us> >> The consensus then seemed to be to store them in pg_attribute. If not, >> I think we'd need a new catalog - pg_attrdef doesn't seem right. They >> would have to go on separate rows, and we'd be storing values rather >> than expressions, so it would get somewhat ugly. > > I know that I'm concerned with storing a number of large values in > pg_attributes, and I haven't seen that argument addressed much in a > quick scan of the discussion. > > I'd like to have a consensus on this before I start making such a change as adding a new catalog table. >> >> @@ -293,10 +408,18 @@ heap_fill_tuple(TupleDesc tupleDesc, >> >> * ---------------- >> >> */ >> >> bool >> >> -heap_attisnull(HeapTuple tup, int attnum) >> >> +heap_attisnull(HeapTuple tup, int attnum, TupleDesc tupleDesc) >> >> { >> >> + /* >> >> + * We allow a NULL tupledesc for relations not expected to have >> >> + * missing values, such as catalog relations and indexes. >> >> + */ >> >> + Assert(!tupleDesc || attnum <= tupleDesc->natts); >> > >> > This seems fairly likely to hide bugs. > >> How? There are plenty of cases where we simply expect quite reasonably >> that there won't be any missing attributes, and we don't need a >> tupleDesc at all in such cases. > > Missing to pass a tupledesc for tables that can have defaults with not > have directly visible issues, you'll just erroneously get back a null. > > What would you prefer? Two versions of the function? >> >> @@ -508,6 +508,8 @@ tlist_matches_tupdesc(PlanState *ps, List *tlist, Index varno, TupleDesc tupdesc >> >> return false; /* out of order */ >> >> if (att_tup->attisdropped) >> >> return false; /* table contains dropped columns */ >> >> + if (att_tup->atthasmissing) >> >> + return false; /* table contains cols with missing values */ >> > >> > Hm, so incrementally building a table definitions with defaults, even if >> > there aren't ever any rows, or if there's a subsequent table rewrite, >> > will cause slowdowns. If so, shouldn't a rewrite remove all >> > atthasmissing values? > >> Yes, I think so. Working on it. OTOH, I'm somewhat mystified by having >> had to make this change. Still, it looks like dropping a column has >> the same effect. > > What exactly is mystifying you? That the new default stuff doesn't work > when the physical tlist optimization is in use? > I think I have a better handle on that now. I'm trying to think of a way around it, no massive inspiration yet. > >> >> @@ -561,10 +568,73 @@ RelationBuildTupleDesc(Relation relation) >> >> MemoryContextAllocZero(CacheMemoryContext, >> >> relation->rd_rel->relnatts * >> >> sizeof(AttrDefault)); >> >> - attrdef[ndef].adnum = attp->attnum; >> >> + attrdef[ndef].adnum = attnum; >> >> attrdef[ndef].adbin = NULL; >> >> + >> >> ndef++; >> >> } >> >> + >> >> + /* Likewise for a missing value */ >> >> + if (attp->atthasmissing) >> >> + { >> > >> > As I've written somewhere above, I don't think it's a good idea to do >> > this preemptively. Tupledescs are very commonly copied, and the >> > missing attributes are quite likely never used. IOW, we should just >> > remember the attmissingattr value and fill in the corresponding >> > attmissingval on-demand. > >> Defaults are frequently not used either, yet this function fills those >> in regardless. I don't see why we need to treat missing values >> differently. > > It's already hurting us quite a bit in workloads with a lot of trivial > queries. Adding more makes it worse. > > Please expand on your view of how we should "just remember the attmissingattr value and fill in the corresponding attmissingval on-demand." I don't mind changing how we do things, but I need a bit more detail than this. If copying defaults around has been hurting us so much it seems surprising nobody has put forward an improvement. >> Attached is the current state of things, with the fixes indicated >> above, as well as some things pointed out by David Rowley. >> >> None of this seems to have had much effect on performance. >> >> Here's what oprofile reports from a run of pgbench with >> create-alter.sql and the query "select sum(c1000) from t": >> >> Profiling through timer interrupt >> samples % image name symbol name >> 22584 28.5982 postgres ExecInterpExpr >> 11950 15.1323 postgres slot_getmissingattrs >> 5471 6.9279 postgres AllocSetAlloc >> 3018 3.8217 postgres TupleDescInitEntry >> 2042 2.5858 postgres MemoryContextAllocZeroAligned >> 1807 2.2882 postgres SearchCatCache1 >> 1638 2.0742 postgres expression_tree_walker >> >> >> That's very different from what we see on the master branch. The time >> spent in slot_getmissingattrs is perhaps not unexpected, but I don't >> (at least yet) understand why we're getting so much time spent in >> ExecInterpExpr, which doesn't show up at all when the same benchmark >> is run on master. > > I'd guess that's because the physical tlist optimization is disabled. I > assume you'd see something similar on master if you dropped a column. > Yeah. That's kinda ugly in fact. Here's a version of the patch which cleans up the missing attribute settings on a table rewrite. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
pgsql-hackers by date: