Home > mailing lists

Re: ALTER TABLE ADD COLUMN fast default - Mailing list pgsql-hackers

From	Andrew Dunstan
Subject	Re: ALTER TABLE ADD COLUMN fast default
Date	March 1, 2018 01:49:43
Msg-id	CAA8=A782mqgxq1u6fbcUYH+bv+Yjf=JjoH57+8-VdjbZLfRzwg@mail.gmail.com Whole thread Raw
In response to	Re: ALTER TABLE ADD COLUMN fast default (Andres Freund <andres@anarazel.de>)
Responses	Re: ALTER TABLE ADD COLUMN fast default
List	pgsql-hackers

Tree view

On Wed, Feb 28, 2018 at 5:39 AM, Andres Freund <andres@anarazel.de> wrote:
> Hi,
>
> On 2018-02-27 14:29:44 +1030, Andrew Dunstan wrote:
>> This was debated quite some time ago. See for example
>> <https://www.postgresql.org/message-id/22999.1475770835%40sss.pgh.pa.us>
>> The consensus then seemed to be to store them in pg_attribute. If not,
>> I think we'd need a new catalog - pg_attrdef doesn't seem right. They
>> would have to go on separate rows, and we'd be storing values rather
>> than expressions, so it would get somewhat ugly.
>
> I know that I'm concerned with storing a number of large values in
> pg_attributes, and I haven't seen that argument addressed much in a
> quick scan of the discussion.
>
>


I'd like to have a consensus on this before I start making such a
change as adding a new catalog table.


>> >> @@ -293,10 +408,18 @@ heap_fill_tuple(TupleDesc tupleDesc,
>> >>   * ----------------
>> >>   */
>> >>  bool
>> >> -heap_attisnull(HeapTuple tup, int attnum)
>> >> +heap_attisnull(HeapTuple tup, int attnum, TupleDesc tupleDesc)
>> >>  {
>> >> +     /*
>> >> +      * We allow a NULL tupledesc for relations not expected to have
>> >> +      * missing values, such as catalog relations and indexes.
>> >> +      */
>> >> +     Assert(!tupleDesc || attnum <= tupleDesc->natts);
>> >
>> > This seems fairly likely to hide bugs.
>
>> How? There are plenty of cases where we simply expect quite reasonably
>> that there won't be any missing attributes, and we don't need  a
>> tupleDesc at all in such cases.
>
> Missing to pass a tupledesc for tables that can have defaults with not
> have directly visible issues, you'll just erroneously get back a null.
>
>


What would you prefer? Two versions of the function?


>> >> @@ -508,6 +508,8 @@ tlist_matches_tupdesc(PlanState *ps, List *tlist, Index varno, TupleDesc tupdesc
>> >>                       return false;           /* out of order */
>> >>               if (att_tup->attisdropped)
>> >>                       return false;           /* table contains dropped columns */
>> >> +             if (att_tup->atthasmissing)
>> >> +                     return false;           /* table contains cols with missing values */
>> >
>> > Hm, so incrementally building a table definitions with defaults, even if
>> > there aren't ever any rows, or if there's a subsequent table rewrite,
>> > will cause slowdowns. If so, shouldn't a rewrite remove all
>> > atthasmissing values?
>
>> Yes, I think so. Working on it. OTOH, I'm somewhat mystified by having
>> had to make this change.  Still, it looks like dropping a column has
>> the same effect.
>
> What exactly is mystifying you? That the new default stuff doesn't work
> when the physical tlist optimization is in use?
>


I think I have a better handle on that now. I'm trying to think of a
way around it, no massive inspiration yet.


>
>> >> @@ -561,10 +568,73 @@ RelationBuildTupleDesc(Relation relation)
>> >>                                       MemoryContextAllocZero(CacheMemoryContext,
>> >>                                                                                  relation->rd_rel->relnatts *
>> >>                                                                                  sizeof(AttrDefault));
>> >> -                     attrdef[ndef].adnum = attp->attnum;
>> >> +                     attrdef[ndef].adnum = attnum;
>> >>                       attrdef[ndef].adbin = NULL;
>> >> +
>> >>                       ndef++;
>> >>               }
>> >> +
>> >> +             /* Likewise for a missing value */
>> >> +             if (attp->atthasmissing)
>> >> +             {
>> >
>> > As I've written somewhere above, I don't think it's a good idea to do
>> > this preemptively. Tupledescs are very commonly copied, and the
>> > missing attributes are quite likely never used. IOW, we should just
>> > remember the attmissingattr value and fill in the corresponding
>> > attmissingval on-demand.
>
>> Defaults are frequently not used either, yet this function fills those
>> in regardless. I don't see why we need to treat missing values
>> differently.
>
> It's already hurting us quite a bit in workloads with a lot of trivial
> queries. Adding more makes it worse.
>
>


Please expand on your view of how we should "just remember the
attmissingattr value and fill in the corresponding attmissingval
on-demand."

I don't mind changing how we do things, but I need a bit more detail than this.

If copying defaults around has been hurting us so much it seems
surprising nobody has put forward an improvement.


>> Attached is the current state of things, with the fixes indicated
>> above, as well as some things pointed out by David Rowley.
>>
>> None of this seems to have had much effect on performance.
>>
>> Here's what oprofile reports from a run of pgbench with
>> create-alter.sql and the query "select sum(c1000) from t":
>>
>> Profiling through timer interrupt
>> samples  %        image name               symbol name
>> 22584    28.5982  postgres                 ExecInterpExpr
>> 11950    15.1323  postgres                 slot_getmissingattrs
>> 5471      6.9279  postgres                 AllocSetAlloc
>> 3018      3.8217  postgres                 TupleDescInitEntry
>> 2042      2.5858  postgres                 MemoryContextAllocZeroAligned
>> 1807      2.2882  postgres                 SearchCatCache1
>> 1638      2.0742  postgres                 expression_tree_walker
>>
>>
>> That's very different from what we see on the master branch. The time
>> spent in slot_getmissingattrs is perhaps not unexpected, but I don't
>> (at least yet) understand why we're getting so much time spent in
>> ExecInterpExpr, which doesn't show up at all when the same benchmark
>> is run on master.
>
> I'd guess that's because the physical tlist optimization is disabled. I
> assume you'd see something similar on master if you dropped a column.
>


Yeah. That's kinda ugly in fact.

Here's a version of the patch which cleans up the missing attribute
settings on a table rewrite.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

fast_default-v13.patch

pgsql-hackers by date:

From: Peter Eisentraut
Date: 01 March 2018, 01:37:11
Subject: Re: prokind column (was Re: [HACKERS] SQL procedures)

From: Tom Lane
Date: 01 March 2018, 01:54:43
Subject: VPATH build from a tarball fails with some gmake versions

Re: ALTER TABLE ADD COLUMN fast default - Mailing list pgsql-hackers

Attachment

Previous

Next