Re: ALTER TABLE ADD COLUMN fast default - Mailing list pgsql-hackers
From | David Rowley |
---|---|
Subject | Re: ALTER TABLE ADD COLUMN fast default |
Date | |
Msg-id | CAKJS1f_pbk+nQu=1AihzQMLw_6RA0zB4WidvRMR1Pavnbhu-Hg@mail.gmail.com Whole thread Raw |
In response to | Re: ALTER TABLE ADD COLUMN fast default (David Rowley <david.rowley@2ndquadrant.com>) |
Responses |
Re: ALTER TABLE ADD COLUMN fast default
|
List | pgsql-hackers |
On 9 March 2018 at 02:11, David Rowley <david.rowley@2ndquadrant.com> wrote: > On 8 March 2018 at 18:40, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote: >> select * from t; >> fastdef tps = 107.145811 >> master tps = 150.207957 >> >> "select * from t" used to be about a wash, but with this patch it's >> got worse. The last two queries were worse and are now better, so >> that's a win. > > How does it compare to master if you drop a column out the table? > Physical tlists will be disabled in that case too. I imagine the > performance of master will drop much lower than the all columns > missing case. I decided to test this for myself, and the missing version is still slightly slower than the dropped column version, but not by much. I'm not personally concerned about this. The following results are with 1000 column tables with 64 rows each. *** Benchmarking normal table... tps = 232.794734 (excluding connections establishing) *** Benchmarking missing table... tps = 202.827754 (excluding connections establishing) *** Benchmarking dropped table... tps = 217.339255 (excluding connections establishing) There's nothing particularly interesting in the profiles for the missing and normal case either: -- Missing case for select * from missing; 12.87% postgres postgres [.] AllocSetAlloc 10.09% postgres libc-2.17.so [.] __strlen_sse2_pminub 6.35% postgres postgres [.] pq_sendcountedtext 5.97% postgres postgres [.] enlargeStringInfo 5.47% postgres postgres [.] ExecInterpExpr 4.45% postgres libc-2.17.so [.] __memcpy_ssse3_back 4.39% postgres postgres [.] palloc 4.31% postgres postgres [.] printtup 3.81% postgres postgres [.] pg_ltoa 3.72% postgres [kernel.kallsyms] [k] __lock_text_start 3.38% postgres postgres [.] FunctionCall1Coll 3.11% postgres postgres [.] int4out 2.97% postgres postgres [.] appendBinaryStringInfoNT 2.49% postgres postgres [.] slot_getmissingattrs 1.66% postgres postgres [.] pg_server_to_any 0.99% postgres postgres [.] MemoryContextAllocZeroAligned 0.94% postgres postgres [.] expression_tree_walker 0.93% postgres postgres [.] lappend 0.89% postgres [kernel.kallsyms] [k] __do_page_fault 0.72% postgres [kernel.kallsyms] [k] clear_page_c_e 0.72% postgres postgres [.] SendRowDescriptionMessage 0.71% postgres postgres [.] expandRelAttrs 0.64% postgres [kernel.kallsyms] [k] get_page_from_freelist 0.64% postgres [kernel.kallsyms] [k] copy_user_enhanced_fast_string 0.62% postgres postgres [.] pg_server_to_client 0.59% postgres libc-2.17.so [.] __memset_sse2 0.58% postgres postgres [.] set_pathtarget_cost_width 0.56% postgres postgres [.] OutputFunctionCall 0.56% postgres postgres [.] memcpy@plt 0.54% postgres postgres [.] makeTargetEntry 0.50% postgres postgres [.] SearchCatCache3 -- Normal case for select * from normal; 12.57% postgres postgres [.] AllocSetAlloc 10.50% postgres libc-2.17.so [.] __strlen_sse2_pminub 7.65% postgres postgres [.] slot_deform_tuple 6.52% postgres postgres [.] pq_sendcountedtext 6.03% postgres postgres [.] enlargeStringInfo 4.51% postgres postgres [.] printtup 4.35% postgres postgres [.] palloc 4.34% postgres libc-2.17.so [.] __memcpy_ssse3_back 3.90% postgres postgres [.] pg_ltoa 3.49% postgres [kernel.kallsyms] [k] __lock_text_start 3.35% postgres postgres [.] int4out 3.31% postgres postgres [.] FunctionCall1Coll 2.82% postgres postgres [.] appendBinaryStringInfoNT 1.68% postgres postgres [.] pg_server_to_any 1.30% postgres postgres [.] SearchCatCache3 1.01% postgres postgres [.] SendRowDescriptionMessage 0.89% postgres postgres [.] lappend 0.88% postgres postgres [.] MemoryContextAllocZeroAligned 0.79% postgres postgres [.] expression_tree_walker 0.67% postgres postgres [.] SearchCatCache1 0.67% postgres postgres [.] expandRelAttrs 0.64% postgres [kernel.kallsyms] [k] copy_user_enhanced_fast_string 0.60% postgres postgres [.] pg_server_to_client 0.57% postgres [kernel.kallsyms] [k] __do_page_fault 0.56% postgres postgres [.] OutputFunctionCall 0.53% postgres postgres [.] memcpy@plt 0.53% postgres postgres [.] makeTargetEntry 0.51% postgres [kernel.kallsyms] [k] get_page_from_freelist The difference appears to be in ExecInterpExpr, although, that only accounts for about 5.5%, and we're missing 15% in performance. Going by the commitfest app, the patch still does appear to be waiting on Author. Never-the-less, I've made another pass over it and found a few mistakes and a couple of ways to improve things: 1. Should <structname> not be <structfield>? <entry> This column has a value which is used where the column is entirely missing from the row, as happens when a column is added after the row is created. The actual value used is stored in the <structname>attmissingval</structname> column. </entry> Also additional surplus space before "column". The same mixup appears in: <structname>atthasmissing</structname> is true. If there is no value 2. Instead of slot_getmissingattrs() having a memset() to set all the missing attributes to NULL before resetting them to the missing value, if one is found. Would it not be better to just have the missing array store both the NULLs and the DEFAULTs together? With that, the code could become: static void slot_getmissingattrs(TupleTableSlot *slot, int startAttNum, int lastAttNum) { AttrMissing *attrmiss; int attnum; if (slot->tts_tupleDescriptor->constr && slot->tts_tupleDescriptor->constr->missing) attrmiss = slot->tts_tupleDescriptor->constr->missing; else return; for (attnum = startAttNum; attnum < lastAttNum; attnum++) { slot->tts_values[attnum] = attrmiss[attnum].ammissing; slot->tts_isnull[attnum] = false; } } You may even be able to just: Assert(slot->tts_tupleDescriptor->constr); Assert(slot->tts_tupleDescriptor->constr->missing); instead of the run-time check. This would also simplify expand_tuple() and getmissingattr() I think this change makes sense as slot_getmissingattrs() is getting NULLs and the missing values, not just the missing values as the name indicates. The name would become more fitting if you class a NULL as a missing value. 3. Not sure what amnum means. Is it meant to be adnum? * the source (i.e. where its amnum is > sourceNatts). We can ignore 4. In StoreAttrDefault(), is there any point in creating the executor state when add_column_mode == false? It's probably also worth commenting what add_column_mode does. If any caller mistakenly passes it as 'true' then you'll overwrite the original missing value which would appear to update tuples that already exist in the table. It also looks like you could move the bulk of the new code in that function, including the new variables into where you're modifying the pg_attribute tuple. This function also incorrectly sets atthasmissing to true even when the DEFAULT expression evaluates to NULL. This seems to contradict what you're doing in RelationBuildTupleDesc(), which skips building the missing array unless it sees a non-null attmissingval. This will cause getmissingattr() to Assert fail with the following case: drop table if exists t1; create table t1 (a int); insert into t1 values(1); alter table t1 add column b text default 'test' || null; alter table t1 add column c text null; select attname,atthasmissing,attmissingval from pg_attribute where attrelid = 't1'::regclass and attnum>0; select b from t1; TRAP: FailedAssertion("!(tupleDesc->constr->missing)", File: "src/backend/access/common/heaptuple.c", Line: 98) Probably another good reason to do the refactor work mentioned in #2. It might be worth adding a regression test to ensure atthasmissing remains false for a column where the default expression evaluated to NULL; 5. It's not really required, but in AddRelationNewConstraints() you could get away with only performing the volatile function check if missingMode is true. That would save calling contain_volatile_functions() needlessly. 6. The comment above RelationClearMissing() should probably mention that the caller must hold an AccessExclusiveLock on rel. There is also a surplus newline between the header comment and the function. 7. Can you pg_ident the patch? There's quite a few places where the whitespace is messed up, and also quite a few places where you've changed lines so they're longer than 80 chars. 8. In RelationBuildTupleDesc() it looks like attnum can be declared in the scope of the while loop. An updated patch should probably also included the patch I sent to disable the physical tlists when there are missing values mentioned in the TupleDesc. Although I admit to not having properly updated the comments even in my v2 attempt. * NIL. Ideally we would like to handle the dropped-column case too. However should read: * NIL. Ideally, we would like to handle these cases too. However Thanks again for working on this feature. I hope we can get this into PG11. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
pgsql-hackers by date: