Re: WIP: Covering + unique indexes. - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: WIP: Covering + unique indexes. |
Date | |
Msg-id | 20170330191138.zawsxgsxnt2qy2as@alap3.anarazel.de Whole thread Raw |
In response to | WIP: Covering + unique indexes. (Anastasia Lubennikova <a.lubennikova@postgrespro.ru>) |
Responses |
Re: WIP: Covering + unique indexes.
|
List | pgsql-hackers |
On 2017-03-30 18:26:05 +0300, Teodor Sigaev wrote: > Any objection from reviewers to push both patches? > diff --git a/contrib/bloom/blutils.c b/contrib/bloom/blutils.c > index f2eda67..59029b9 100644 > --- a/contrib/bloom/blutils.c > +++ b/contrib/bloom/blutils.c > @@ -120,6 +120,7 @@ blhandler(PG_FUNCTION_ARGS) > amroutine->amclusterable = false; > amroutine->ampredlocks = false; > amroutine->amcanparallel = false; > + amroutine->amcaninclude = false; That name doesn't strike me as very descriptive. > + <term><literal>INCLUDE</literal></term> > + <listitem> > + <para> > + An optional <literal>INCLUDE</> clause allows a list of columns to be > + specified which will be included in the non-key portion of the index. > + Columns which are part of this clause cannot also exist in the > + key columns portion of the index, and vice versa. The > + <literal>INCLUDE</> columns exist solely to allow more queries to benefit > + from <firstterm>index-only scans</> by including certain columns in the > + index, the value of which would otherwise have to be obtained by reading > + the table's heap. Having these columns in the <literal>INCLUDE</> clause > + in some cases allows <productname>PostgreSQL</> to skip the heap read > + completely. This also allows <literal>UNIQUE</> indexes to be defined on > + one set of columns, which can include another set of columns in the > + <literal>INCLUDE</> clause, on which the uniqueness is not enforced. > + It's the same with other constraints (PRIMARY KEY and EXCLUDE). This can > + also can be used for non-unique indexes as any columns which are not required > + for the searching or ordering of records can be used in the > + <literal>INCLUDE</> clause, which can slightly reduce the size of the index. > + Currently, only the B-tree access method supports this feature. > + Expressions as included columns are not supported since they cannot be used > + in index-only scans. > + </para> > + </listitem> > + </varlistentry> This could use some polishing. > +/* > + * Reform index tuple. Truncate nonkey (INCLUDE) attributes. > + */ > +IndexTuple > +index_truncate_tuple(Relation idxrel, IndexTuple olditup) > +{ > + TupleDesc itupdesc = RelationGetDescr(idxrel); > + Datum values[INDEX_MAX_KEYS]; > + bool isnull[INDEX_MAX_KEYS]; > + IndexTuple newitup; > + int indnatts = IndexRelationGetNumberOfAttributes(idxrel); > + int indnkeyatts = IndexRelationGetNumberOfKeyAttributes(idxrel); > + > + Assert(indnatts <= INDEX_MAX_KEYS); > + Assert(indnkeyatts > 0); > + Assert(indnkeyatts < indnatts); > + > + index_deform_tuple(olditup, itupdesc, values, isnull); > + > + /* form new tuple that will contain only key attributes */ > + itupdesc->natts = indnkeyatts; > + newitup = index_form_tuple(itupdesc, values, isnull); > + newitup->t_tid = olditup->t_tid; > + > + itupdesc->natts = indnatts; Uh, isn't this a *seriously* bad idea? If index_form_tuple errors out, this'll corrupt the tuple descriptor. Maybe also rename the function to index_build_key_tuple()? > * Construct a string describing the contents of an index entry, in the > * form "(key_name, ...)=(key_value, ...)". This is currently used > - * for building unique-constraint and exclusion-constraint error messages. > + * for building unique-constraint and exclusion-constraint error messages, > + * so only key columns of index are checked and printed. s/index/the index/ > @@ -368,7 +370,7 @@ systable_beginscan(Relation heapRelation, > { > int j; > > - for (j = 0; j < irel->rd_index->indnatts; j++) > + for (j = 0; j < IndexRelationGetNumberOfAttributes(irel); j++) > { > if (key[i].sk_attno == irel->rd_index->indkey.values[j]) > { > @@ -376,7 +378,7 @@ systable_beginscan(Relation heapRelation, > break; > } > } > - if (j == irel->rd_index->indnatts) > + if (j == IndexRelationGetNumberOfAttributes(irel)) > elog(ERROR, "column is not in index"); > } Not that it matters overly much, but why are we doing this for all attributes, rather than just key attributes? > --- a/src/backend/bootstrap/bootstrap.c > +++ b/src/backend/bootstrap/bootstrap.c > @@ -600,7 +600,7 @@ boot_openrel(char *relname) > relname, (int) ATTRIBUTE_FIXED_PART_SIZE); > > boot_reldesc = heap_openrv(makeRangeVar(NULL, relname, -1), NoLock); > - numattr = boot_reldesc->rd_rel->relnatts; > + numattr = RelationGetNumberOfAttributes(boot_reldesc); > for (i = 0; i < numattr; i++) > { > if (attrtypes[i] == NULL) That seems a bit unrelated. > @@ -2086,7 +2086,8 @@ StoreRelCheck(Relation rel, char *ccname, Node *expr, > is_validated, > RelationGetRelid(rel), /* relation */ > attNos, /* attrs in the constraint */ > - keycount, /* # attrs in the constraint */ > + keycount, /* # key attrs in the constraint */ > + keycount, /* # total attrs in the constraint */ > InvalidOid, /* not a domain constraint */ > InvalidOid, /* no associated index */ > InvalidOid, /* Foreign key fields */ It doesn't quite seem right to me to store this both in pg_index and pg_constraint. > @@ -340,14 +341,27 @@ DefineIndex(Oid relationId, > numberOfAttributes = list_length(stmt->indexParams); > - if (numberOfAttributes <= 0) > - ereport(ERROR, > - (errcode(ERRCODE_INVALID_OBJECT_DEFINITION), > - errmsg("must specify at least one column"))); > + Huh, why's that check gone? > > +opt_c_include: INCLUDE '(' columnList ')' { $$ = $3; } > + | /* EMPTY */ { $$ = NIL; } > + ; > +opt_include: INCLUDE '(' index_including_params ')' { $$ = $3; } > + | /* EMPTY */ { $$ = NIL; } > + ; > + > +index_including_params: index_elem { $$ = list_make1($1); } > + | index_including_params ',' index_elem { $$ = lappend($1, $3); } > + ; > + Why do we have multiple different definitions of this? > @@ -1979,6 +2017,48 @@ transformIndexConstraint(Constraint *constraint, CreateStmtContext *cxt) > index->indexParams = lappend(index->indexParams, iparam); > } > > + /* Here is some code duplication. But we do need it. */ Aha? > + foreach(lc, constraint->including) > + { > + char *key = strVal(lfirst(lc)); > + bool found = false; > + ColumnDef *column = NULL; > + ListCell *columns; > + IndexElem *iparam; > + > + foreach(columns, cxt->columns) > + { > + column = (ColumnDef *) lfirst(columns); > + Assert(IsA(column, ColumnDef)); > + if (strcmp(column->colname, key) == 0) > + { > + found = true; > + break; > + } > + } > + > + /* > + * In the ALTER TABLE case, don't complain about index keys not > + * created in the command; they may well exist already. DefineIndex > + * will complain about them if not, and will also take care of marking > + * them NOT NULL. > + */ Uh. Why should they be marked as NOT NULL? ISTM the comment has been copied here without adjustments. > @@ -1275,6 +1275,21 @@ pg_get_indexdef_worker(Oid indexrelid, int colno, > Oid keycoltype; > Oid keycolcollation; > > + /* > + * attrsOnly flag is used for building unique-constraint and > + * exclusion-constraint error messages. Included attrs are > + * meaningless there, so do not include them in the message. > + */ > + if (attrsOnly && keyno >= idxrec->indnkeyatts) > + break; Sounds like the parameter should be renamed then. > +Included attributes in B-tree indexes > +------------------------------------- > + > +Since 10.0 there is an optional INCLUDE clause, that allows to add 10.0 isn't right, since that's the "patch" version now. > +a portion of non-key attributes to index. They exist to allow more queries > +to benefit from index-only scans. We never use included attributes in > +ScanKeys, neither for search nor for inserts. That allows us to include > +into B-tree any datatypes, even those which don't have suitable opclass. > +Included columns only stored in regular items on leaf pages. All inner > +keys and high keys are truncated and contain only key attributes. > +That helps to reduce the size of index. s/index/the index/ > @@ -537,6 +542,28 @@ _bt_buildadd(BTWriteState *wstate, BTPageState *state, IndexTuple itup) > ItemIdSetUnused(ii); /* redundant */ > ((PageHeader) opage)->pd_lower -= sizeof(ItemIdData); > > + if (indnkeyatts != indnatts && P_ISLEAF(opageop)) > + { > + /* > + * It's essential to truncate High key here. > + * The purpose is not just to save more space on this particular page, > + * but to keep whole b-tree structure consistent. Subsequent insertions > + * assume that hikey is already truncated, and so they should not > + * worry about it, when copying the high key into the parent page > + * as a downlink. s/should/need/ > + * NOTE It is not crutial for reliability in present, s/crutial/crucial/ > + * but maybe it will be that in the future. > + */ "it's essential" ... "it is not crutial" -- that's contradictory. > + keytup = index_truncate_tuple(wstate->index, oitup); The code in _bt_split previously claimed that it's the only place doing truncation... > + /* delete "wrong" high key, insert keytup as P_HIKEY. */ > + PageIndexTupleDelete(opage, P_HIKEY); > + if (!_bt_pgaddtup(opage, IndexTupleSize(keytup), keytup, P_HIKEY)) > + elog(ERROR, "failed to rewrite compressed item in index \"%s\"", > + RelationGetRelationName(wstate->index)); Hm... - Andres
pgsql-hackers by date: