Re: SQL:2011 application time - Mailing list pgsql-hackers
From | Paul Jungwirth |
---|---|
Subject | Re: SQL:2011 application time |
Date | |
Msg-id | 0af6a323-f026-4cd9-adf1-941038cd00de@illuminatedcomputing.com Whole thread Raw |
In response to | Re: SQL:2011 application time (jian he <jian.universality@gmail.com>) |
Responses |
Re: SQL:2011 application time
Re: SQL:2011 application time |
List | pgsql-hackers |
Hello, Here are new patches consolidating feedback from several emails. I haven't addressed everything but I think I'm overdue for a reply: On 1/4/24 21:06, jian he wrote: > > I am confused. > say condition: " (cmp_l1l2 <= 0 && cmp_u1l2 >= 0 && cmp_u1u2 <= 0)" > the following code will only run PartA, never run PartB? > > ` > else if (cmp_l1l2 >= 0 && cmp_u1u2 <= 0) > PartA > else if (cmp_l1l2 <= 0 && cmp_u1l2 >= 0 && cmp_u1u2 <= 0) > PartB > ` > > minimum example: > #include<stdio.h> > #include<string.h> > #include<stdlib.h> > #include<assert.h> > int > main(void) > { > int cmp_l1l2; > int cmp_u1u2; > int cmp_u1l2; > int cmp_l1u2; > cmp_l1u2 = -1; > cmp_l1l2 = 0; > cmp_u1u2 = 0; > cmp_u1l2 = 0; > assert(cmp_u1l2 == 0); > if (cmp_l1u2 > 0 || cmp_u1l2 < 0) > printf("calling partA\n"); > else if (cmp_l1l2 >= 0 && cmp_u1u2 <= 0) > printf("calling partB\n"); > else if (cmp_l1l2 <= 0 && cmp_u1l2 >= 0 && cmp_u1u2 <= 0) > printf("calling partC\n"); > } All of the branches are used. I've attached a `without_portion.c` minimal example showing different cases. For ranges it helps to go through the Allen relationships (https://en.wikipedia.org/wiki/Allen%27s_interval_algebra) to make a comprehensive check. (But note that our operators don't exactly match that terminology, and it's important to consider closed-vs-open and unbounded cases.) > I am confused with the name "range_without_portion", I think > "range_not_overlap" would be better. I think I covered this in my other reply and we are now in agreement, but if that's mistaken let know me. > select numrange(1.1, 2.2) @- numrange(2.0, 3.0); > the result is not the same as > select numrange(2.0, 3.0) @- numrange(1.1, 2.2); Correct, @- is not commutative. > So your categorize oprkind as 'b' for operator "@-" is wrong? > select oprname,oprkind,oprcanhash,oprcanmerge,oprleft,oprright,oprresult,oprcode > from pg_operator > where oprname = '@-'; 'b' is the correct oprkind. It is a binary (infix) operator. > aslo > select count(*), oprkind from pg_operator group by oprkind; > there are only 5% are prefix operators. > maybe we should design it as: > 1. if both inputs are empty range, the result array is empty. > 2. if both inputs are non-empty and never overlaps, put both of them > to the result array. > 3. if one input is empty another one is not, then put the non-empty > one into the result array. Also covered before, but if any of this still applies please let me know. > after applying the patch: now the catalog data seems not correct to me. > SELECT a1.amopfamily > ,a1.amoplefttype::regtype > ,a1.amoprighttype > ,a1.amopstrategy > ,amoppurpose > ,amopsortfamily > ,amopopr > ,op.oprname > ,am.amname > FROM pg_amop as a1 join pg_operator op on op.oid = a1.amopopr > join pg_am am on am.oid = a1.amopmethod > where amoppurpose = 'p'; > output: > amopfamily | amoplefttype | amoprighttype | amopstrategy | > amoppurpose | amopsortfamily | amopopr | oprname | amname > ------------+---------------+---------------+--------------+-------------+----------------+---------+---------+-------- > 2593 | box | 603 | 31 | p > | 0 | 803 | # | gist > 3919 | anyrange | 3831 | 31 | p > | 0 | 3900 | * | gist > 6158 | anymultirange | 4537 | 31 | p > | 0 | 4394 | * | gist > 3919 | anyrange | 3831 | 32 | p > | 0 | 8747 | @- | gist > 6158 | anymultirange | 4537 | 32 | p > | 0 | 8407 | @- | gist > (5 rows) > > select oprcode, oprname, oprleft::regtype > from pg_operator opr > where opr.oprname in ('#','*','@-') > and oprleft = oprright > and oprleft in (603,3831,4537); > output: > > oprcode | oprname | oprleft > ----------------------------+---------+--------------- > box_intersect | # | box > range_intersect | * | anyrange > multirange_intersect | * | anymultirange > range_without_portion | @- | anyrange > multirange_without_portion | @- | anymultirange > (5 rows) This seems correct. '#' is the name of the box overlaps operator. Probably I should add a box @- operator too. But see below. . . . > should amoppurpose = 'p' is true apply to ' @-' operator? Yes. > catalog-pg-amop.html: > ` > amopsortfamily oid (references pg_opfamily.oid): > The B-tree operator family this entry sorts according to, if an > ordering operator; zero if a search operator > ` > you should also update the above entry, the amopsortfamily is also > zero for "portion operator" for the newly implemented "portion > operator". Okay, done. > v21-0006-Add-UPDATE-DELETE-FOR-PORTION-OF.patch > create mode 100644 src/backend/utils/adt/period.c > create mode 100644 src/include/utils/period.h > you should put these two files to v21-0008-Add-PERIODs.patch. > it's not related to that patch, it also makes people easy to review. You're right, sorry! On 1/8/24 16:00, jian he wrote: > > +/* > + * ForPortionOfClause > + * representation of FOR PORTION OF <period-name> FROM <ts> TO <te> > + * or FOR PORTION OF <period-name> (<target>) > + */ > +typedef struct ForPortionOfClause > +{ > + NodeTag type; > + char *range_name; > + int range_name_location; > + Node *target; > + Node *target_start; > + Node *target_end; > +} ForPortionOfClause; > > "range_name_location" can be just "location"? > generally most of the struct put the "location" to the last field in the struct. > (that's the pattern I found all over other code) Agreed, done. > + if (isUpdate) > + { > + /* > + * Now make sure we update the start/end time of the record. > + * For a range col (r) this is `r = r * targetRange`. > + */ > + Expr *rangeSetExpr; > + TargetEntry *tle; > + > + strat = RTIntersectStrategyNumber; > + GetOperatorFromCanonicalStrategy(opclass, InvalidOid, "intersects", > "FOR PORTION OF", &opid, &strat); > + rangeSetExpr = (Expr *) makeSimpleA_Expr(AEXPR_OP, get_opname(opid), > + (Node *) copyObject(rangeVar), targetExpr, > + forPortionOf->range_name_location); > + rangeSetExpr = (Expr *) transformExpr(pstate, (Node *) rangeSetExpr, > EXPR_KIND_UPDATE_PORTION); > + > + /* Make a TLE to set the range column */ > + result->rangeSet = NIL; > + tle = makeTargetEntry(rangeSetExpr, range_attno, range_name, false); > + result->rangeSet = lappend(result->rangeSet, tle); > + > + /* Mark the range column as requiring update permissions */ > + target_perminfo->updatedCols = bms_add_member(target_perminfo->updatedCols, > + range_attno - FirstLowInvalidHeapAttributeNumber); > + } > + else > + result->rangeSet = NIL; > I think the name "rangeSet" is misleading, since "set" is generally > related to a set of records. > but here it's more about the "range intersect". Okay, I can see that. I used "rangeSet" because we add it to the SET clause of the UPDATE command. Here I've changed it to rangeTargetList. I think this matches other code and better indicates what it holds. Any objections? In the PERIOD patch we will need two TLEs here (that's why it's a List): one for the start column and one for the end column. > in ExecDelete > we have following code pattern: > ExecDeleteEpilogue(context, resultRelInfo, tupleid, oldtuple, changingPart); > if (processReturning && resultRelInfo->ri_projectReturning) > { > .... > if (!table_tuple_fetch_row_version(resultRelationDesc, tupleid, > SnapshotAny, slot)) > elog(ERROR, "failed to fetch deleted tuple for DELETE RETURNING"); > } > } > > but the ExecForPortionOfLeftovers is inside ExecDeleteEpilogue. > meaning even without ExecForPortionOfLeftovers, we can still call > table_tuple_fetch_row_version > also if it was *not* concurrently updated, then our current process > holds the lock until the ending of the transaction, i think. > So the following TODO is unnecessary? > > + /* > + * Get the range of the old pre-UPDATE/DELETE tuple, > + * so we can intersect it with the FOR PORTION OF target > + * and see if there are any "leftovers" to insert. > + * > + * We have already locked the tuple in ExecUpdate/ExecDelete > + * (TODO: if it was *not* concurrently updated, does > table_tuple_update lock the tuple itself? > + * I don't found the code for that yet, and maybe it depends on the AM?) > + * and it has passed EvalPlanQual. > + * Make sure we're looking at the most recent version. > + * Otherwise concurrent updates of the same tuple in READ COMMITTED > + * could insert conflicting "leftovers". > + */ > + if (!table_tuple_fetch_row_version(resultRelInfo->ri_RelationDesc, > tupleid, SnapshotAny, oldtupleSlot)) > + elog(ERROR, "failed to fetch tuple for FOR PORTION OF"); I think you're right. According to the comments on TM_Result (returned by table_tuple_update), a TM_Ok indicates that the lock was acquired. > +/* ---------------------------------------------------------------- > + * ExecForPortionOfLeftovers > + * > + * Insert tuples for the untouched timestamp of a row in a FOR > + * PORTION OF UPDATE/DELETE > + * ---------------------------------------------------------------- > + */ > +static void > +ExecForPortionOfLeftovers(ModifyTableContext *context, > + EState *estate, > + ResultRelInfo *resultRelInfo, > + ItemPointer tupleid) > > maybe change the comment to > "Insert tuples for the not intersection of a row in a FOR PORTION OF > UPDATE/DELETE." Changed to "untouched portion". > + deconstruct_array(DatumGetArrayTypeP(allLeftovers), > typcache->type_id, typcache->typlen, > + typcache->typbyval, typcache->typalign, &leftovers, NULL, &nleftovers); > + > + if (nleftovers > 0) > + { > I think add something like assert nleftovers >=0 && nleftovers <= 2 > (assume only range not multirange) would improve readability. I added the first assert. The second is not true for non-range types. > + <para> > + If the table has a range column or > + <link linkend="ddl-periods-application-periods"><literal>PERIOD</literal></link>, > + you may supply a <literal>FOR PORTION OF</literal> clause, and > your delete will > + only affect rows that overlap the given interval. Furthermore, if > a row's span > + extends outside the <literal>FOR PORTION OF</literal> bounds, then > your delete > + will only change the span within those bounds. In effect you are > deleting any > + moment targeted by <literal>FOR PORTION OF</literal> and no moments outside. > + </para> > + > + <para> > + Specifically, after <productname>PostgreSQL</productname> deletes > the existing row, > + it will <literal>INSERT</literal> > + new rows whose range or start/end column(s) receive the remaining > span outside > + the targeted bounds, containing the original values in other columns. > + There will be zero to two inserted records, > + depending on whether the original span extended before the targeted > + <literal>FROM</literal>, after the targeted <literal>TO</literal>, > both, or neither. > + </para> > + > + <para> > + These secondary inserts fire <literal>INSERT</literal> triggers. First > + <literal>BEFORE DELETE</literal> triggers first, then > + <literal>BEFORE INSERT</literal>, then <literal>AFTER INSERT</literal>, > + then <literal>AFTER DELETE</literal>. > + </para> > + > + <para> > + These secondary inserts do not require <literal>INSERT</literal> > privilege on the table. > + This is because conceptually no new information has been added. > The inserted rows only preserve > + existing data about the untargeted time period. Note this may > result in users firing <literal>INSERT</literal> > + triggers who don't have insert privileges, so be careful about > <literal>SECURITY DEFINER</literal> trigger functions! > + </para> > > I think you need to wrap them into a big paragraph, otherwise they > lose the context? > please see the attached build sql-update.html. Still TODO. > also I think > + <link linkend="ddl-periods-application-periods"><literal>PERIOD</literal></link>, > should shove into Add-PERIODs.patch. > > otherwise you cannot build Add-UPDATE-DELETE-FOR-PORTION-OF.patch > without all the patches. Fixed. > I think the "FOR-PORTION-OF" feature is kind of independ? > Because, IMHO, "for portion" is a range datum interacting with another > single range datum, but the primary key with "WITHOUT OVERLAPS", is > range datum interacting with a set of range datums. > now I cannot just git apply v22-0006-Add-UPDATE-DELETE-FOR-PORTION-OF.patch. > That maybe would make it more difficult to get commited? Still TODO. On 1/8/24 21:33, jian he wrote: > > src5=# select range_without_portion(numrange(1.0,3.0,'[]'), > numrange(1.5,2.0,'(]')); > range_without_portion > --------------------------- > {"[1.0,1.5]","(2.0,3.0]"} > (1 row) > > src5=# \gdesc > Column | Type > -----------------------+----------- > range_without_portion | numeric[] > (1 row) > > src5=# \df range_without_portion > List of functions > Schema | Name | Result data type | Argument data > types | Type > ------------+-----------------------+------------------+---------------------+------ > pg_catalog | range_without_portion | anyarray | anyrange, > anyrange | func > (1 row) > > so apparently, you cannot from (anyrange, anyrange) get anyarray the > element type is anyrange. > I cannot find the documented explanation in > https://www.postgresql.org/docs/current/extend-type-system.html#EXTEND-TYPES-POLYMORPHIC > > anyrange is POLYMORPHIC, anyarray is POLYMORPHIC, > but I suppose, getting an anyarray the element type is anyrange would be hard. You're right, that is a problem. I think the right approach is to make intersect and without_portion just be support functions, not operators. Then I don't need to introduce the new 'p' amop strategy at all, which seemed like a dubious idea anyway. Then the without_portion function can return a SETOF instead of an array. Another idea is to add more polymorphic types, anyrangearray and anymultirangearray, but maybe that is too big a thing. OTOH I have wanted those same types before. I will take a stab at it. On 1/11/24 06:44, Peter Eisentraut wrote: > Here is some more detailed review of the first two patches. (I reviewed v20; I see you have also > posted v21, but they don't appear very different for this purpose.) > > v20-0001-Add-stratnum-GiST-support-function.patch > > * contrib/btree_gist/Makefile > > Needs corresponding meson.build updates. Fixed. > * contrib/btree_gist/btree_gist--1.7--1.8.sql > > Should gist_stratnum_btree() live in contrib/btree_gist/ or in core? > Are there other extensions that use the btree strategy numbers for > gist? Moved. None of our other contrib extensions use it. I thought it would be friendly to offer it to outside extensions, but maybe that is too speculative. > +ALTER OPERATOR FAMILY gist_vbit_ops USING gist ADD > + FUNCTION 12 (varbit, varbit) gist_stratnum_btree (int2) ; > > Is there a reason for the extra space after FUNCTION here (repeated > throughout the file)? Fixed. > +-- added in 1.4: > > What is the purpose of these "added in" comments? I added those to help me make sure I was including every type in the extension, but I've taken them out here. > v20-0002-Add-temporal-PRIMARY-KEY-and-UNIQUE-constraints.patch > > * contrib/btree_gist/Makefile > > Also update meson.build. Done. > * contrib/btree_gist/sql/without_overlaps.sql > > Maybe also insert a few values, to verify that the constraint actually > does something? Done. > * doc/src/sgml/ref/create_table.sgml > > Is "must have a range type" still true? With the changes to the > strategy number mapping, any type with a supported operator class > should work? Updated. Probably more docs to come; I want to go through them all now that we support more types. > * src/backend/utils/adt/ruleutils.c > > Is it actually useful to add an argument to > decompile_column_index_array()? Wouldn't it be easier to just print > the " WITHOUT OVERLAPS" in the caller after returning from it? Okay, done. > * src/include/access/gist_private.h > > The added function gistTranslateStratnum() isn't really "private" to > gist. So access/gist.h would be a better place for it. Moved. > Also, most other functions there appear to be named "GistSomething", > so a more consistent name might be GistTranslateStratnum. > > * src/include/access/stratnum.h Changed. > The added StrategyIsValid() doesn't seem that useful? Plenty of > existing code just compares against InvalidStrategy, and there is only > one caller for the new function. I suggest to do without it. > > * src/include/commands/defrem.h Okay, removed. > We are using two terms here, well-known strategy number and canonical > strategy number, to mean the same thing (I think?). Let's try to > stick with one. Or explain the relationship? True. Changed everything to "well-known" which seems like a better match for what's going on. I haven't gone through jian he's Jan 13 patch yet, but since he was also implementing Peter's requests I thought I should share what I have. I did this work a while ago, but I was hoping to finish the TODOs above first, and then we got hit with a winter storm that knocked out power. Sorry to cause duplicate work! Rebased to 2f35c14cfb. Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
Attachment
- v23-0001-Add-stratnum-GiST-support-function.patch
- v23-0002-Add-temporal-PRIMARY-KEY-and-UNIQUE-constraints.patch
- v23-0003-Add-GiST-referencedagg-support-func.patch
- v23-0004-Add-temporal-FOREIGN-KEYs.patch
- v23-0005-Add-multi-range_without_portion-proc-operator.patch
- v23-0006-Add-UPDATE-DELETE-FOR-PORTION-OF.patch
- v23-0007-Add-CASCADE-SET-NULL-SET-DEFAULT-for-temporal-fo.patch
- v23-0008-Add-PERIODs.patch
pgsql-hackers by date: