Re: [HACKERS] [POC] hash partitioning - Mailing list pgsql-hackers
From | Ashutosh Bapat |
---|---|
Subject | Re: [HACKERS] [POC] hash partitioning |
Date | |
Msg-id | CAFjFpReTGCy78YqFECd==JtOvF=tqusznG+5R-6Sk32+ebZpOw@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] [POC] hash partitioning (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>) |
Responses |
Re: [HACKERS] [POC] hash partitioning
|
List | pgsql-hackers |
Hi, Here's patch with some cosmetic fixes to 0002, to be applied on top of 0002. On Tue, May 16, 2017 at 1:02 PM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote: > On Sun, May 14, 2017 at 12:30 PM, amul sul <sulamul@gmail.com> wrote: >> On Fri, May 12, 2017 at 10:39 PM, Ashutosh Bapat >> <ashutosh.bapat@enterprisedb.com> wrote: >>> On Fri, May 12, 2017 at 6:08 PM, amul sul <sulamul@gmail.com> wrote: >>>> Hi, >>>> >>>> Please find the following updated patches attached: >>>> >>>> 0001-Cleanup.patch : Does some cleanup and code refactoring required >>>> for hash partition patch. Otherwise, there will be unnecessary diff in >>>> 0002 patch >>> >>> Thanks for splitting the patch. >>> >>> + if (isnull[0]) >>> + cur_index = partdesc->boundinfo->null_index; >>> This code assumes that null_index will be set to -1 when has_null is false. Per >>> RelationBuildPartitionDesc() this is true. But probably we should write this >>> code as >>> if (isnull[0]) >>> { >>> if (partdesc->boundinfo->has_null) >>> cur_index = partdesc->boundinfo->null_index; >>> } >>> That way we are certain that when has_null is false, cur_index = -1 similar to >>> the original code. >>> >> Okay will add this. > > Thanks. > >> I still don't understood point of having has_null >> variable, if no null accepting partition exists then null_index is >> alway set to -1 in RelationBuildPartitionDesc. Anyway, let not change >> the original code. > > I agree. has_null might have been folded as null_index == -1. But > that's not the problem of this patch. > > 0001 looks good to me now. > > >> >> [...] >>> >>> + if (key->strategy == PARTITION_STRATEGY_HASH) >>> + { >>> + ndatums = nparts; >>> + hbounds = (PartitionHashBound **) palloc(nparts * >>> + >>> sizeof(PartitionHashBound *)); >>> + i = 0; >>> + foreach (cell, boundspecs) >>> + { >>> + PartitionBoundSpec *spec = lfirst(cell); >>> + >>> [ clipped ] >>> + hbounds[i]->index = i; >>> + i++; >>> + } >>> For list and range partitioned table we order the bounds so that two >>> partitioned tables have them in the same order irrespective of order in which >>> they are specified by the user or hence stored in the catalogs. The partitions >>> then get indexes according the order in which their bounds appear in ordered >>> arrays of bounds. Thus any two partitioned tables with same partition >>> specification always have same PartitionBoundInfoData. This helps in >>> partition-wise join to match partition bounds of two given tables. Above code >>> assigns the indexes to the partitions as they appear in the catalogs. This >>> means that two partitioned tables with same partition specification but >>> different order for partition bound specification will have different >>> PartitionBoundInfoData represenation. >>> >>> If we do that, probably partition_bounds_equal() would reduce to just matching >>> indexes and the last element of datums array i.e. the greatest modulus datum. >>> If ordered datums array of two partitioned table do not match exactly, the >>> mismatch can be because missing datums or different datums. If it's a missing >>> datum it will change the greatest modulus or have corresponding entry in >>> indexes array as -1. If the entry differs it will cause mismatching indexes in >>> the index arrays. >>> >> Make sense, will fix this. > > I don't see this being addressed in the patches attached in the reply to Dilip. > >> >>> >>> + if (key->partattrs[i] != 0) >>> + { >>> + keyCol = (Node *) makeVar(1, >>> + key->partattrs[i], >>> + key->parttypid[i], >>> + key->parttypmod[i], >>> + key->parttypcoll[i], >>> + 0); >>> + >>> + /* Form hash_fn(value) expression */ >>> + keyCol = (Node *) makeFuncExpr(key->partsupfunc[i].fn_oid, >>> + get_fn_expr_rettype(&key->partsupfunc[i]), >>> + list_make1(keyCol), >>> + InvalidOid, >>> + InvalidOid, >>> + COERCE_EXPLICIT_CALL); >>> + } >>> + else >>> + { >>> + keyCol = (Node *) copyObject(lfirst(partexprs_item)); >>> + partexprs_item = lnext(partexprs_item); >>> + } >>> I think we should add FuncExpr for column Vars as well as expressions. >>> >> Okay, will fix this. > > Here, please add a check similar to get_quals_for_range() > 1840 if (partexprs_item == NULL) > 1841 elog(ERROR, "wrong number of partition key expressions"); > > >> >>> I think we need more comments for compute_hash_value(), mix_hash_value() and >>> satisfies_hash_partition() as to what each of them accepts and what it >>> computes. >>> >>> + /* key's hash values start from third argument of function. */ >>> + if (!PG_ARGISNULL(i + 2)) >>> + { >>> + values[i] = PG_GETARG_DATUM(i + 2); >>> + isnull[i] = false; >>> + } >>> + else >>> + isnull[i] = true; >>> You could write this as >>> isnull[i] = PG_ARGISNULL(i + 2); >>> if (isnull[i]) >>> values[i] = PG_GETARG_DATUM(i + 2); >>> >> Okay. > > If we have used this technique somewhere else in PG code, please > mention that function/place. > /* > * Rotate hash left 1 bit before mixing in the next column. This > * prevents equal values in different keys from cancelling each other. > */ > > >> >>> + foreach (lc, $5) >>> + { >>> + DefElem *opt = (DefElem *) lfirst(lc); >>> A search on WITH in gram.y shows that we do not handle WITH options in gram.y. >>> Usually they are handled at the transformation stage. Why is this an exception? >>> If you do that, we can have all the error handling in >>> transformPartitionBound(). >>> >> If so, ForValues need to return list for hash and PartitionBoundSpec >> for other two; wouldn't that break code consistency? And such >> validation is not new in gram.y see xmltable_column_el. > > Thanks for pointing that out. Ok, then may be leave it in gram.y. But > may be we should move the error handling in transform function. > > >> >>> +DATA(insert OID = 5028 ( satisfies_hash_partition PGNSP PGUID 12 1 0 >>> 2276 0 f f f f f f i s 3 0 16 "23 23 2276" _null_ _null_ _null_ _null_ >>> _null_ satisfies_hash_partition _null_ _null_ _null_ )); >>> Why is third argument to this function ANY? Shouldn't it be INT4ARRAY (variadic >>> INT4)? >>> >> Will use INT4ARRAY in next patch, but I am little sceptical of it. we >> need an unsigned int32, but unfortunately there is not variadic uint32 >> support. How about INT8ARRAY? > > Hmm, I think as long as the binary representation of given unsigned > integer doesn't change in the function call, we could cast an INT32 > datums into unsigned int32, so spending extra 4 bytes per partition > key doesn't look like worth the effort. > > A related question is, all hash functions have return type as > "integer" but internally they return uint32. Why not to do the same > for this function as well? > > -- > Best Wishes, > Ashutosh Bapat > EnterpriseDB Corporation > The Postgres Database Company -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
pgsql-hackers by date: