Re: [HACKERS] [POC] hash partitioning - Mailing list pgsql-hackers
From | amul sul |
---|---|
Subject | Re: [HACKERS] [POC] hash partitioning |
Date | |
Msg-id | CAAJ_b95w0q5uh8XYiAVX4+rzDmsv7v7U71AE5smsyaOKZ0X=Vw@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] [POC] hash partitioning (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>) |
Responses |
Re: [HACKERS] [POC] hash partitioning
|
List | pgsql-hackers |
On Tue, May 16, 2017 at 1:02 PM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote: [...] >>> >>> + if (key->strategy == PARTITION_STRATEGY_HASH) >>> + { >>> + ndatums = nparts; >>> + hbounds = (PartitionHashBound **) palloc(nparts * >>> + >>> sizeof(PartitionHashBound *)); >>> + i = 0; >>> + foreach (cell, boundspecs) >>> + { >>> + PartitionBoundSpec *spec = lfirst(cell); >>> + >>> [ clipped ] >>> + hbounds[i]->index = i; >>> + i++; >>> + } >>> For list and range partitioned table we order the bounds so that two >>> partitioned tables have them in the same order irrespective of order in which >>> they are specified by the user or hence stored in the catalogs. The partitions >>> then get indexes according the order in which their bounds appear in ordered >>> arrays of bounds. Thus any two partitioned tables with same partition >>> specification always have same PartitionBoundInfoData. This helps in >>> partition-wise join to match partition bounds of two given tables. Above code >>> assigns the indexes to the partitions as they appear in the catalogs. This >>> means that two partitioned tables with same partition specification but >>> different order for partition bound specification will have different >>> PartitionBoundInfoData represenation. >>> >>> If we do that, probably partition_bounds_equal() would reduce to just matching >>> indexes and the last element of datums array i.e. the greatest modulus datum. >>> If ordered datums array of two partitioned table do not match exactly, the >>> mismatch can be because missing datums or different datums. If it's a missing >>> datum it will change the greatest modulus or have corresponding entry in >>> indexes array as -1. If the entry differs it will cause mismatching indexes in >>> the index arrays. >>> >> Make sense, will fix this. > > I don't see this being addressed in the patches attached in the reply to Dilip. > Fixed in the attached version. >> >>> >>> + if (key->partattrs[i] != 0) >>> + { >>> + keyCol = (Node *) makeVar(1, >>> + key->partattrs[i], >>> + key->parttypid[i], >>> + key->parttypmod[i], >>> + key->parttypcoll[i], >>> + 0); >>> + >>> + /* Form hash_fn(value) expression */ >>> + keyCol = (Node *) makeFuncExpr(key->partsupfunc[i].fn_oid, >>> + get_fn_expr_rettype(&key->partsupfunc[i]), >>> + list_make1(keyCol), >>> + InvalidOid, >>> + InvalidOid, >>> + COERCE_EXPLICIT_CALL); >>> + } >>> + else >>> + { >>> + keyCol = (Node *) copyObject(lfirst(partexprs_item)); >>> + partexprs_item = lnext(partexprs_item); >>> + } >>> I think we should add FuncExpr for column Vars as well as expressions. >>> >> Okay, will fix this. > > Here, please add a check similar to get_quals_for_range() > 1840 if (partexprs_item == NULL) > 1841 elog(ERROR, "wrong number of partition key expressions"); > > Fixed in the attached version. >> >>> I think we need more comments for compute_hash_value(), mix_hash_value() and >>> satisfies_hash_partition() as to what each of them accepts and what it >>> computes. >>> >>> + /* key's hash values start from third argument of function. */ >>> + if (!PG_ARGISNULL(i + 2)) >>> + { >>> + values[i] = PG_GETARG_DATUM(i + 2); >>> + isnull[i] = false; >>> + } >>> + else >>> + isnull[i] = true; >>> You could write this as >>> isnull[i] = PG_ARGISNULL(i + 2); >>> if (isnull[i]) >>> values[i] = PG_GETARG_DATUM(i + 2); >>> >> Okay. > > If we have used this technique somewhere else in PG code, please > mention that function/place. > /* > * Rotate hash left 1 bit before mixing in the next column. This > * prevents equal values in different keys from cancelling each other. > */ > Fixed in the attached version. > >> >>> + foreach (lc, $5) >>> + { >>> + DefElem *opt = (DefElem *) lfirst(lc); >>> A search on WITH in gram.y shows that we do not handle WITH options in gram.y. >>> Usually they are handled at the transformation stage. Why is this an exception? >>> If you do that, we can have all the error handling in >>> transformPartitionBound(). >>> >> If so, ForValues need to return list for hash and PartitionBoundSpec >> for other two; wouldn't that break code consistency? And such >> validation is not new in gram.y see xmltable_column_el. > > Thanks for pointing that out. Ok, then may be leave it in gram.y. But > may be we should move the error handling in transform function. > IMO, let it be there for readability. It will be easier to understand why do we have set -1 for modulus and remainder. > >> >>> +DATA(insert OID = 5028 ( satisfies_hash_partition PGNSP PGUID 12 1 0 >>> 2276 0 f f f f f f i s 3 0 16 "23 23 2276" _null_ _null_ _null_ _null_ >>> _null_ satisfies_hash_partition _null_ _null_ _null_ )); >>> Why is third argument to this function ANY? Shouldn't it be INT4ARRAY (variadic >>> INT4)? >>> >> Will use INT4ARRAY in next patch, but I am little sceptical of it. we >> need an unsigned int32, but unfortunately there is not variadic uint32 >> support. How about INT8ARRAY? > > Hmm, I think as long as the binary representation of given unsigned > integer doesn't change in the function call, we could cast an INT32 > datums into unsigned int32, so spending extra 4 bytes per partition > key doesn't look like worth the effort. > > A related question is, all hash functions have return type as > "integer" but internally they return uint32. Why not to do the same > for this function as well? I see. IIUC, there is no harm to use INT4ARRAY, thanks for explanation. Regards, Amul -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
pgsql-hackers by date: