Re: pg_dump versus hash partitioning - Mailing list pgsql-hackers
From: Tom Lane <tgl@sss.pgh.pa.us>
Subject: Re: pg_dump versus hash partitioning
Msg-id: 1912821.1676402493@sss.pgh.pa.us
In response to: Re: pg_dump versus hash partitioning (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: pg_dump versus hash partitioning
           Re: pg_dump versus hash partitioning
List: pgsql-hackers
Here's a set of draft patches around this issue.  0001 does what I last
suggested, ie force load-via-partition-root for leaf tables underneath
a partitioned table with a partitioned-by-hash enum column.  It wasn't
quite as messy as I first feared, although we do need a new query (and
pg_dump now knows more about pg_partitioned_table than it used to).
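For concreteness, here is a minimal sketch of the unsafe case (the type
and table names are hypothetical, not taken from the patch):

    CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');
    CREATE TABLE t (m mood) PARTITION BY HASH (m);
    CREATE TABLE t_p0 PARTITION OF t
        FOR VALUES WITH (MODULUS 2, REMAINDER 0);
    CREATE TABLE t_p1 PARTITION OF t
        FOR VALUES WITH (MODULUS 2, REMAINDER 1);

Hash partitioning on an enum column hashes the enum value's OID, and
those OIDs are assigned afresh when the type is recreated in the
destination database.  A row can therefore belong to a different
partition after restore, so copying it directly into its original leaf
partition can fail with a partition-constraint violation; routing the
row through the root lets the server pick the correct partition.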
I was a bit unhappy to read this in the documentation:

    It is best not to use parallelism when restoring from an archive
    made with this option, because <application>pg_restore</application>
    will not know exactly which partition(s) a given archive data item
    will load data into.  This could result in inefficiency due to lock
    conflicts between parallel jobs, or perhaps even restore failures
    due to foreign key constraints being set up before all the relevant
    data is loaded.

This made me wonder if this could be a usable solution at all, but
after thinking for awhile, I don't see how the claim about foreign key
constraints is anything but FUD.  pg_dump/pg_restore have sufficient
dependency logic to prevent that from happening.  I think we can just
drop the "or perhaps ..." clause here, and tolerate the possible
inefficiency as better than failing.

0002 and 0003 are not part of the bug fix, but are some performance
improvements I noticed while working on this.  0002 is pretty minor,
but 0003 is possibly significant if you have a ton of partitions.  I
haven't done any performance measurement on it though.

            regards, tom lane

From 0c66c9f1211e16366cf471ef48a52e5447c79424 Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Tue, 14 Feb 2023 13:09:04 -0500
Subject: [PATCH v1 1/3] Force load-via-partition-root for hash partitioning
 on an enum column.

Without this, dump and restore will almost always fail because the
enum values will receive different OIDs in the destination database,
and their hash codes will therefore be different.  (Improving the
hash algorithm would not make this situation better, and would indeed
break other cases as well.)

Discussion: https://postgr.es/m/1376149.1675268279@sss.pgh.pa.us
---
 src/bin/pg_dump/common.c  |  18 +++---
 src/bin/pg_dump/pg_dump.c | 120 +++++++++++++++++++++++++++++++++++---
 src/bin/pg_dump/pg_dump.h |   2 +
 3 files changed, 124 insertions(+), 16 deletions(-)

diff --git a/src/bin/pg_dump/common.c b/src/bin/pg_dump/common.c
index a43f2e5553..3d0cfc1b8e 100644
--- a/src/bin/pg_dump/common.c
+++ b/src/bin/pg_dump/common.c
@@ -230,6 +230,9 @@ getSchemaData(Archive *fout, int *numTablesPtr)
     pg_log_info("flagging inherited columns in subtables");
     flagInhAttrs(fout->dopt, tblinfo, numTables);
 
+    pg_log_info("reading partitioning data");
+    getPartitioningInfo(fout);
+
     pg_log_info("reading indexes");
     getIndexes(fout, tblinfo, numTables);
 
@@ -285,7 +288,6 @@ static void
 flagInhTables(Archive *fout, TableInfo *tblinfo, int numTables,
               InhInfo *inhinfo, int numInherits)
 {
-    DumpOptions *dopt = fout->dopt;
     int         i,
                 j;
 
@@ -301,18 +303,18 @@ flagInhTables(Archive *fout, TableInfo *tblinfo, int numTables,
             continue;
 
         /*
-         * Normally, we don't bother computing anything for non-target tables,
-         * but if load-via-partition-root is specified, we gather information
-         * on every partition in the system so that getRootTableInfo can trace
-         * from any given to leaf partition all the way up to the root.  (We
-         * don't need to mark them as interesting for getTableAttrs, though.)
+         * Normally, we don't bother computing anything for non-target tables.
+         * However, we must find the parents of non-root partitioned tables in
+         * any case, so that we can trace from leaf partitions up to the root
+         * (in case a leaf is to be dumped but its parents are not).  We need
+         * not mark such parents interesting for getTableAttrs, though.
          */
         if (!tblinfo[i].dobj.dump)
         {
             mark_parents = false;
 
-            if (!dopt->load_via_partition_root ||
-                !tblinfo[i].ispartition)
+            if (!(tblinfo[i].relkind == RELKIND_PARTITIONED_TABLE &&
+                  tblinfo[i].ispartition))
                 find_parents = false;
         }
 
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 527c7651ab..51b317aa3d 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -320,6 +320,7 @@ static void appendReloptionsArrayAH(PQExpBuffer buffer, const char *reloptions,
 static char *get_synchronized_snapshot(Archive *fout);
 static void setupDumpWorker(Archive *AH);
 static TableInfo *getRootTableInfo(const TableInfo *tbinfo);
+static bool forcePartitionRootLoad(const TableInfo *tbinfo);
 
 
 int
@@ -2228,11 +2229,13 @@ dumpTableData_insert(Archive *fout, const void *dcontext)
             insertStmt = createPQExpBuffer();
 
             /*
-             * When load-via-partition-root is set, get the root table name
-             * for the partition table, so that we can reload data through the
-             * root table.
+             * When load-via-partition-root is set or forced, get the root
+             * table name for the partition table, so that we can reload data
+             * through the root table.
              */
-            if (dopt->load_via_partition_root && tbinfo->ispartition)
+            if (tbinfo->ispartition &&
+                (dopt->load_via_partition_root ||
+                 forcePartitionRootLoad(tbinfo)))
                 targettab = getRootTableInfo(tbinfo);
             else
                 targettab = tbinfo;
@@ -2430,6 +2433,35 @@ getRootTableInfo(const TableInfo *tbinfo)
     return parentTbinfo;
 }
 
+/*
+ * forcePartitionRootLoad
+ *     Check if we must force load_via_partition_root for this partition.
+ *
+ * This is required if any level of ancestral partitioned table has an
+ * unsafe partitioning scheme.
+ */
+static bool
+forcePartitionRootLoad(const TableInfo *tbinfo)
+{
+    TableInfo  *parentTbinfo;
+
+    Assert(tbinfo->ispartition);
+    Assert(tbinfo->numParents == 1);
+
+    parentTbinfo = tbinfo->parents[0];
+    if (parentTbinfo->unsafe_partitions)
+        return true;
+    while (parentTbinfo->ispartition)
+    {
+        Assert(parentTbinfo->numParents == 1);
+        parentTbinfo = parentTbinfo->parents[0];
+        if (parentTbinfo->unsafe_partitions)
+            return true;
+    }
+
+    return false;
+}
+
 /*
  * dumpTableData -
  *     dump the contents of a single table
@@ -2456,11 +2488,13 @@ dumpTableData(Archive *fout, const TableDataInfo *tdinfo)
         dumpFn = dumpTableData_copy;
 
         /*
-         * When load-via-partition-root is set, get the root table name for
-         * the partition table, so that we can reload data through the root
-         * table.
+         * When load-via-partition-root is set or forced, get the root table
+         * name for the partition table, so that we can reload data through
+         * the root table.
          */
-        if (dopt->load_via_partition_root && tbinfo->ispartition)
+        if (tbinfo->ispartition &&
+            (dopt->load_via_partition_root ||
+             forcePartitionRootLoad(tbinfo)))
         {
             TableInfo  *parentTbinfo;
 
@@ -6744,6 +6778,76 @@ getInherits(Archive *fout, int *numInherits)
     return inhinfo;
 }
 
+/*
+ * getPartitioningInfo
+ *     get information about partitioning
+ *
+ * For the most part, we only collect partitioning info about tables we
+ * intend to dump.  However, this function has to consider all partitioned
+ * tables in the database, because we need to know about parents of partitions
+ * we are going to dump even if the parents themselves won't be dumped.
+ *
+ * Specifically, what we need to know is whether each partitioned table
+ * has an "unsafe" partitioning scheme that requires us to force
+ * load-via-partition-root mode for its children.  Currently the only case
+ * for which we force that is hash partitioning on enum columns, since the
+ * hash codes depend on enum value OIDs which won't be replicated across
+ * dump-and-reload.  There are other cases in which load-via-partition-root
+ * might be necessary, but we expect users to cope with them.
+ */
+void
+getPartitioningInfo(Archive *fout)
+{
+    PQExpBuffer query;
+    PGresult   *res;
+    int         ntups;
+
+    /* no partitions before v10 */
+    if (fout->remoteVersion < 100000)
+        return;
+    /* needn't bother if schema-only dump */
+    if (fout->dopt->schemaOnly)
+        return;
+
+    query = createPQExpBuffer();
+
+    /*
+     * Unsafe partitioning schemes are exactly those for which hash enum_ops
+     * appears among the partition opclasses.  We needn't check partstrat.
+     *
+     * Note that this query may well retrieve info about tables we aren't
+     * going to dump and hence have no lock on.  That's okay since we need
+     * not invoke any unsafe server-side functions.
+     */
+    appendPQExpBufferStr(query,
+                         "SELECT partrelid FROM pg_partitioned_table WHERE\n"
+                         "(SELECT c.oid FROM pg_opclass c JOIN pg_am a "
+                         "ON c.opcmethod = a.oid\n"
+                         "WHERE opcname = 'enum_ops' "
+                         "AND opcnamespace = 'pg_catalog'::regnamespace "
+                         "AND amname = 'hash') = ANY(partclass)");
+
+    res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+
+    ntups = PQntuples(res);
+
+    for (int i = 0; i < ntups; i++)
+    {
+        Oid         tabrelid = atooid(PQgetvalue(res, i, 0));
+        TableInfo  *tbinfo;
+
+        tbinfo = findTableByOid(tabrelid);
+        if (tbinfo == NULL)
+            pg_fatal("failed sanity check, table OID %u appearing in pg_partitioned_table not found",
+                     tabrelid);
+        tbinfo->unsafe_partitions = true;
+    }
+
+    PQclear(res);
+
+    destroyPQExpBuffer(query);
+}
+
 /*
  * getIndexes
  *     get information about every index on a dumpable table
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index e7cbd8d7ed..bfd3cf17b7 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -318,6 +318,7 @@ typedef struct _tableInfo
     bool        dummy_view;     /* view's real definition must be postponed */
     bool        postponed_def;  /* matview must be postponed into post-data */
     bool        ispartition;    /* is table a partition? */
+    bool        unsafe_partitions;  /* is it an unsafe partitioned table? */
 
     /*
      * These fields are computed only if we decide the table is interesting
@@ -715,6 +716,7 @@ extern ConvInfo *getConversions(Archive *fout, int *numConversions);
 extern TableInfo *getTables(Archive *fout, int *numTables);
 extern void getOwnedSeqs(Archive *fout, TableInfo tblinfo[], int numTables);
 extern InhInfo *getInherits(Archive *fout, int *numInherits);
+extern void getPartitioningInfo(Archive *fout);
 extern void getIndexes(Archive *fout, TableInfo tblinfo[], int numTables);
 extern void getExtendedStatistics(Archive *fout);
 extern void getConstraints(Archive *fout, TableInfo tblinfo[], int numTables);
-- 
2.31.1
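As a side note, the detection query that 0001 issues can also be run
by hand to see which partitioned tables in an existing database have
the unsafe scheme.  Here is a sketch of the same query, reformatted
for psql, with a regclass cast added here for readability:

    SELECT partrelid::regclass
    FROM pg_partitioned_table
    WHERE (SELECT c.oid
           FROM pg_opclass c
           JOIN pg_am a ON c.opcmethod = a.oid
           WHERE c.opcname = 'enum_ops'
             AND c.opcnamespace = 'pg_catalog'::regnamespace
             AND a.amname = 'hash') = ANY (partclass);

Any table it returns will have its partitions' data dumped through the
root once 0001 is applied.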
From 5f73193c120ce04f4e5bb3530fe1ee84afc841f1 Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Tue, 14 Feb 2023 13:18:29 -0500
Subject: [PATCH v1 2/3] Don't create useless TableAttachInfo objects.

It's silly to create a TableAttachInfo object that we're not going to
dump, when we know perfectly well at creation time that it won't be
dumped.

Discussion: https://postgr.es/m/1376149.1675268279@sss.pgh.pa.us
---
 src/bin/pg_dump/common.c  | 3 ++-
 src/bin/pg_dump/pg_dump.c | 3 ---
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/src/bin/pg_dump/common.c b/src/bin/pg_dump/common.c
index 3d0cfc1b8e..ce9d2a8ee8 100644
--- a/src/bin/pg_dump/common.c
+++ b/src/bin/pg_dump/common.c
@@ -336,7 +336,8 @@ flagInhTables(Archive *fout, TableInfo *tblinfo, int numTables,
         }
 
         /* Create TableAttachInfo object if needed */
-        if (tblinfo[i].dobj.dump && tblinfo[i].ispartition)
+        if ((tblinfo[i].dobj.dump & DUMP_COMPONENT_DEFINITION) &&
+            tblinfo[i].ispartition)
         {
             TableAttachInfo *attachinfo;
 
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 51b317aa3d..35bba8765f 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -16082,9 +16082,6 @@ dumpTableAttach(Archive *fout, const TableAttachInfo *attachinfo)
     if (dopt->dataOnly)
         return;
 
-    if (!(attachinfo->partitionTbl->dobj.dump & DUMP_COMPONENT_DEFINITION))
-        return;
-
     q = createPQExpBuffer();
 
     if (!fout->is_prepared[PREPQUERY_DUMPTABLEATTACH])
-- 
2.31.1
From 7cc336043e750ec154185b972970b2364f0f57ee Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Tue, 14 Feb 2023 13:35:52 -0500
Subject: [PATCH v1 3/3] Simplify pg_dump's creation of parent-table links.

Instead of trying to optimize this by skipping creation of the links
for tables we don't plan to dump, just create them all in bulk with a
single scan over the pg_inherits data.  The previous approach was more
or less O(N^2) in the number of pg_inherits entries, not to mention
being way too complicated.

Discussion: https://postgr.es/m/1376149.1675268279@sss.pgh.pa.us
---
 src/bin/pg_dump/common.c  | 125 ++++++++++++++++----------------------
 src/bin/pg_dump/pg_dump.h |   5 +-
 2 files changed, 54 insertions(+), 76 deletions(-)

diff --git a/src/bin/pg_dump/common.c b/src/bin/pg_dump/common.c
index ce9d2a8ee8..84d99ee8b6 100644
--- a/src/bin/pg_dump/common.c
+++ b/src/bin/pg_dump/common.c
@@ -83,8 +83,6 @@ static void flagInhTables(Archive *fout, TableInfo *tblinfo, int numTables,
                           InhInfo *inhinfo, int numInherits);
 static void flagInhIndexes(Archive *fout, TableInfo *tblinfo, int numTables);
 static void flagInhAttrs(DumpOptions *dopt, TableInfo *tblinfo, int numTables);
-static void findParentsByOid(TableInfo *self,
-                             InhInfo *inhinfo, int numInherits);
 static int  strInArray(const char *pattern, char **arr, int arr_size);
 static IndxInfo *findIndexByOid(Oid oid);
 
@@ -288,45 +286,70 @@ static void
 flagInhTables(Archive *fout, TableInfo *tblinfo, int numTables,
               InhInfo *inhinfo, int numInherits)
 {
+    TableInfo  *child = NULL;
+    TableInfo  *parent = NULL;
     int         i,
                 j;
 
-    for (i = 0; i < numTables; i++)
+    /*
+     * Set up links from child tables to their parents.
+     *
+     * We used to attempt to skip this work for tables that are not to be
+     * dumped; but the optimizable cases are rare in practice, and setting up
+     * these links in bulk is cheaper than the old way.  (Note in particular
+     * that it's very rare for a child to have more than one parent.)
+     */
+    for (i = 0; i < numInherits; i++)
     {
-        bool        find_parents = true;
-        bool        mark_parents = true;
-
-        /* Some kinds never have parents */
-        if (tblinfo[i].relkind == RELKIND_SEQUENCE ||
-            tblinfo[i].relkind == RELKIND_VIEW ||
-            tblinfo[i].relkind == RELKIND_MATVIEW)
-            continue;
-
         /*
-         * Normally, we don't bother computing anything for non-target tables.
-         * However, we must find the parents of non-root partitioned tables in
-         * any case, so that we can trace from leaf partitions up to the root
-         * (in case a leaf is to be dumped but its parents are not).  We need
-         * not mark such parents interesting for getTableAttrs, though.
+         * Skip a hashtable lookup if it's same table as last time.  This is
+         * unlikely for the child, but less so for the parent.  (Maybe we
+         * should ask the backend for a sorted array to make it more likely?
+         * Not clear the sorting effort would be repaid, though.)
          */
-        if (!tblinfo[i].dobj.dump)
+        if (child == NULL ||
+            child->dobj.catId.oid != inhinfo[i].inhrelid)
         {
-            mark_parents = false;
+            child = findTableByOid(inhinfo[i].inhrelid);
 
-            if (!(tblinfo[i].relkind == RELKIND_PARTITIONED_TABLE &&
-                  tblinfo[i].ispartition))
-                find_parents = false;
+            /*
+             * If we find no TableInfo, assume the pg_inherits entry is for a
+             * partitioned index, which we don't need to track.
+             */
+            if (child == NULL)
+                continue;
         }
+        if (parent == NULL ||
+            parent->dobj.catId.oid != inhinfo[i].inhparent)
+        {
+            parent = findTableByOid(inhinfo[i].inhparent);
+            if (parent == NULL)
+                pg_fatal("failed sanity check, parent OID %u of table \"%s\" (OID %u) not found",
+                         inhinfo[i].inhparent,
+                         child->dobj.name,
+                         child->dobj.catId.oid);
+        }
+        /* Add this parent to the child's list of parents. */
+        if (child->numParents > 0)
+            child->parents = pg_realloc_array(child->parents,
+                                              TableInfo *,
+                                              child->numParents + 1);
+        else
+            child->parents = pg_malloc_array(TableInfo *, 1);
+        child->parents[child->numParents++] = parent;
+    }
 
-        /* If needed, find all the immediate parent tables. */
-        if (find_parents)
-            findParentsByOid(&tblinfo[i], inhinfo, numInherits);
-
+    /*
+     * Now consider all child tables and mark parents interesting as needed.
+     */
+    for (i = 0; i < numTables; i++)
+    {
         /*
          * If needed, mark the parents as interesting for getTableAttrs and
-         * getIndexes.
+         * getIndexes.  We only need this for direct parents of dumpable
+         * tables.
          */
-        if (mark_parents)
+        if (tblinfo[i].dobj.dump)
         {
             int         numParents = tblinfo[i].numParents;
             TableInfo **parents = tblinfo[i].parents;
@@ -979,52 +1002,6 @@ findOwningExtension(CatalogId catalogId)
 }
 
 
-/*
- * findParentsByOid
- *     find a table's parents in tblinfo[]
- */
-static void
-findParentsByOid(TableInfo *self,
-                 InhInfo *inhinfo, int numInherits)
-{
-    Oid         oid = self->dobj.catId.oid;
-    int         i,
-                j;
-    int         numParents;
-
-    numParents = 0;
-    for (i = 0; i < numInherits; i++)
-    {
-        if (inhinfo[i].inhrelid == oid)
-            numParents++;
-    }
-
-    self->numParents = numParents;
-
-    if (numParents > 0)
-    {
-        self->parents = pg_malloc_array(TableInfo *, numParents);
-        j = 0;
-        for (i = 0; i < numInherits; i++)
-        {
-            if (inhinfo[i].inhrelid == oid)
-            {
-                TableInfo  *parent;
-
-                parent = findTableByOid(inhinfo[i].inhparent);
-                if (parent == NULL)
-                    pg_fatal("failed sanity check, parent OID %u of table \"%s\" (OID %u) not found",
-                             inhinfo[i].inhparent,
-                             self->dobj.name,
-                             oid);
-                self->parents[j++] = parent;
-            }
-        }
-    }
-    else
-        self->parents = NULL;
-}
-
 /*
  * parseOidArray
  *     parse a string of numbers delimited by spaces into a character array
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index bfd3cf17b7..bb4e981e7d 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -320,6 +320,9 @@ typedef struct _tableInfo
     bool        ispartition;    /* is table a partition? */
     bool        unsafe_partitions;  /* is it an unsafe partitioned table? */
 
+    int         numParents;     /* number of (immediate) parent tables */
+    struct _tableInfo **parents;    /* TableInfos of immediate parents */
+
     /*
      * These fields are computed only if we decide the table is interesting
      * (it's either a table to dump, or a direct parent of a dumpable table).
@@ -352,8 +355,6 @@ typedef struct _tableInfo
     /*
     * Stuff computed only for dumpable tables.
     */
-    int         numParents;     /* number of (immediate) parent tables */
-    struct _tableInfo **parents;    /* TableInfos of immediate parents */
     int         numIndexes;     /* number of indexes */
     struct _indxInfo *indexes;  /* indexes */
     struct _tableDataInfo *dataObj; /* TableDataInfo, if dumping its data */
-- 
2.31.1
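For comparison, the behavior 0001 forces selectively is the same one
users can already opt into globally.  A usage sketch (the database
names are hypothetical):

    pg_dump --load-via-partition-root -Fc mydb -f mydb.dump
    pg_restore -d newdb mydb.dump

0001 applies that data routing only to partitions whose ancestry
includes hash partitioning on an enum column, leaving all other
partitions to be loaded directly as before.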