[HACKERS] path toward faster partition pruning - Mailing list pgsql-hackers
From | Amit Langote |
---|---|
Subject | [HACKERS] path toward faster partition pruning |
Date | |
Msg-id | 098b9c71-1915-1a2a-8d52-1a7a50ce79e8@lab.ntt.co.jp Whole thread Raw |
Responses |
Re: [HACKERS] path toward faster partition pruning
Re: [HACKERS] path toward faster partition pruning Re: [HACKERS] path toward faster partition pruning |
List | pgsql-hackers |
I've been working on implementing a way to perform plan-time partition-pruning that is hopefully faster than the current method of using constraint exclusion to prune each of the potentially many partitions one-by-one. It's not fully cooked yet though. Meanwhile, I thought I'd share a couple of patches that implement some restructuring of the planner code related to partitioned table inheritance planning that I think would be helpful. They are to be applied on top of the patches being discussed at [1]. Note that these patches themselves don't implement the actual code that replaces constraint exclusion as a method of performing partition pruning. I will share that patch after debugging it some more. The main design goal of the patches I'm sharing here now is to defer the locking and opening of leaf partitions in a given partition tree to a point after set_append_rel_size() is called on the root partitioned table. Currently, AFAICS, we need to lock and open the child tables in expand_inherited_rtentry() only to set the translated_vars field in AppendRelInfo that we create for the child. ISTM, we can defer the creation of a child AppendRelInfo to a point when it (its field translated_vars in particular) will actually be used and so lock and open the child tables only at such a time. Although we don't lock and open the partition child tables in expand_inherited_rtentry(), their RT entries are still created and added to root->parse->rtable, so that setup_simple_rel_arrays() knows the maximum number of entries root->simple_rel_array will need to hold and allocate the memory for that array accordingly. Slots in simple_rel_array[] corresponding to partition child tables will be empty until they are created when set_append_rel_size() is called on the root parent table and it determines the partitions that will be scanned after all. Patch augments the existing PartitionedChildRelInfo node, which currently holds only the partitioned child rel RT indexes, to carry some more information about the partition tree, which includes the information returned by RelationGetPartitionDispatchInfo() when it is called from expand_inherited_rtentry() (per the proposed patch in [1], we call it to be able to add partitions to the query tree in the bound order). Actually, since PartitionedChildRelInfo now contains more information about the partition tree than it used to before, I thought the struct's name is no longer relevant, so renamed it to PartitionRootInfo and renamed root->pcinfo_list accordingly to prinfo_list. That seems okay because we only use that node internally. Then during the add_base_rels_to_query() step, when build_simple_rel() builds a RelOptInfo for the root partitioned table, it also initializes some newly introduced fields in RelOptInfo from the information contained in PartitionRootInfo of the table. The aforementioned fields are only initialized in RelOptInfos of root partitioned tables. Note that the add_base_rels_to_query() step won't add the partition "otherrel" RelOptInfos yet (unlike the regular inheritance case, where they are, after looking them up in root->append_rel_list). When set_append_rel_size() is called on the root partitioned table, it will call a find_partitions_for_query(), which using the partition tree information, determines the partitions that will need to be scanned for the query. This processing happens recursively, that is, we first determine the root-parent's partitions and then for each partition that's partitioned, we will determine its partitions and so on. As we determine partitions in this per-partitioned-table manner, we maintain a pair (parent_relid, list-of-partition-relids-to-scan) for each partitioned table and also a single list of all leaf partitions determined so far. Once all partitions have been determined, we turn to locking the leaf partitions. The locking happens in the order of OIDs as find_all_inheritors would have returned in expand_inherited_rtentry(); the list of OIDs in that original order is also stored in the table's PartitionRootInfo node. For each OID in that list, check if that OID is in the set of leaf partition OIDs that was just computed, and if so, lock it. For all chosen partitions that are partitioned tables (including the root), we create a PartitionAppendInfo node which stores the aforementioned pair (parent_relid, list-of-partitions-relids-to-scan), and append it to a list in the root table's RelOptInfo, with the root table's PartitionAppendInfo at the head of the list. Note that the list of partitions in this pair contains only the immediate partitions, so that the original parent-child relationship is reflected in the list of PartitionAppendInfos thus collected. The next patch that will implement actual partition-pruning will add some more code that will run under find_partitions_for_query(). set_append_rel_size() processing then continues for the root partitioned table. It is at this point that we will create the RelOptInfos and AppendRelInfos for partitions. First for those of the root partitioned table and then for those of each partitioned table when set_append_rel_size() will be recursively called for the latter. Note that this is still largely a WIP patch and the implementation details might change per both the feedback here and the discussion at [1]. Thanks, Amit [1] https://www.postgresql.org/message-id/befd7ec9-8f4c-6928-d330-ab05dbf860bf%40lab.ntt.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
pgsql-hackers by date: