Re: Ordered Append Node - Mailing list pgsql-hackers
From | Markus Schiltknecht |
---|---|
Subject | Re: Ordered Append Node |
Date | |
Msg-id | 4745DA70.9060209@bluegap.ch Whole thread Raw |
In response to | Ordered Append Node (Gregory Stark <stark@enterprisedb.com>) |
Responses |
Re: Ordered Append Node
|
List | pgsql-hackers |
Hello Gregory, Gregory Stark wrote: > I've been hacking on the idea of an Append node which maintains the ordering > of its subtables merging their records in order. This is important for > partitioned tables since otherwise a lot of plans are not available such as > merge joins. Cool! Some time ago, I've been trying something very similar, but didn't get very far. I'd like to share my thoughts anyway. First of, I envisioned that append node to be applicable also to unordered reading from multiple partitions, simply interleaving between the partitions. > The logic I've followed is to do as follows: > > 1) Go through all subrels asking for any interesting pathkey lists. Gather up > the union of all of these. I also tried to modify the Append node first, then figured that it might be better to base on the merge join node instead. While this seems farther away, I had the hope that a binary tree of such 'plain merge' nodes would require less comparisons in total. Plus, I found it a lot simpler to cope with exactly two input relations instead of n, as with the append node. :-) > 2) Go back through all the subrels and for each accumulated pathkey list ask > for the best path for that subrel for that pathkey list. > > 3) Generate an two append paths for each of these pathkey lists, one using the > best startup_cost subpath and one with the best total_cost subpath > available for the pathkey list for each child rel. If there is none > available take the best unordered path and put a sort node above it. > > 4) Additionally advertise the traditional unordered append node which our > parent could choose to put a sort node above same as ever. > > 5) Append plan nodes look a lot like Sort plan nodes glued onto an Append plan > node, with sort function oids and so on. > > 6) Append exec state nodes look a lot like a (a few fields from) > tuplesortstate glued onto an Append node. They have the ScanKey array and > the fmgr sort functions. They also have an array of TupleTableSlot and a > heap of indexes into that array. > > 8) ExecAppend does something similar to tuplesort's bounded sort (I fear I'm > going to get typecasted) or more to the point, similar to the final merge > of a merge sort. It directly returns the slot the child rel last returned > to it. Uh.. well, yeah. I guess you have a better understanding of the planner and executor that I do. > Open questions: > > 1) I still haven't completely figured out what to do with equivalence classes. > My hack of just stuffing all the append subrel vars into there seems to > work fine. I need to understand what's going on to see if there's really a > problem with it or not. > > 2) I'm not sure this code will work when the append rel is a target (ie UPDATE > and DELETE stmts). > > 3) It does seem to work when the columns in the subrels don't line up but I > didn't do anything special to handle this case. Uh.. is there any use case for that? WRT partitioning certainly not, is there? I'll have a look at your patch. Regards Markus
pgsql-hackers by date: