Re: [HACKERS] [PROPOSAL] Temporal query processing with range types - Mailing list pgsql-hackers
From | Peter Moser |
---|---|
Subject | Re: [HACKERS] [PROPOSAL] Temporal query processing with range types |
Date | |
Msg-id | 1511256990.15349.3.camel@gmail.com Whole thread Raw |
In response to | Re: [HACKERS] [PROPOSAL] Temporal query processing with range types (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: [HACKERS] [PROPOSAL] Temporal query processing with range types
Re: [HACKERS] [PROPOSAL] Temporal query processing with range types |
List | pgsql-hackers |
2017-11-14 18:42 GMT+01:00 Tom Lane <tgl@sss.pgh.pa.us>: > You might consider putting the rewriting into, um, the rewriter. > It could be a separate pass after view expansion, if direct integration > with the existing behavior seems unduly spaghetti-ish. Or do it in > an early phase of planning as he suggested. There's not really that > much difference between the rewriter and the planner for this purpose. > Although one way to draw the distinction is that the output of the > rewriter is (currently) still fully expressible as plain SQL, whereas > once the planner goes into action the intermediate states of the tree > might not really be SQL anymore (eg, it might contain join types that > don't correspond to any SQL syntax). So depending on what your rewrite > emits, there would be a weak preference for calling it part of the > rewriter or planner respectively. 2017-11-16 16:42 GMT+01:00 Robert Haas <robertmhaas@gmail.com>: > Another thing to think about is that even though the CURRENT > implementation just rewrites the relevant constructs as SQL, in the > future somebody might want to do something else. I feel like it's not > hard to imagine a purpose-build ALIGN or NORMALIZE join type being a > lot faster than the version that's just done by rewriting the SQL. > That would be more work, potentially, but it would be nice if the > initial implementation leant itself to be extended that way in the > future, which an all-rewriter implementation would not. On the other > hand, maybe an early-in-the-optimizer implementation wouldn't either, > and maybe it's not worth worrying about it anyway. But it would be > cool if this worked out in a way that meant it could be further > improved without having to change it completely. Hi hackers, we like to rethink our approach... For simplicity I'll drop ALIGN for the moment and focus solely on NORMALIZE: SELECT * FROM (R NORMALIZE S ON R.x = S.y WITH (R.time, S.time)) c; Our normalization executor node needs the following input (for now expressed in plain SQL): SELECT R.*, p1 FROM (SELECT *, row_id() OVER () rn FROM R) R LEFT OUTER JOIN ( SELECT y, LOWER(time) p1 FROM S UNION SELECTy, UPPER(time) p1 FROM S ) S ON R.x = S.y AND p1 <@ R.time ORDER BY rn, p1; In other words: 1) The left subquery adds an unique ID to each tuple (i.e., rn). 2) The right subquery creates two results for each input tuple: one for the upper and one for the lower bound of each input tuple's valid time column. The boundaries get put into a single (scalar) column, namely p1. 3) We join both subqueries if the normalization predicates hold (R.x = S.y) and p1 is inside the time of the current outer tuple. 4) Finally, we sort the result by the unique ID (rn) and p1, and give all columns of the outer relation, rn and p1 back. Our first attempt to understand the new approach would be as follows: The left base rel of the inner left-outer-join can be expressed as a WindowAgg node. However, the right query of the join is much more difficult to build (maybe through hash aggregates). Both queries could be put together with a MergeJoin for instance. However, if we create the plan tree by hand and choose algorithms for it manually, how is it possible to have it optimized later? Or, if that is not possible, how do we choose the best algorithms for it? Best regards, Anton, Johann, Michael, Peter
pgsql-hackers by date: