Re: On columnar storage - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: On columnar storage |
Date | |
Msg-id | 11572.1434302814@sss.pgh.pa.us |
In response to | Re: On columnar storage (Andres Freund <andres@anarazel.de>) |
Responses | Re: On columnar storage |
List | pgsql-hackers |
Andres Freund <andres@anarazel.de> writes:
> On 2015-06-11 20:03:16 -0300, Alvaro Herrera wrote:
>> Parsing occurs as currently. During query rewrite, specifically at the
>> bottom of the per-relation loop in fireRIRrules(), we will modify the
>> query tree: each relation RTE containing a colstore will be replaced
>> with a JoinExpr containing the relation as left child and the colstore
>> as right child (1). The colstore RTE will be of a new RTEKind. For
>> each such change, all Var nodes that point to attnums stored in the
>> colstore will be modified so that they reference the RTE of the colstore
>> instead (2).

> FWIW, I think this is a pretty bad place to tackle this. For one I think
> we shouldn't add more stuff using the rewriter unless it's clearly the
> best interface. For another, doing things in the rewriter will make
> optimizing things much harder - the planner will have to reconstruct
> knowledge which of the joins are column store joins and such.

As a comparison point, one of my Salesforce colleagues just put in
a somewhat similar though single-purpose thing, to expand what originally
is a simple table reference into a join (against a system catalog that's
nowhere mentioned in the original query). In our case, we wanted to force
a scan on a large table to have a constraint on the leading primary key
column; if the query has such a constraint already, then fine, else create
one by joining to a catalog that lists the allowed values of that column.

We started out by trying to do it in the rewriter, and that didn't work
well at all. We ended up actually doing it at createplan.c time, which
is conceptually ugly, but there was no good place to do it earlier without
duplicating a lot of indexqual analysis. But the thing that made that
painful was that the transformation was optional, and indeed might happen
or not happen for a given query depending on the selected plan shape.
AFAICT the transformation Alvaro is proposing is unconditional, so it
might be all right to do it in the rewriter. As you say, if the planner
needs to reconstruct what happened, that would be a strike against this
way, but it's not clear from here whether any additional info is needed
beyond the already-suggested extra RTEKind.

Another model that could be followed is expansion of inheritance-tree
references, which happens early in the planner. In that case the planner
does keep additional information about what it did (the appendrel data
structures), so that could be a good model if this code needs to do
likewise. The existing join-alias-var flattening logic in the planner
might be of interest as well for the variable-substitution business,
which I suspect is the main reason Alvaro is proposing doing it in the
rewriter.

			regards, tom lane
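
For readers who want to see the shape of the transformation being discussed, steps (1) and (2) of the quoted proposal can be sketched roughly as below. This is a toy Python model, not PostgreSQL code: the RTE, Var, Join, and Query classes and the expand_colstores() function are hypothetical stand-ins for the real parse-tree nodes, and it assumes (as a simplification) that a column keeps the same attnum in the colstore as in the parent relation. It only illustrates the unconditional case, where every relation with a colstore is expanded the same way regardless of plan shape.

```python
from dataclasses import dataclass, field

# Toy stand-ins for parse-tree nodes; every name here is hypothetical.

@dataclass
class RTE:
    kind: str                                  # "relation" or "colstore" (the proposed new RTEKind)
    name: str
    colstore_attnums: frozenset = frozenset()  # attnums kept in the colstore, if any

@dataclass
class Var:
    varno: int    # 1-based index into the range table
    attnum: int

@dataclass
class Join:
    left: int     # range-table index of the heap relation
    right: int    # range-table index of its colstore

@dataclass
class Query:
    rtable: list
    target_list: list
    jointree: list = field(default_factory=list)


def expand_colstores(query: Query) -> Query:
    """Apply steps (1) and (2) to every relation RTE that has a colstore."""
    # Iterate over a snapshot, since colstore RTEs are appended as we go.
    for idx, rte in enumerate(list(query.rtable), start=1):
        if rte.kind != "relation" or not rte.colstore_attnums:
            continue
        # (1) add a colstore RTE and join it to its parent relation
        query.rtable.append(RTE(kind="colstore", name=rte.name + "_cs"))
        cs_idx = len(query.rtable)
        query.jointree.append(Join(left=idx, right=cs_idx))
        # (2) re-point Vars whose column lives in the colstore
        for var in query.target_list:
            if var.varno == idx and var.attnum in rte.colstore_attnums:
                var.varno = cs_idx
    return query


# Table "t" keeps column 1 in the heap and columns 2 and 3 in a colstore.
q = Query(
    rtable=[RTE("relation", "t", frozenset({2, 3}))],
    target_list=[Var(1, 1), Var(1, 2), Var(1, 3)],
)
expand_colstores(q)
print([(v.varno, v.attnum) for v in q.target_list])  # prints [(1, 1), (2, 2), (2, 3)]
```

In this sketch the only record of what happened is the "colstore" kind on the new RTE plus the Join entry. Whether that is enough for the planner, or whether it would need extra bookkeeping along the lines of the appendrel structures, is the open question raised above.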