Re: Implementation of GROUPING SETS (T431: Extended grouping capabilities) - Mailing list pgsql-hackers
From | Pavel Stehule |
---|---|
Subject | Re: Implementation of GROUPING SETS (T431: Extended grouping capabilities) |
Date | |
Msg-id | 162867790908130959n9e4da70ubb47b7003f936926@mail.gmail.com Whole thread Raw |
In response to | Re: Implementation of GROUPING SETS (T431: Extended grouping capabilities) (Hitoshi Harada <umi.tanuki@gmail.com>) |
Responses |
Re: Implementation of GROUPING SETS (T431: Extended
grouping capabilities)
|
List | pgsql-hackers |
2009/8/13 Hitoshi Harada <umi.tanuki@gmail.com>: > 2009/8/8 Alvaro Herrera <alvherre@commandprompt.com>: >> Олег Царев escribió: >>> Hello all! >>> If no one objecte (all agree, in other say) i continue work on patch - >>> particulary, i want support second strategy (tuple store instead of >>> hash-table) for save order of source (more cheap solution in case with >>> grouping sets + order by), investigate and brainstorm another >>> optimisation, writing regression tests and technical documentation. >>> But I need some time for complete my investigation internals of >>> PostgreSQL, particulary CTE. >> >> Where are we on this patch? Is it moving forward? >> > > It seems to me that the patch goes backward. little bit. > > I looked trough the gsets-0.6.diff for about an hour, and found it is > now only a syntax sugar that builds multiple GROUP BY queries based on > CTE functionality. There's no executor modification. > I wrote older version in time when CTE wasn't implemented. The old patch had own executor node, and was based on creating hash table per group. This patch was maybe little bit faster than 0.6, but had lot of bugs. Who knows planner code for grouping, then have to agree with me. This code isn't readable and I wouldn't to it more complicated and less readable. So I had idea, to join grouping sets with CTE. Grouping sets is subset of CTE, so it is possible. Grouping sets is non recursive CTE generally, and (I believe) this should be optimized together. I prefered using CTE, because this way was the most short to small bugs less prototype - with full functionality. > If I remember correctly, the original patch touched executor parts. > I'd buy if the GROUPING SETS touches executor but I don't if this is > only syntax sugar, because you can write it as the same by yourself > without GROUPING SETS syntax. The motivation we push this forward is > performance that cannot be made by rewriting query, I guess. > I don't thing, so you can do simply transformation from grouping sets syntax to CTE. And what's more. Why you have optimized grouping sets and not optimized non recursive CTE? > Because GROUP BY we have today is a subset of GROUPING SETS by > definition, I suppose we'll refactor nodeAgg.c so that it is allowed > to take multiple group definitions. And we must support both of > HashAgg and GroupAgg. For HashAgg, it is easier in any case as the > earlier patch does. For GroupAgg, it is a bit complicated since we > sort by different key sets. > This way is possible too. But needs absolutely grouping planner and executor reworking. Maybe is time do it. It is work for somebody other to me. My place is stored procedures. > When we want GROUPING SET(a, b), at first we sort by a and aggregate > then sort by b and aggregate. This is the same as: > > select a, null, count(*) from x group by a > union all > select null, b, count(*) from x group by b > > so nothing better than query rewriting unless we invent something new. > the problem is when x is subquery. Then is better using CTE, because we don't need repeat x evaluation twice. The most typical use case is, so x isn't table. > But in case of sub total and grand total like ROLLUP query, GroupAgg > can do it by one-time scan by having multiple life cycle PerGroup > state. > > Anyway, before going ahead we need to find rough sketch of how to > implement this feature. Only syntax sugar is acceptable? Or internal > executor support is necessary? I thing, so both ways are possible. Probably the most clean way is total refactoring of grouping executor and grouping planner. I am not sure if we need new nodes. There are all. But these nodes cannot work paralel now. > > > Regards, > > > -- > Hitoshi Harada >
pgsql-hackers by date: