Re: [PATCH] Negative Transition Aggregate Functions (WIP) - Mailing list pgsql-hackers
From | Dean Rasheed |
---|---|
Subject | Re: [PATCH] Negative Transition Aggregate Functions (WIP) |
Date | |
Msg-id | CAEZATCWhT9iUwGP-wdp5X6u7HV0HUafHfL68Dg=_9KAQTaOYEw@mail.gmail.com Whole thread Raw |
In response to | Re: [PATCH] Negative Transition Aggregate Functions (WIP) (Florian Pflug <fgp@phlo.org>) |
Responses |
Re: [PATCH] Negative Transition Aggregate Functions (WIP)
|
List | pgsql-hackers |
On 25 February 2014 12:33, Florian Pflug <fgp@phlo.org> wrote: > On Feb24, 2014, at 17:50 , Dean Rasheed <dean.a.rasheed@gmail.com> wrote: >> On 20 February 2014 01:48, Florian Pflug <fgp@phlo.org> wrote: >>> On Jan29, 2014, at 13:45 , Florian Pflug <fgp@phlo.org> wrote: >>>> In fact, I'm >>>> currently leaning towards just forbidding non-strict forward transition >>>> function with strict inverses, and adding non-NULL counters to the >>>> aggregates that then require them. It's really only the SUM() aggregates >>>> that are affected by this, I think. >>> >>> I finally got around to doing that, and the results aren't too bad. The >>> attached patches required that the strictness settings of the forward and >>> reverse transition functions agree, and employ exactly the same NULL-skipping >>> logic we always had. >>> >>> The only aggregates seriously affected by that change were SUM(int2) and >>> SUM(int4). >> >> I haven't looked at this in any detail yet, but that seems much neater >> to me. It seems perfectly sensible that the forward and inverse >> transition functions should have the same strictness settings, and >> enforcing that keeps the logic simple, as well as hopefully making it >> easier to document. > > Good to hear that you agree! I'll try to find some time to update the docs. > I finally got round to looking at this in more detail. Sorry for the delay. Here is my more detailed review of the base patch. Overall, I think that it is in reasonable shape, and as I said I think the approach of enforcing matching strictness settings on the forward and inverse transition functions is much simpler and neater. I have a few comments, some cosmetic, and a couple more substantive: * In a couple of places: errmsg("stricness of forward and reverse transition functions must match") - misspelling: "stricness". - "reverse" should be "inverse" to match the terminology used elsewhere. * Grammatical error in the comment for lookup_agg_function() - you should drop the word "both". * In show_windowagg_info(), this calculation looks suspicious to me: double tperrow = winaggstate->aggfwdtrans / (inst->nloops * inst->ntuples); If the node is executed multiple times, aggfwdtrans will be reset in each loop, so the transitions per row figure will be under-estimated. ISTM that if you want to report on this, you'd need aggfwdtrans to be reset once per query, but I'm not sure exactly how to do that. Here's a test case: explain (verbose, analyse) select sum(i) over (rows between 4 preceding and current row) from generate_series(1, 10) i; which outputs 10 rows with an average of 1 transition per row, but doing the same window aggregate twice in a nested loop: explain (verbose, analyse) select * from (values (10), (10)) v(x), lateral (select sum(i) over (rows between 4 precedingand current row) from generate_series(1, x) i) t; outputs 20 rows, but only reports 0.5 transitions per row. Actually, I think it's misleading to only count forward transition function calls, because a call to the inverse transition function still represents a state transition, and is likely to be around the same cost. For a window of size 2, there would not be much advantage to using inverse transition functions, because it would be around 2 transitions per row either way. * The function comment for build_aggregate_fnexprs() needs to be updated to reference the inverse transition function. I'd also be tempted to have it allow invtransfnexpr be a NULL pointer, if the inverse transition function expression tree is not required. Then ExecInitAgg() could simply pass NULL, instead of having the local variable invtransfnexpr with the slightly cryptic comment "needed but never used". * In struct WindowStatePerAggData, I think you should change the field order to transfn_oid, invtransfn_oid and then finalfn_oid. It's only a small thing, but that's the order those 3 functions are referred to everywhere else. * In struct WindowStatePerAggData, the comment for transValueCount should read "number of aggregated values". * If AggCheckCallContext() is called from a window function, and it asks for an aggcontext, it will fail because calledaggno will be -1. That can't currently happen for any of our built-in window functions, and I'm not sure if it's likely to happen in the future, but I think it would be better to defend against that possibility just in case. So I think it ought to return the shared context in that case, as the original code would have done. * In advance_windowaggregate(), this code if (peraggstate->transfn.fn_strict) { is against the project style, which is to have curly braces on new lines. But also, that test condition is the same as the preceding block, so the 2 blocks could just be merged. * I was wondering about the case of a forward transition function returning NULL in the presence of an inverse transition function. In this patch there are 3 pieces of code that test for that: 1). advance_windowaggregate() errors out if the forward transition function returns NULL and there is an inverse transition function. 2). an Assert in advance_windowaggregate() fires if a prior call made the state NULL and there is an inverse transition function (should be impossible due to the above error). 3). retreat_windowaggregate() errors out if it sees a NULL state (which ought to be impossible due to both of the above). I find the resulting error "transition function with an inverse returned NULL" surprising. Why shouldn't a transition function return NULL if it wants to? It can if it's used as a regular aggregate, so it seems somewhat odd that it can't if it's used in a window context, and it has an inverse. Would it not be simpler and more flexible to just allow the forward transition function to return NULL, and treat a NULL state as non-invertible, requiring a restart. So delete check (1) and just allow the forward transition function to return NULL, delete Assert (2) so that it propagates a NULL state to the end as it would do in the absence of an inverse transition function, and modify check (3) to return false forcing a restart if the state is NULL. So then if the forward transition function did return NULL, the inverse transition function would not actually be called, and it would compute the answer the hard way, rather than erroring out. * In retreat_windowaggregate(), the comment before the check on transValueCount is missing a "be". * I think the function comment for eval_windowaggregates() should be updated to mention that it may also use an inverse transition function to remove aggregated data from a transition value. * I found the guts of eval_windowaggregates() a little hard to follow, although I think the code is correct. It could perhaps use a little tidying up. Here are a few ideas: - Maybe numaggs_restart would be better called numtorestart. - is_first isn't adding much, and it would probably be easier to read if it were eliminated by inlining it in the one place it is used. - Similarly I would move the variable "ok" to the block that uses it. - The variable pos could be eliminated by having the retreat loop increment winstate->aggregatedbase in each pass, which would better match its comment which says it updates aggregatedbase, which it currently doesn't do. For consistency, perhaps this loop should be written more in the style of the advance loop. The loop should ensure that, on exit, aggregatedbase is equal to frameheadpos. - aggregatedupto_nonrestarted is a bit of a mouthful, but I can't immediately think of a better name. However, there's one comment that refers to it as aggregatedupto_previous. - The comment starting "After this, non-restarted aggregated..." is a bit confusing. What the code following it really seems to be doing is more along the lines of "If there are aggregates to restart, rewind aggregatedupto back to frameheadpos so that we can re-aggregate those values in the aggregates to be restarted...". * The pg_dump modifications don't look correct to me. I didn't test it, but I think you need to modify the pre-9.4 queries to return "-" for agginvtransfn otherwise I think it will be NULL and will seg-fault if you dump a pre-9.4 database containing custom aggregates. * The last regression test in aggregates.sql should test for unequal strictness values, rather than just strict forward transition functions with non-strict inverses. I think that's everything for the base patch. Regards, Dean
pgsql-hackers by date: