Re: Row pattern recognition - Mailing list pgsql-hackers
From | Tatsuo Ishii |
---|---|
Subject | Re: Row pattern recognition |
Date | |
Msg-id | 20230728.160953.1112305070052061571.t-ishii@sranhm.sra.co.jp Whole thread Raw |
In response to | Re: Row pattern recognition (Vik Fearing <vik@postgresfriends.org>) |
Responses |
Re: Row pattern recognition
|
List | pgsql-hackers |
>> We already recalculate a frame each time a row is processed even >> without RPR. See ExecWindowAgg. > > Yes, after each row. Not for each function. Ok, I understand now. Closer look at the code, I realized that each window function calls update_frameheadpos, which computes the frame head position. But actually it checks winstate->framehead_valid and if it's already true (probably by other window function), then it does nothing. >> Also RPR always requires a frame option ROWS BETWEEN CURRENT ROW, >> which means the frame head is changed each time current row position >> changes. > > Off topic for now: I wonder why this restriction is in place and > whether we should respect or ignore it. That is a discussion for > another time, though. My guess is, it is because other than ROWS BETWEEN CURRENT ROW has little or no meaning. Consider following example: SELECT i, first_value(i) OVER w FROM (VALUES (1), (2), (3), (4)) AS v (i) WINDOW w AS ( ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING AFTER MATCH SKIP PAST LAST ROW PATTERN (A) DEFINE A AS i = 1 OR i = 3 ); In this example ROWS BETWEEN CURRENT ROW gives frames with i = 1 and i = 3. i | first_value ---+------------- 1 | 1 2 | 3 | 3 4 | (4 rows) But what would happen with ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING? Probably the frame i = 3 will be missed as at i = 2, PATTERN is not satisfied and compution of the reduced frame stops. i | first_value ---+------------- 1 | 1 2 | 3 | 4 | (4 rows) This is not very useful for users. >>> I strongly disagree with this. Window function do not need to know >>> how the frame is defined, and indeed they should not. >> We already break the rule by defining *support functions. See >> windowfuncs.c. > The support functions don't know anything about the frame, they just > know when a window function is monotonically increasing and execution > can either stop or be "passed through". I see following code in window_row_number_support: /* * The frame options can always become "ROWS BETWEEN UNBOUNDED * PRECEDING AND CURRENT ROW". row_number() always just increments by * 1 with each row in the partition. Using ROWS instead of RANGE * saves effort checking peer rows during execution. */ req->frameOptions = (FRAMEOPTION_NONDEFAULT | FRAMEOPTION_ROWS | FRAMEOPTION_START_UNBOUNDED_PRECEDING | FRAMEOPTION_END_CURRENT_ROW); I think it not only knows about frame but it even changes the frame options. This seems far from "don't know anything about the frame", no? > I have two comments about this: > > It isn't just for convenience, it is for correctness. The window > functions do not need to know which rows they are *not* operating on. > > There is no such thing as a "full" or "reduced" frame. The standard > uses those terms to explain the difference between before and after > RPR is applied, but window functions do not get to choose which frame > they apply over. They only ever apply over the reduced window frame. I agree that "full window frame" and "reduced window frame" do not exist at the same time, and in the end (after computation of reduced frame), only "reduced" frame is visible to window functions/aggregates. But I still do think that "full window frame" and "reduced window frame" are important concept to explain/understand how PRP works. Best reagards, -- Tatsuo Ishii SRA OSS LLC English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp
pgsql-hackers by date: