Re: Row pattern recognition - Mailing list pgsql-hackers

From Tatsuo Ishii
Subject Re: Row pattern recognition
Date
Msg-id 20260115.210457.19846684278230391.ishii@postgresql.org
Whole thread Raw
In response to Re: Row pattern recognition  (Henson Choi <assam258@gmail.com>)
Responses Re: Row pattern recognition
List pgsql-hackers
Hi Henson,

>> So my question is, how do you ensure that ctxFrameEnd does not go
>> beyond the full window frame end?
>>
> 
> In update_reduced_frame():
> 
>     frameOffset = endOffsetValue;     // e.g., 2 from "2 FOLLOWING"
> 
>     for each row (currentPos):
> 
>         if (!rowExists)               // partition end reached
>             finalize all contexts;
>             break;
> 
>         for each context:
>             ctxFrameEnd = matchStartRow + frameOffset + 1;
>             if (currentPos >= ctxFrameEnd)
>                 finalize this context;
>                 continue;
> 
> Even if ctxFrameEnd exceeds partition end, the "if (!rowExists)" check
> fires first and finalizes all contexts at the actual partition boundary.

We need to check the *frame" end, not the partition end.

I think your patch relies on !window_gettupleslot() to check whether
the row exists.

    if (!window_gettupleslot(winobj, pos, slot))
        return false;    /* No row exists */

But the function only checks the row existence in the current partition:

 *    Fetch the pos'th tuple of the current partition into the slot,

Thus it is possible that window_gettupleslot() returns true but the
row is not in the current frame in case that the partition is divided
into some frames. You need to check the row existence in a frame. For
this purpose you can use row_is_in_frame().

I ran following query and got the result with v38 (which includes your
NFA patches).

WITH data AS (
 SELECT * FROM (VALUES
  ('A', 1), ('A', 2),
  ('B', 3), ('B', 4)
  ) AS t(gid, id))
  SELECT gid, id, array_agg(id) OVER w
  FROM data
  WINDOW w AS (
   PARTITION BY gid
   ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING
   AFTER MATCH SKIP TO NEXT ROW
   PATTERN (A+)
   DEFINE A AS id < 10
  );
 gid | id | array_agg 
-----+----+-----------
 A   |  1 | {1,2}
 A   |  2 | 
 B   |  3 | {3,4}
 B   |  4 | 
(4 rows)

I think the second and 4th rows are expected to return some data in
array_agg colum. In fact v37 patch returns following results for the
same query:

 gid | id | array_agg 
-----+----+-----------
 A   |  1 | {1,2}
 A   |  2 | {2}
 B   |  3 | {3,4}
 B   |  4 | {4}
(4 rows)

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp



pgsql-hackers by date:

Previous
From: Shinya Kato
Date:
Subject: Re: file_fdw: Support multi-line HEADER option.
Next
From: Bertrand Drouvot
Date:
Subject: Re: Flush some statistics within running transactions