Re: Row pattern recognition - Mailing list pgsql-hackers

From Henson Choi
Subject Re: Row pattern recognition
Date
Msg-id CAAAe_zAUSSiEgNSiwQSRd2FT094rC3BoCJawqERDOv-fZ97rKw@mail.gmail.com
Whole thread Raw
In response to Re: Row pattern recognition  (Tatsuo Ishii <ishii@postgresql.org>)
Responses Re: Row pattern recognition
List pgsql-hackers
Hi Ishii-san

We need to check the *frame" end, not the partition end.

I think your patch relies on !window_gettupleslot() to check whether
the row exists.

        if (!window_gettupleslot(winobj, pos, slot))
                return false;   /* No row exists */

But the function only checks the row existence in the current partition:

 *      Fetch the pos'th tuple of the current partition into the slot,

Thus it is possible that window_gettupleslot() returns true but the
row is not in the current frame in case that the partition is divided
into some frames. You need to check the row existence in a frame. For
this purpose you can use row_is_in_frame().

I ran following query and got the result with v38 (which includes your
NFA patches).

WITH data AS (
 SELECT * FROM (VALUES
  ('A', 1), ('A', 2),
  ('B', 3), ('B', 4)
  ) AS t(gid, id))
  SELECT gid, id, array_agg(id) OVER w
  FROM data
  WINDOW w AS (
   PARTITION BY gid
   ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING
   AFTER MATCH SKIP TO NEXT ROW
   PATTERN (A+)
   DEFINE A AS id < 10
  );
 gid | id | array_agg
-----+----+-----------
 A   |  1 | {1,2}
 A   |  2 |
 B   |  3 | {3,4}
 B   |  4 |
(4 rows)

I think the second and 4th rows are expected to return some data in
array_agg colum. In fact v37 patch returns following results for the
same query:

 gid | id | array_agg
-----+----+-----------
 A   |  1 | {1,2}
 A   |  2 | {2}
 B   |  3 | {3,4}
 B   |  4 | {4}
(4 rows)

While investigating the issue you mentioned, I found that the problem
is caused by the context absorption logic, not the frame boundary check.

The absorption logic removes a newer context when
older->matchEndRow >= newer->matchEndRow:

Example with pattern A+:
- Context at row 1: matchEndRow = 2 (match {1,2})
- Context at row 2: matchEndRow = 2 (match {2})
→ Newer context absorbed, row 2's result lost

With SKIP TO NEXT ROW, each row should produce its own independent match.
The absorption was incorrectly removing these valid overlapping matches.

I modified the absorption logic to only apply to SKIP PAST LAST ROW mode,
where it provides performance benefits without affecting correctness.

For SKIP TO NEXT ROW, absorption is now disabled, allowing each row
to produce independent overlapping matches as expected:

 gid | id | array_agg
-----+----+-----------
 A   |  1 | {1,2}
 A   |  2 | {2}
 B   |  3 | {3,4}
 B   |  4 | {4}
(4 rows)

I'm attaching 3 patches:

1. Test cases from Jacob Champion's branch
   - Added 355 lines of test cases and 51 lines of executor changes

2. Refactored update_reduced_frame for readability and performance
   - Extracted large code blocks into separate functions
   - Early return for already-processed positions
   - Integrated nfa_unlink_context into nfa_context_free
   - Used oldest-first context ordering for early termination

3. Enable context absorption only for SKIP PAST LAST ROW
   - Fixes overlapping match issue for SKIP TO NEXT ROW
   - In my 5000-row test, absorption showed significant performance gain for SKIP PAST LAST ROW
   - Added your test case to regression tests

Since I implemented it in the order of parser/planner/nfa/executor,
the executor flow became somewhat confusing.
I plan to continue improving it by reviewing and refactoring
from the top-level functions downward.

During refactoring, I plan to simplify pattern optimization and context
absorption logic, keeping only the parts that are clearly correct, even
if this means some performance loss in the initial version. We can add
optimizations back later once the core logic is stable and well-tested.

Best regards,
Henson
 
Attachment

pgsql-hackers by date:

Previous
From: Christoph Berg
Date:
Subject: Re: Proposal: SELECT * EXCLUDE (...) command
Next
From: "David G. Johnston"
Date:
Subject: Re: how to gate experimental features (SQL/PGQ)