Re: BUG #15324: Non-deterministic behaviour from parallelised sub-query - Mailing list pgsql-bugs
From | Amit Kapila |
---|---|
Subject | Re: BUG #15324: Non-deterministic behaviour from parallelised sub-query |
Date | |
Msg-id | CAA4eK1JqimRvJt5=nuukG+hXGA0P2tD=D0ewYrO4u0ig_TTacg@mail.gmail.com |
In response to | Re: BUG #15324: Non-deterministic behaviour from parallelised sub-query (Stephen Frost <sfrost@snowman.net>) |
Responses |
Re: BUG #15324: Non-deterministic behaviour from parallelised sub-query
Re: BUG #15324: Non-deterministic behaviour from parallelised sub-query |
List | pgsql-bugs |
On Wed, Aug 15, 2018 at 4:40 PM, Stephen Frost <sfrost@snowman.net> wrote:
> Greetings,
>
> * Amit Kapila (amit.kapila16@gmail.com) wrote:
>> On Tue, Aug 14, 2018 at 9:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> > Marko Tiikkaja <marko@joh.to> writes:
>> >> Marking the function parallel safe doesn't seem wrong to me. The
>> >> non-parallel-safe part is that the input gets fed to it in different order
>> >> in different workers. And I don't really think that to be the function's
>> >> fault.
>> >
>> > So that basically opens the question of whether *any* window function
>> > calculation can safely be pushed down to parallel workers.
>>
>> I think we can consider it as a parallel-restricted operation. For
>> the purpose of testing, I have marked row_number as
>> parallel-restricted in pg_proc and I get the below plan:
>>
>> postgres=# Explain select count(*) from qwr where (a, b) in (select a,
>> row_number() over() from qwr);
>>                                    QUERY PLAN
>> --------------------------------------------------------------------------------------------------------
>>  Aggregate  (cost=46522.12..46522.13 rows=1 width=8)
>>    ->  Hash Semi Join  (cost=24352.08..46362.12 rows=64001 width=0)
>>          Hash Cond: ((qwr.a = qwr_1.a) AND (qwr.b = (row_number() OVER (?))))
>>          ->  Gather  (cost=0.00..18926.01 rows=128002 width=8)
>>                Workers Planned: 2
>>                ->  Parallel Seq Scan on qwr  (cost=0.00..18926.01 rows=64001 width=8)
>>          ->  Hash  (cost=21806.06..21806.06 rows=128002 width=12)
>>                ->  WindowAgg  (cost=0.00..20526.04 rows=128002 width=12)
>>                      ->  Gather  (cost=0.00..18926.01 rows=128002 width=4)
>>                            Workers Planned: 2
>>                            ->  Parallel Seq Scan on qwr qwr_1  (cost=0.00..18926.01 rows=64001 width=4)
>> (11 rows)
>>
>> This seems okay, though the results of the above parallel-execution
>> are not same as serial-execution. I think the reason for it is that
>> we don't get rows in predictable order from workers.
>
> You wouldn't get them in a predictable order even without
> parallelization due to the lack of an ordering, so this hardly seems
> like an issue.
>

Right.

>> > Somewhat like the LIMIT/OFFSET case, it seems to me that we could only
>> > expect to do this safely if the row ordering induced by the WINDOW clause
>> > can be proven to be fully deterministic. The planner has no such smarts
>> > at the moment AFAIR. In principle you could do it if there were
>> > partitioning/ordering by a primary key, but I'm not excited about the
>> > prospects of that being true often enough in practice to justify making
>> > the check.
>>
>> Yeah, I am also not sure if it is worth adding the additional checks.
>> So, for now, we can treat any window function calculation as
>> parallel-restricted and if later anybody has a reason strong enough to
>> relax the restriction for some particular case, we will consider it.
>
> Seems likely that we'll want this at some point, but certainly seems
> like new work and not a small bit of it.
>

Yeah, let me summarize the problems which require patches:
(a) Consider the presence of a LIMIT/OFFSET in a sub-select as making
it parallel-unsafe.
(b) Consider the presence of any window function calculation as
parallel-restricted operation.

Initially, I will prepare two separate patches for them and then we
will see if we want to combine them into one before committing.

It might take me few days to come up with patches, so if anyone else
wants to take a lead, feel free to do so.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
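
The ordering dependence discussed in this message can be illustrated outside PostgreSQL. What follows is a minimal Python sketch, not PostgreSQL code and not part of any patch: it imitates the query `select count(*) from qwr where (a, b) in (select a, row_number() over() from qwr)` on a tiny in-memory table. The four sample rows, the two-way split into `worker_1` and `worker_2`, and the helper name `semi_join_count` are all assumptions made for illustration; a real Parallel Seq Scan hands out blocks dynamically, and even a serial scan guarantees no particular order.

```python
def semi_join_count(table, arrival_order):
    """Imitate: count(*) from qwr where (a, b) in
    (select a, row_number() over () from qwr)."""
    # Subquery: row_number() over () numbers rows in whatever order they
    # arrive, because the window has no ORDER BY.
    numbered = {(a, n) for n, (a, _b) in enumerate(arrival_order, start=1)}
    # Outer query: keep rows whose (a, b) pair appears in the subquery output.
    return sum(1 for row in table if row in numbered)


# Hypothetical contents of qwr as (a, b) pairs.
qwr = [(1, 1), (2, 2), (3, 3), (4, 4)]

# Rows arrive in table order, as a serial scan might happen to return them.
serial_count = semi_join_count(qwr, qwr)

# Two hypothetical workers each scan half the table; here the second
# worker's rows happen to reach the Gather node first.
worker_1, worker_2 = qwr[:2], qwr[2:]
gathered = worker_2 + worker_1
parallel_count = semi_join_count(qwr, gathered)

print(serial_count, parallel_count)  # prints "4 0"
```

The same table yields a different count purely because the row numbers were assigned in a different arrival order. Keeping the WindowAgg above the Gather, as in the plan shown in this message, does not make the numbering deterministic; it only ensures that a single process assigns it.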