Thread: crashes due to setting max_parallel_workers=0
Hi, while working on a patch I ran into some crashes that seem to be caused by inconsistent handling of max_parallel_workers - queries still seem to be planned with parallel plans enabled, but then crash at execution. The attached script reproduces the issue on a simple query, causing crashes in GatherMerge, but I assume the issue is more general. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Sat, Mar 25, 2017 at 7:40 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > Hi, > > while working on a patch I ran into some crashes that seem to be caused by > inconsistent handling of max_parallel_workers - queries still seem to be > planned with parallel plans enabled, but then crash at execution. > > The attached script reproduces the issue on a simple query, causing crashed > in GatherMerge, but I assume the issue is more general. > I could see other parallel plans like Gather also picked at max_parallel_workers=0. I suspect multiple issues here and this needs some investigation. I have added this to open items list. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On 25 March 2017 at 13:10, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > while working on a patch I ran into some crashes that seem to be caused by > inconsistent handling of max_parallel_workers - queries still seem to be > planned with parallel plans enabled, but then crash at execution. > > The attached script reproduces the issue on a simple query, causing crashed > in GatherMerge, but I assume the issue is more general. I had a look at this and found a bunch of off by 1 errors in the code. I've attached a couple of patches, one is the minimal fix, and one more complete one. In the more complete one I ended up ditching the GatherMergeState->nreaders altogether. It was always the same as nworkers_launched anyway, so I saw no point in it. Here I've attempted to make the code a bit more understandable, to prevent further confusion about how many elements are in each array. Probably there's more that can be done here. I see GatherState has nreaders too, but I've not looked into the detail of if it suffers from the same weirdness of nreaders always matching nworkers_launched. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
Thanks Tomas for reporting the issue and thanks David for working
on this.
I can see the problem in GatherMerge, especially when nworkers_launched
is zero. I will look into this issue and will post a fix for the same.
Another point which I think we should fix: when someone sets
max_parallel_workers = 0, we should also set max_parallel_workers_per_gather
to zero. That way we can avoid generating a gather path with
max_parallel_workers = 0.
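For illustration (a rough sketch only - the table name "t" is a placeholder and
not taken from the attached reproduction script), the planner can still choose
a parallel plan even though no workers can ever be granted:

create table t as select g as a from generate_series(1, 1000000) g;
analyze t;
set parallel_tuple_cost = 0;
set parallel_setup_cost = 0;
set min_parallel_table_scan_size = 0;
set max_parallel_workers_per_gather = 2;
set max_parallel_workers = 0;
explain (costs off) select * from t order by a;
-- the plan can still contain a Gather Merge node, which then has to run
-- with zero workers launched at execution time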
Thanks,
On Sat, Mar 25, 2017 at 2:21 PM, David Rowley <david.rowley@2ndquadrant.com> wrote:
On 25 March 2017 at 13:10, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> while working on a patch I ran into some crashes that seem to be caused by
> inconsistent handling of max_parallel_workers - queries still seem to be
> planned with parallel plans enabled, but then crash at execution.
>
> The attached script reproduces the issue on a simple query, causing crashed
> in GatherMerge, but I assume the issue is more general.
I had a look at this and found a bunch of off by 1 errors in the code.
I've attached a couple of patches, one is the minimal fix, and one
more complete one.
In the more complete one I ended up ditching the
GatherMergeState->nreaders altogether. It was always the same as
nworkers_launched anyway, so I saw no point in it.
Here I've attempted to make the code a bit more understandable, to
prevent further confusion about how many elements are in each array.
Probably there's more that can be done here. I see GatherState has
nreaders too, but I've not looked into the detail of if it suffers
from the same weirdness of nreaders always matching nworkers_launched.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Rushabh Lathia
On 25 March 2017 at 23:09, Rushabh Lathia <rushabh.lathia@gmail.com> wrote: > Also another point which I think we should fix is, when someone set > max_parallel_workers = 0, we should also set the > max_parallel_workers_per_gather > to zero. So that way it we can avoid generating the gather path with > max_parallel_worker = 0. I see that it was actually quite useful that it works the way it does. If it had worked the same as max_parallel_workers_per_gather, then likely Tomas would never have found this bug. I wondered if there's anything we can do here to better test cases when no workers are available, to try to ensure the parallel nodes work correctly, but the more I think about it, the more I don't see anything wrong with just using SET max_parallel_workers = 0; My vote would be to leave the GUC behaviour as is and add some tests to select_parallel.sql which exploit setting max_parallel_workers to 0 for running some tests. If that's not going to fly, then unless we add something else to allow us to reliably not get any workers, we're close to closing the door on being able to write automatic tests to capture this sort of bug. ... thinks for a bit.... perhaps some magic value like -1 could be used for this... unsure of how that would be documented though... -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
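For example, a test along these lines could exercise the zero-worker code paths
(only a sketch, not taken from select_parallel.sql; tenk1 is assumed to be one
of the standard regression-test tables, and the cost settings are there just to
make a parallel plan likely):

set parallel_setup_cost = 0;
set parallel_tuple_cost = 0;
set min_parallel_table_scan_size = 0;
set max_parallel_workers_per_gather = 2;
set max_parallel_workers = 0;
explain (costs off) select count(*) from tenk1;  -- still expected to plan a Gather
select count(*) from tenk1;                      -- must return the correct result
                                                 -- with zero workers launched
reset max_parallel_workers;
reset max_parallel_workers_per_gather;
reset min_parallel_table_scan_size;
reset parallel_tuple_cost;
reset parallel_setup_cost;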
On Sat, Mar 25, 2017 at 6:31 PM, David Rowley <david.rowley@2ndquadrant.com> wrote: > On 25 March 2017 at 23:09, Rushabh Lathia <rushabh.lathia@gmail.com> wrote: >> Also another point which I think we should fix is, when someone set >> max_parallel_workers = 0, we should also set the >> max_parallel_workers_per_gather >> to zero. So that way it we can avoid generating the gather path with >> max_parallel_worker = 0. > > I see that it was actually quite useful that it works the way it does. > If it had worked the same as max_parallel_workers_per_gather, then > likely Tomas would never have found this bug. > > I wondered if there's anything we can do here to better test cases > when no workers are able to try to ensure the parallel nodes work > correctly, but the more I think about it, the more I don't see wrong > with just using SET max_parallel_workers = 0; > > My vote would be to leave the GUC behaviour as is and add some tests > to select_parallel.sql which exploit setting max_parallel_workers to 0 > for running some tests. > I think force_parallel_mode=regress should test the same thing. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On 26 March 2017 at 00:17, Amit Kapila <amit.kapila16@gmail.com> wrote: > On Sat, Mar 25, 2017 at 6:31 PM, David Rowley > <david.rowley@2ndquadrant.com> wrote: >> On 25 March 2017 at 23:09, Rushabh Lathia <rushabh.lathia@gmail.com> wrote: >>> Also another point which I think we should fix is, when someone set >>> max_parallel_workers = 0, we should also set the >>> max_parallel_workers_per_gather >>> to zero. So that way it we can avoid generating the gather path with >>> max_parallel_worker = 0. >> >> I see that it was actually quite useful that it works the way it does. >> If it had worked the same as max_parallel_workers_per_gather, then >> likely Tomas would never have found this bug. >> >> I wondered if there's anything we can do here to better test cases >> when no workers are able to try to ensure the parallel nodes work >> correctly, but the more I think about it, the more I don't see wrong >> with just using SET max_parallel_workers = 0; >> >> My vote would be to leave the GUC behaviour as is and add some tests >> to select_parallel.sql which exploit setting max_parallel_workers to 0 >> for running some tests. >> > > I think force_parallel_mode=regress should test the same thing. Not really. That flag's days are surely numbered. It creates a Gather node, the problem was with GatherMerge. -- David Rowley http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 3/25/17 09:01, David Rowley wrote: > On 25 March 2017 at 23:09, Rushabh Lathia <rushabh.lathia@gmail.com> wrote: >> Also another point which I think we should fix is, when someone set >> max_parallel_workers = 0, we should also set the >> max_parallel_workers_per_gather >> to zero. So that way it we can avoid generating the gather path with >> max_parallel_worker = 0. > I see that it was actually quite useful that it works the way it does. > If it had worked the same as max_parallel_workers_per_gather, then > likely Tomas would never have found this bug. Another problem is that the GUC system doesn't really support cases where the validity of one setting depends on the current value of another setting. So each individual setting needs to be robust against cases of related settings being nonsensical. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, Mar 25, 2017 at 7:01 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
On 3/25/17 09:01, David Rowley wrote:
> On 25 March 2017 at 23:09, Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
>> Also another point which I think we should fix is, when someone set
>> max_parallel_workers = 0, we should also set the
>> max_parallel_workers_per_gather
>> to zero. So that way it we can avoid generating the gather path with
>> max_parallel_worker = 0.
> I see that it was actually quite useful that it works the way it does.
> If it had worked the same as max_parallel_workers_per_gather, then
> likely Tomas would never have found this bug.
Another problem is that the GUC system doesn't really support cases
where the validity of one setting depends on the current value of
another setting. So each individual setting needs to be robust against
cases of related settings being nonsensical.
Okay.
About the original issue reported by Tomas, I did more debugging and
found that the problem was that gather_merge_clear_slots() was not returning
the cleared slot when nreaders is zero (i.e. nworkers_launched = 0).
Because of that, the scan continued even after all the tuples were exhausted,
and then ended up with a server crash in gather_merge_getnext(). In the patch
I also added an Assert in gather_merge_getnext() that the index
should be less than nreaders + 1 (the leader).
PFA simple patch to fix the problem.
Thanks,
Rushabh Lathia
Attachment
On 03/25/2017 05:18 PM, Rushabh Lathia wrote: > > > On Sat, Mar 25, 2017 at 7:01 PM, Peter Eisentraut > <peter.eisentraut@2ndquadrant.com > <mailto:peter.eisentraut@2ndquadrant.com>> wrote: > > On 3/25/17 09:01, David Rowley wrote: > > On 25 March 2017 at 23:09, Rushabh Lathia <rushabh.lathia@gmail.com <mailto:rushabh.lathia@gmail.com>> wrote: > >> Also another point which I think we should fix is, when someone set > >> max_parallel_workers = 0, we should also set the > >> max_parallel_workers_per_gather > >> to zero. So that way it we can avoid generating the gather path with > >> max_parallel_worker = 0. > > I see that it was actually quite useful that it works the way it does. > > If it had worked the same as max_parallel_workers_per_gather, then > > likely Tomas would never have found this bug. > > Another problem is that the GUC system doesn't really support cases > where the validity of one setting depends on the current value of > another setting. So each individual setting needs to be robust against > cases of related settings being nonsensical. > > > Okay. > > About the original issue reported by Tomas, I did more debugging and > found that - problem was gather_merge_clear_slots() was not returning > the clear slot when nreader is zero (means nworkers_launched = 0). > Due to the same scan was continue even all the tuple are exhausted, > and then end up with server crash at gather_merge_getnext(). In the patch > I also added the Assert into gather_merge_getnext(), about the index > should be less then the nreaders + 1 (leader). > > PFA simple patch to fix the problem. > I think there are two issues at play, here - the first one is that we still produce parallel plans even with max_parallel_workers=0, and the second one is the crash in GatherMerge when nworkers=0. Your patch fixes the latter (thanks for looking into it), which is obviously a good thing - getting 0 workers on a busy system is quite possible, because all the parallel workers can be already chewing on some other query. But it seems a bit futile to produce the parallel plan in the first place, because with max_parallel_workers=0 we can't possibly get any parallel workers ever. I wonder why compute_parallel_worker() only looks at max_parallel_workers_per_gather, i.e. why shouldn't it do: parallel_workers = Min(parallel_workers, max_parallel_workers); Perhaps this was discussed and is actually intentional, though. Regarding handling this at the GUC level - I agree with Peter that that's not a good idea. I suppose we could deal with checking the values in the GUC check/assign hooks, but what we don't have is a way to undo the changes in all the GUCs. That is, if I do SET max_parallel_workers = 0; SET max_parallel_workers = 16; I expect to end up with just max_parallel_workers GUC changed and nothing else. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
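To make the planned-versus-launched distinction concrete (illustrative only;
"t" is a placeholder table and the comments show approximate output), EXPLAIN
ANALYZE reports both counts for the Gather Merge node:

set max_parallel_workers = 0;
explain (analyze, costs off) select * from t order by a;
-- Gather Merge
--   Workers Planned: 2
--   Workers Launched: 0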
On 03/25/2017 02:01 PM, David Rowley wrote: > I wondered if there's anything we can do here to better test cases > when no workers are able to try to ensure the parallel nodes work > correctly, but the more I think about it, the more I don't see wrong > with just using SET max_parallel_workers = 0; > It's demonstrably a valid way to disable parallel queries (i.e. there's nothing wrong with it), because the docs say this: Setting this value to 0 disables parallel query execution. > > My vote would be to leave the GUC behaviour as is and add some tests > to select_parallel.sql which exploit setting max_parallel_workers to 0 > for running some tests. > > If that's not going to fly, then unless we add something else to allow > us to reliably not get any workers, then we're closing to close the > door on being able to write automatic tests to capture this sort of > bug. > > ... thinks for a bit.... > > perhaps some magic value like -1 could be used for this... unsure of > how that would be documented though... > I agree it'd be very useful to have a mode where we generate parallel plans but then prohibit starting any workers. That would detect this and similar issues, I think. I'm not sure we need to invent a new magic value, though. Can we simply look at force_parallel_mode, and if it's 'regress' then treat 0 differently? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 27 March 2017 at 10:23, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > I'm not sure we need to invent a new magic value, though. Can we simply look > at force_parallel_mode, and if it's 'regress' then tread 0 differently? see standard_planner() if (force_parallel_mode != FORCE_PARALLEL_OFF && best_path->parallel_safe) { Gather *gather = makeNode(Gather); Probably force_parallel_mode is good for testing the tuple queue code, and some code in Gather, but I'm not quite sure what else it's good for. Certainly not GatherMerge. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Mar 27, 2017 at 3:43 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
On 03/25/2017 05:18 PM, Rushabh Lathia wrote:
On Sat, Mar 25, 2017 at 7:01 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com
<mailto:peter.eisentraut@2ndquadrant.com >> wrote:
On 3/25/17 09:01, David Rowley wrote:
> On 25 March 2017 at 23:09, Rushabh Lathia <rushabh.lathia@gmail.com <mailto:rushabh.lathia@gmail.com >> wrote:
>> Also another point which I think we should fix is, when someone set
>> max_parallel_workers = 0, we should also set the
>> max_parallel_workers_per_gather
>> to zero. So that way it we can avoid generating the gather path with
>> max_parallel_worker = 0.
> I see that it was actually quite useful that it works the way it does.
> If it had worked the same as max_parallel_workers_per_gather, then
> likely Tomas would never have found this bug.
Another problem is that the GUC system doesn't really support cases
where the validity of one setting depends on the current value of
another setting. So each individual setting needs to be robust against
cases of related settings being nonsensical.
Okay.
About the original issue reported by Tomas, I did more debugging and
found that - problem was gather_merge_clear_slots() was not returning
the clear slot when nreader is zero (means nworkers_launched = 0).
Due to the same scan was continue even all the tuple are exhausted,
and then end up with server crash at gather_merge_getnext(). In the patch
I also added the Assert into gather_merge_getnext(), about the index
should be less then the nreaders + 1 (leader).
PFA simple patch to fix the problem.
I think there are two issues at play, here - the first one is that we still produce parallel plans even with max_parallel_workers=0, and the second one is the crash in GatherMerge when nworkers=0.
Your patch fixes the latter (thanks for looking into it), which is obviously a good thing - getting 0 workers on a busy system is quite possible, because all the parallel workers can be already chewing on some other query.
Thanks.
But it seems a bit futile to produce the parallel plan in the first place, because with max_parallel_workers=0 we can't possibly get any parallel workers ever. I wonder why compute_parallel_worker() only looks at max_parallel_workers_per_gather, i.e. why shouldn't it do:
parallel_workers = Min(parallel_workers, max_parallel_workers);
I agree with you here. Producing the parallel plan when max_parallel_workers = 0 is wrong. But rather than your suggested fix, I think that we should do something like:
/*
* In no case use more than max_parallel_workers_per_gather or
* max_parallel_workers.
*/
parallel_workers = Min(parallel_workers, Min(max_parallel_workers, max_parallel_workers_per_gather));
Perhaps this was discussed and is actually intentional, though.
Yes, I am not quite sure about this.
Regarding handling this at the GUC level - I agree with Peter that that's not a good idea. I suppose we could deal with checking the values in the GUC check/assign hooks, but what we don't have is a way to undo the changes in all the GUCs. That is, if I do
SET max_parallel_workers = 0;
SET max_parallel_workers = 16;
I expect to end up with just max_parallel_workers GUC changed and nothing else.
regards
--
Tomas Vondra http://www.2ndQuadrant.comPostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Rushabh Lathia
On Mon, Mar 27, 2017 at 10:59 AM, Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
On Mon, Mar 27, 2017 at 3:43 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
On 03/25/2017 05:18 PM, Rushabh Lathia wrote:
On Sat, Mar 25, 2017 at 7:01 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com
<mailto:peter.eisentraut@2ndquadrant.com >> wrote:
On 3/25/17 09:01, David Rowley wrote:
> On 25 March 2017 at 23:09, Rushabh Lathia <rushabh.lathia@gmail.com <mailto:rushabh.lathia@gmail.com >> wrote:
>> Also another point which I think we should fix is, when someone set
>> max_parallel_workers = 0, we should also set the
>> max_parallel_workers_per_gather
>> to zero. So that way it we can avoid generating the gather path with
>> max_parallel_worker = 0.
> I see that it was actually quite useful that it works the way it does.
> If it had worked the same as max_parallel_workers_per_gather, then
> likely Tomas would never have found this bug.
Another problem is that the GUC system doesn't really support cases
where the validity of one setting depends on the current value of
another setting. So each individual setting needs to be robust against
cases of related settings being nonsensical.
Okay.
About the original issue reported by Tomas, I did more debugging and
found that - problem was gather_merge_clear_slots() was not returning
the clear slot when nreader is zero (means nworkers_launched = 0).
Due to the same scan was continue even all the tuple are exhausted,
and then end up with server crash at gather_merge_getnext(). In the patch
I also added the Assert into gather_merge_getnext(), about the index
should be less then the nreaders + 1 (leader).
PFA simple patch to fix the problem.
I think there are two issues at play, here - the first one is that we still produce parallel plans even with max_parallel_workers=0, and the second one is the crash in GatherMerge when nworkers=0.
Your patch fixes the latter (thanks for looking into it), which is obviously a good thing - getting 0 workers on a busy system is quite possible, because all the parallel workers can be already chewing on some other query.
Thanks.
I was doing more testing with the patch and I found one more server
crash with the patch around the same area, when we force the gather
merge for a scan having zero rows.
create table dept ( deptno numeric, dname varchar(20));
set parallel_tuple_cost =0;
set parallel_setup_cost =0;
set min_parallel_table_scan_size =0;
set min_parallel_index_scan_size =0;
set force_parallel_mode=regress;
explain analyze select * from dept order by deptno;
This is because for the leader we don't initialize the slot in gm_slots. So
in the case where the number of launched workers is zero and the table has
zero rows, we end up with a NULL slot in the gm_slots array.
Currently gather_merge_clear_slots() clears out the tuple table slots for each
gather merge input and returns a cleared slot. In the patch I modified
gather_merge_clear_slots() to just clear out the tuple table slots and
always return NULL when all the queues and the heap are exhausted.
But it seems a bit futile to produce the parallel plan in the first place, because with max_parallel_workers=0 we can't possibly get any parallel workers ever. I wonder why compute_parallel_worker() only looks at max_parallel_workers_per_gather, i.e. why shouldn't it do:
parallel_workers = Min(parallel_workers, max_parallel_workers);I agree with you here. Producing the parallel plan when max_parallel_workers = 0 is wrong. But rather then your suggested fix, I think that we should do something like:
/*
* In no case use more than max_parallel_workers_per_gather or
* max_parallel_workers.
*/
parallel_workers = Min(parallel_workers, Min(max_parallel_workers, max_parallel_workers_per_gather));
Perhaps this was discussed and is actually intentional, though.Yes, I am not quite sure about this.Regarding handling this at the GUC level - I agree with Peter that that's not a good idea. I suppose we could deal with checking the values in the GUC check/assign hooks, but what we don't have is a way to undo the changes in all the GUCs. That is, if I do
SET max_parallel_workers = 0;
SET max_parallel_workers = 16;
I expect to end up with just max_parallel_workers GUC changed and nothing else.
regards
--
Tomas Vondra http://www.2ndQuadrant.comPostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--Rushabh Lathia
Regards,
Rushabh Lathia
Attachment
On Mon, Mar 27, 2017 at 1:29 AM, Rushabh Lathia <rushabh.lathia@gmail.com> wrote: > >> But it seems a bit futile to produce the parallel plan in the first place, >> because with max_parallel_workers=0 we can't possibly get any parallel >> workers ever. I wonder why compute_parallel_worker() only looks at >> max_parallel_workers_per_gather, i.e. why shouldn't it do: >> >> parallel_workers = Min(parallel_workers, max_parallel_workers); >> >> Perhaps this was discussed and is actually intentional, though. >> > > Yes, I am not quite sure about this. It was intentional. See the last paragraph of https://www.postgresql.org/message-id/CA%2BTgmoaMSn6a1780VutfsarCu0LCr%3DCO2yi4vLUo-JQbn4YuRA@mail.gmail.com -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Mar 27, 2017 at 1:29 AM, Rushabh Lathia > <rushabh.lathia@gmail.com> wrote: >> But it seems a bit futile to produce the parallel plan in the first place, >> because with max_parallel_workers=0 we can't possibly get any parallel >> workers ever. I wonder why compute_parallel_worker() only looks at >> max_parallel_workers_per_gather, i.e. why shouldn't it do: >> parallel_workers = Min(parallel_workers, max_parallel_workers); >> Perhaps this was discussed and is actually intentional, though. > It was intentional. See the last paragraph of > https://www.postgresql.org/message-id/CA%2BTgmoaMSn6a1780VutfsarCu0LCr%3DCO2yi4vLUo-JQbn4YuRA@mail.gmail.com Since this has now come up twice, I suggest adding a comment there that explains why we're intentionally ignoring max_parallel_workers. regards, tom lane
On Mon, Mar 27, 2017 at 9:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Mon, Mar 27, 2017 at 1:29 AM, Rushabh Lathia >> <rushabh.lathia@gmail.com> wrote: >>> But it seems a bit futile to produce the parallel plan in the first place, >>> because with max_parallel_workers=0 we can't possibly get any parallel >>> workers ever. I wonder why compute_parallel_worker() only looks at >>> max_parallel_workers_per_gather, i.e. why shouldn't it do: >>> parallel_workers = Min(parallel_workers, max_parallel_workers); >>> Perhaps this was discussed and is actually intentional, though. > >> It was intentional. See the last paragraph of >> https://www.postgresql.org/message-id/CA%2BTgmoaMSn6a1780VutfsarCu0LCr%3DCO2yi4vLUo-JQbn4YuRA@mail.gmail.com > > Since this has now come up twice, I suggest adding a comment there > that explains why we're intentionally ignoring max_parallel_workers. Hey, imagine if the comments explained the logic behind the code! Good idea. How about the attached? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
On Sat, Mar 25, 2017 at 12:18 PM, Rushabh Lathia <rushabh.lathia@gmail.com> wrote: > About the original issue reported by Tomas, I did more debugging and > found that - problem was gather_merge_clear_slots() was not returning > the clear slot when nreader is zero (means nworkers_launched = 0). > Due to the same scan was continue even all the tuple are exhausted, > and then end up with server crash at gather_merge_getnext(). In the patch > I also added the Assert into gather_merge_getnext(), about the index > should be less then the nreaders + 1 (leader). Well, you and David Rowley seem to disagree on what the fix is here. His patches posted upthread do A, and yours do B, and from a quick look those things are not just different ways of spelling the same underlying fix, but actually directly conflicting ideas about what the fix should be. Any chance you can review his patches, and maybe he can review yours, and we could try to agree on a consensus position? :-) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 28 March 2017 at 04:57, Robert Haas <robertmhaas@gmail.com> wrote: > On Sat, Mar 25, 2017 at 12:18 PM, Rushabh Lathia > <rushabh.lathia@gmail.com> wrote: >> About the original issue reported by Tomas, I did more debugging and >> found that - problem was gather_merge_clear_slots() was not returning >> the clear slot when nreader is zero (means nworkers_launched = 0). >> Due to the same scan was continue even all the tuple are exhausted, >> and then end up with server crash at gather_merge_getnext(). In the patch >> I also added the Assert into gather_merge_getnext(), about the index >> should be less then the nreaders + 1 (leader). > > Well, you and David Rowley seem to disagree on what the fix is here. > His patches posted upthread do A, and yours do B, and from a quick > look those things are not just different ways of spelling the same > underlying fix, but actually directly conflicting ideas about what the > fix should be. Any chance you can review his patches, and maybe he > can review yours, and we could try to agree on a consensus position? > :-) When comparing Rushabh's to my minimal patch, both change gather_merge_clear_slots() to clear the leader's slot. My fix mistakenly changes things so it does ExecInitExtraTupleSlot() on the leader's slot, but seems that's not required since gather_merge_readnext() sets the leader's slot to the output of ExecProcNode(outerPlan). I'd ignore my minimal fix because of that mistake. Rushabh's patch sidesteps this by adding a conditional pfree() to not free slot that we didn't allocate in the first place. I do think the code could be improved a bit. I don't like the way GatherMergeState's nreaders and nworkers_launched are always the same. I think this all threw me off a bit and may have been the reason for the bug in the first place. -- David Rowley http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Mar 27, 2017 at 12:11 PM, David Rowley <david.rowley@2ndquadrant.com> wrote: > On 28 March 2017 at 04:57, Robert Haas <robertmhaas@gmail.com> wrote: >> On Sat, Mar 25, 2017 at 12:18 PM, Rushabh Lathia >> <rushabh.lathia@gmail.com> wrote: >>> About the original issue reported by Tomas, I did more debugging and >>> found that - problem was gather_merge_clear_slots() was not returning >>> the clear slot when nreader is zero (means nworkers_launched = 0). >>> Due to the same scan was continue even all the tuple are exhausted, >>> and then end up with server crash at gather_merge_getnext(). In the patch >>> I also added the Assert into gather_merge_getnext(), about the index >>> should be less then the nreaders + 1 (leader). >> >> Well, you and David Rowley seem to disagree on what the fix is here. >> His patches posted upthread do A, and yours do B, and from a quick >> look those things are not just different ways of spelling the same >> underlying fix, but actually directly conflicting ideas about what the >> fix should be. Any chance you can review his patches, and maybe he >> can review yours, and we could try to agree on a consensus position? >> :-) > > When comparing Rushabh's to my minimal patch, both change > gather_merge_clear_slots() to clear the leader's slot. My fix > mistakenly changes things so it does ExecInitExtraTupleSlot() on the > leader's slot, but seems that's not required since > gather_merge_readnext() sets the leader's slot to the output of > ExecProcNode(outerPlan). I'd ignore my minimal fix because of that > mistake. Rushabh's patch sidesteps this by adding a conditional > pfree() to not free slot that we didn't allocate in the first place. > > I do think the code could be improved a bit. I don't like the way > GatherMergeState's nreaders and nworkers_launched are always the same. > I think this all threw me off a bit and may have been the reason for > the bug in the first place. Yeah, if those fields are always the same, then I think that they should be merged. That seems undeniably a good idea. Possibly we should fix the crash bug first, though, and then do that afterwards. What bugs me a little about Rushabh's fix is that it looks like magic. You have to know that we're looping over two things and freeing them up, but there's one more of one thing than the other thing. I think that at least needs some comments or something. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Mar 27, 2017 at 9:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Since this has now come up twice, I suggest adding a comment there >> that explains why we're intentionally ignoring max_parallel_workers. > Good idea. How about the attached? WFM ... but seems like there should be some flavor of this statement in the user-facing docs too (ie, "max_parallel_workers_per_gather > max_parallel_workers is a bad idea unless you're trying to test what happens when a plan can't get all the workers it planned for"). The existing text makes some vague allusions suggesting that the two GUCs might be interrelated, but I think it could be improved. regards, tom lane
On Mon, Mar 27, 2017 at 12:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Mon, Mar 27, 2017 at 9:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Since this has now come up twice, I suggest adding a comment there >>> that explains why we're intentionally ignoring max_parallel_workers. > >> Good idea. How about the attached? > > WFM ... but seems like there should be some flavor of this statement > in the user-facing docs too (ie, "max_parallel_workers_per_gather > > max_parallel_workers is a bad idea unless you're trying to test what > happens when a plan can't get all the workers it planned for"). The > existing text makes some vague allusions suggesting that the two > GUCs might be interrelated, but I think it could be improved. Do you have a more specific idea? I mean, this seems like a degenerate case of what the documentation for max_parallel_workers_per_gather says already. Even if max_parallel_workers_per_gather <= Min(max_worker_processes, max_parallel_workers), it's quite possible that you'll regularly be generating plans that can't obtain the budgeted number of workers. The only thing that is really special about the case where max_parallel_workers_per_gather > Min(max_worker_processes, max_parallel_workers) is that this can happen even on an otherwise-idle system. I'm not quite sure how to emphasize that without seeming to state the obvious. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
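Concretely, the degenerate configuration being described is just (settings
shown for illustration only):

set max_parallel_workers = 2;
set max_parallel_workers_per_gather = 4;
-- every Gather / Gather Merge may now be planned with up to 4 workers,
-- but can never launch more than 2, even on an otherwise idle system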
On Mon, Mar 27, 2017 at 9:52 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Mar 27, 2017 at 12:11 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> On 28 March 2017 at 04:57, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Sat, Mar 25, 2017 at 12:18 PM, Rushabh Lathia
>> <rushabh.lathia@gmail.com> wrote:
>>> About the original issue reported by Tomas, I did more debugging and
>>> found that - problem was gather_merge_clear_slots() was not returning
>>> the clear slot when nreader is zero (means nworkers_launched = 0).
>>> Due to the same scan was continue even all the tuple are exhausted,
>>> and then end up with server crash at gather_merge_getnext(). In the patch
>>> I also added the Assert into gather_merge_getnext(), about the index
>>> should be less then the nreaders + 1 (leader).
>>
>> Well, you and David Rowley seem to disagree on what the fix is here.
>> His patches posted upthread do A, and yours do B, and from a quick
>> look those things are not just different ways of spelling the same
>> underlying fix, but actually directly conflicting ideas about what the
>> fix should be. Any chance you can review his patches, and maybe he
>> can review yours, and we could try to agree on a consensus position?
>> :-)
>
> When comparing Rushabh's to my minimal patch, both change
> gather_merge_clear_slots() to clear the leader's slot. My fix
> mistakenly changes things so it does ExecInitExtraTupleSlot() on the
> leader's slot, but seems that's not required since
> gather_merge_readnext() sets the leader's slot to the output of
> ExecProcNode(outerPlan). I'd ignore my minimal fix because of that
> mistake. Rushabh's patch sidesteps this by adding a conditional
> pfree() to not free slot that we didn't allocate in the first place.
>
> I do think the code could be improved a bit. I don't like the way
> GatherMergeState's nreaders and nworkers_launched are always the same.
> I think this all threw me off a bit and may have been the reason for
> the bug in the first place.
Yeah, if those fields are always the same, then I think that they
should be merged. That seems undeniably a good idea.
Hmm, I agree that it's a good idea, and I will work on that as a separate patch.
Possibly we
should fix the crash bug first, though, and then do that afterwards.
What bugs me a little about Rushabh's fix is that it looks like magic.
You have to know that we're looping over two things and freeing them
up, but there's one more of one thing than the other thing. I think
that at least needs some comments or something.
So in my second version of the patch I changed gather_merge_clear_slots() to
just clear the slots for the workers, plus some other cleanup. It also returns
NULL from gather_merge_getnext() when all the queues and the heap are
exhausted - whereas earlier gather_merge_clear_slots() was returning a cleared
slot. This way we make sure that we don't run over freeing the slot for
the leader, and gather_merge_getnext() doesn't need to depend on that
cleared slot.
Thanks,
Rushabh Lathia
On Mon, Mar 27, 2017 at 12:36 PM, Rushabh Lathia <rushabh.lathia@gmail.com> wrote: > Hmm I agree that it's good idea, and I will work on that as separate patch. Maybe you want to start with what David already posted? >> Possibly we >> should fix the crash bug first, though, and then do that afterwards. >> What bugs me a little about Rushabh's fix is that it looks like magic. >> You have to know that we're looping over two things and freeing them >> up, but there's one more of one thing than the other thing. I think >> that at least needs some comments or something. >> > So in my second version of patch I change gather_merge_clear_slots() to > just clear the slot for the worker and some other clean up. Also throwing > NULL from gather_merge_getnext() when all the queues and heap are > exhausted - which earlier gather_merge_clear_slots() was returning clear > slot. This way we make sure that we don't run over freeing the slot for > the leader and gather_merge_getnext() don't need to depend on that > clear slot. Ah, I missed that. That does seem cleaner. Anybody see a problem with that approach? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 03/27/2017 05:51 PM, Robert Haas wrote: > On Mon, Mar 27, 2017 at 9:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Robert Haas <robertmhaas@gmail.com> writes: >>> On Mon, Mar 27, 2017 at 1:29 AM, Rushabh Lathia >>> <rushabh.lathia@gmail.com> wrote: >>>> But it seems a bit futile to produce the parallel plan in the first place, >>>> because with max_parallel_workers=0 we can't possibly get any parallel >>>> workers ever. I wonder why compute_parallel_worker() only looks at >>>> max_parallel_workers_per_gather, i.e. why shouldn't it do: >>>> parallel_workers = Min(parallel_workers, max_parallel_workers); >>>> Perhaps this was discussed and is actually intentional, though. >> >>> It was intentional. See the last paragraph of >>> https://www.postgresql.org/message-id/CA%2BTgmoaMSn6a1780VutfsarCu0LCr%3DCO2yi4vLUo-JQbn4YuRA@mail.gmail.com >> >> Since this has now come up twice, I suggest adding a comment there >> that explains why we're intentionally ignoring max_parallel_workers. > > Hey, imagine if the comments explained the logic behind the code! > > Good idea. How about the attached? > Certainly an improvement. But perhaps we should also mention this at compute_parallel_worker, i.e. that not looking at max_parallel_workers is intentional. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 03/27/2017 01:40 PM, Rushabh Lathia wrote: > > ... > I was doing more testing with the patch and I found one more server > crash with the patch around same area, when we forced the gather > merge for the scan having zero rows. > > create table dept ( deptno numeric, dname varchar(20); > set parallel_tuple_cost =0; > set parallel_setup_cost =0; > set min_parallel_table_scan_size =0; > set min_parallel_index_scan_size =0; > set force_parallel_mode=regress; > explain analyze select * from dept order by deptno; > > This is because for leader we don't initialize the slot into gm_slots. So > in case where launched worker is zero and table having zero rows, we > end up having NULL slot into gm_slots array. > > Currently gather_merge_clear_slots() clear out the tuple table slots for > each > gather merge input and returns clear slot. In the patch I modified function > gather_merge_clear_slots() to just clear out the tuple table slots and > always return NULL when All the queues and heap us exhausted. > Isn't that just another sign the code might be a bit too confusing? I see two main issues in the code: 1) allocating 'slots' as 'nreaders+1' elements, which seems like a good way to cause off-by-one errors 2) mixing objects with different life spans (slots for readers vs. slot for the leader) seems like a bad idea too I wonder how much we gain by reusing the slot from the leader (I'd be surprised if it was at all measurable). David posted a patch reworking this, and significantly simplifying the GatherMerge node. Why not to accept that? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Mar 28, 2017 at 10:00 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
--
On 03/27/2017 01:40 PM, Rushabh Lathia wrote:
...
I was doing more testing with the patch and I found one more server
crash with the patch around same area, when we forced the gather
merge for the scan having zero rows.
create table dept ( deptno numeric, dname varchar(20);
set parallel_tuple_cost =0;
set parallel_setup_cost =0;
set min_parallel_table_scan_size =0;
set min_parallel_index_scan_size =0;
set force_parallel_mode=regress;
explain analyze select * from dept order by deptno;
This is because for leader we don't initialize the slot into gm_slots. So
in case where launched worker is zero and table having zero rows, we
end up having NULL slot into gm_slots array.
Currently gather_merge_clear_slots() clear out the tuple table slots for each
gather merge input and returns clear slot. In the patch I modified function
gather_merge_clear_slots() to just clear out the tuple table slots and
always return NULL when All the queues and heap us exhausted.
Isn't that just another sign the code might be a bit too confusing? I see two main issues in the code:
1) allocating 'slots' as 'nreaders+1' elements, which seems like a good way to cause off-by-one errors
2) mixing objects with different life spans (slots for readers vs. slot for the leader) seems like a bad idea too
I wonder how much we gain by reusing the slot from the leader (I'd be surprised if it was at all measurable). David posted a patch reworking this, and significantly simplifying the GatherMerge node. Why not to accept that?
I think we all agree that we should get rid of nreaders from GatherMergeState
and need to do some code refactoring. But if I understood correctly, Robert's
concern was that the refactoring should be a separate commit.
I picked up David's patch and started reviewing the changes. I applied that patch
on top of my v2 patch (which does the refactoring of gather_merge_clear_slots()).
In David's patch, in gather_merge_init(), the loop where the tuple array is
allocated needs to run only up to nworkers_launched, because we don't
hold a tuple array for the leader. I changed that and made some other simple
changes based on my v2 patch. I also performed manual testing with
the changes.
Please find attached the refactoring patch, along with the v2 patch submitted
for the server crash fix. (Attaching both patches here again, for ease of access.)
Thanks,
Rushabh Lathia
Attachment
Hi, On 03/28/2017 11:07 AM, Rushabh Lathia wrote: > ... > > I think we all agree that we should get rid of nreaders from the > GatherMergeState and need to do some code re-factor. But if I > understood correctly that Robert's concern was to do that re-factor > as separate commit. > Maybe. It depends on how valuable it is to keep Gather and GatherMerge similar - having nreaders in one and not the other seems a bit weird. But maybe the refactoring would remove it from both nodes? Also, it does not really solve the issue that we're using 'nreaders' or 'nworkers_launched' to access an array with one extra element. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Mar 29, 2017 at 8:38 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
Hi,
On 03/28/2017 11:07 AM, Rushabh Lathia wrote:...
I think we all agree that we should get rid of nreaders from the GatherMergeState and need to do some code re-factor. But if I
understood correctly that Robert's concern was to do that re-factor
as separate commit.
Maybe. It depends on how valuable it's to keep Gather and GatherMerge similar - having nreaders in one and not the other seems a bit weird. But maybe the refactoring would remove it from both nodes?
Also, it does not really solve the issue that we're using 'nreaders' or 'nworkers_launched' to access array with one extra element.
With the latest patches, in gather_merge_init() there is one loop where we
initialize the tuple array (which holds the tuples coming from each worker)
and the slot (which holds the current tuple).
We hold the tuple array for each worker because when we read tuples from workers,
it's a good idea to read several at once for efficiency when possible: this minimizes
context-switching overhead.
For the leader we don't initialize the slot, because for the leader we directly call ExecProcNode(),
which returns the slot. Also, there is no point in holding a tuple array for the leader.
With the latest patches, gather_merge_clear_slots() will only clean the tuple table
slots and the tuple array buffers for the workers (nworkers_launched).
The only time we loop through "nworkers_launched + 1" is while reading a tuple
from each gather merge input. Personally I don't see any problem with that,
but I am very much open to suggestions.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Rushabh Lathia
On Tue, Mar 28, 2017 at 11:08 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > Maybe. It depends on how valuable it's to keep Gather and GatherMerge > similar - having nreaders in one and not the other seems a bit weird. But > maybe the refactoring would remove it from both nodes? Yeah, it appears to be the case that both Gather and GatherMerge inevitably end up with nworkers_launched == nreaders, which seems dumb. If we're going to clean this up, I think we should make them both match. > Also, it does not really solve the issue that we're using 'nreaders' or > 'nworkers_launched' to access array with one extra element. I'm not clear that there's any fundamental solution to that problem other than trying to clarify it through comments. Meantime, I think it's not good to leave this crashing, so I pushed Rushabh's v2 patch for the actual crash for now. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company