Thread: Parallel bt build crashes when DSM_NONE
Hello. I happend to find that server crashes during regtest when DSM_NONE is enforced. The attached patch fixes that. The cause is the fact that _bt_spools_heapscan runs _bt_begin_parallel() even if dynamic_shared_memory_type is DSM_NONE. It is because plan_create_index_workers() is ignoring dynamic_shared_memory_type. We can reproduce this by letting initdb set dynamic_shared_memory_type=none regardless of actual availability. (Second attached) and just "make check". regards, -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c index 740de49..3e8cd14 100644 --- a/src/backend/optimizer/plan/planner.c +++ b/src/backend/optimizer/plan/planner.c @@ -5825,7 +5825,8 @@ plan_create_index_workers(Oid tableOid, Oid indexOid) double allvisfrac; /* Return immediately when parallelism disabled */ - if (max_parallel_maintenance_workers == 0) + if (dynamic_shared_memory_type == DSM_IMPL_NONE || + max_parallel_maintenance_workers == 0) return 0; /* Set up largely-dummy planner state */ diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c index 2efd3b7..876e153 100644 --- a/src/bin/initdb/initdb.c +++ b/src/bin/initdb/initdb.c @@ -871,6 +871,7 @@ choose_dsm_implementation(void) #ifdef HAVE_SHM_OPEN int ntries = 10; + return "none"; while (ntries > 0) { uint32 handle;
On Fri, Feb 09, 2018 at 05:06:35PM +0900, Kyotaro HORIGUCHI wrote: > I happend to find that server crashes during regtest when > DSM_NONE is enforced. The attached patch fixes that. > > The cause is the fact that _bt_spools_heapscan runs > _bt_begin_parallel() even if dynamic_shared_memory_type is > DSM_NONE. It is because plan_create_index_workers() is ignoring > dynamic_shared_memory_type. Adding Peter Geoghegan as the author and Robert as the committer in CC, as that's a mistake from 9da0cc35. > We can reproduce this by letting initdb set > dynamic_shared_memory_type=none regardless of actual > availability. (Second attached) and just "make check". Or more simply you can just setup an instance with this configuration and run installcheck. No need to patch initdb for that. 4 regression tests fail when using dynamic_shared_memory_type=none: join, aggregates, select_parallel and write_parallel. test_shm_mq of course blows up. Could that justify getting rid of DSM_IMPL_NONE? As far as I can see there is an alternative on Windows, and we fallback to sysv in the worst case. So I am wondering what's actually the use case for "none". And it is good to keep alternate outputs at a minimum, those tend to rot easily. Except for those mind errands your patch looks good to me. -- Michael
Attachment
On Fri, Feb 9, 2018 at 12:06 AM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > I happend to find that server crashes during regtest when > DSM_NONE is enforced. The attached patch fixes that. > > The cause is the fact that _bt_spools_heapscan runs > _bt_begin_parallel() even if dynamic_shared_memory_type is > DSM_NONE. It is because plan_create_index_workers() is ignoring > dynamic_shared_memory_type. I think that your patch does the right thing. plan_create_index_workers() is supposed to take care of parallel safety, and this is a parallel safety issue. Thanks -- Peter Geoghegan
On Mon, Feb 12, 2018 at 12:26 PM, Peter Geoghegan <pg@bowt.ie> wrote: > I think that your patch does the right thing. > plan_create_index_workers() is supposed to take care of parallel > safety, and this is a parallel safety issue. Committed the patch. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
At Mon, 12 Feb 2018 15:00:54 -0500, Robert Haas <robertmhaas@gmail.com> wrote in <CA+TgmoZot0Pm9JqLs=_jspm78YCjeq=2u+0Oa4kg+gGVvO7Urg@mail.gmail.com> > On Mon, Feb 12, 2018 at 12:26 PM, Peter Geoghegan <pg@bowt.ie> wrote: > > I think that your patch does the right thing. > > plan_create_index_workers() is supposed to take care of parallel > > safety, and this is a parallel safety issue. > > Committed the patch. Thank you Robert for Committing, and Peter and Michael for the comments and supplement. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
At Mon, 12 Feb 2018 22:05:36 +0900, Michael Paquier <michael@paquier.xyz> wrote in <20180212130536.GA18625@paquier.xyz> > On Fri, Feb 09, 2018 at 05:06:35PM +0900, Kyotaro HORIGUCHI wrote: > 4 regression tests fail when using dynamic_shared_memory_type=none: > join, aggregates, select_parallel and write_parallel. test_shm_mq of > course blows up. Could that justify getting rid of DSM_IMPL_NONE? As A query runs into the fallback route in the case, which even though the regtest doesn't care about. So it alone doesn't justify that. > far as I can see there is an alternative on Windows, and we fallback to > sysv in the worst case. So I am wondering what's actually the use case > for "none". And it is good to keep alternate outputs at a minimum, > those tend to rot easily. Agreed. As Tom mentioned, no bf animal haven't complained with that since 9.4 and I belive they won't. initdb doesn't create a database with DSM_IMPL_NONE. Although it can be manually deactivated, the fact that we haven't have a complain about that can justify to remove it to some extent. I'll post a patch that eliminate DSM_IMPL_NONE after this. I'd like it to be shipped on 11. > Except for those mind errands your patch looks good to me. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
On Wed, Feb 14, 2018 at 10:38:58AM +0900, Kyotaro HORIGUCHI wrote: > I'll post a patch that eliminate DSM_IMPL_NONE after this. I'd like it > to be shipped on 11. Feel free to. Be just careful that the connection attempts in test_config_settings() should try to use the DSM implementation defined by choose_dsm_implementation(). -- Michael