Thread: BUG #18616: Long-running hash index build can not be interrupted

BUG #18616: Long-running hash index build can not be interrupted

From

PG Bug reporting form

Date:

13 September 2024, 13:00:00

The following bug has been logged on the website:

Bug reference:      18616
Logged by:          Alexander Lakhin
Email address:      exclusion@gmail.com
PostgreSQL version: 17rc1
Operating system:   Ubuntu 22.04
Description:

The following script:
CREATE TABLE t(i int);
INSERT INTO t SELECT 1 FROM generate_series(1, 10000000);

SET maintenance_work_mem = '1GB';

SET statement_timeout = '90s';
CREATE INDEX hi ON t USING hash (i);

reaches a state, when the backend can not be interrupted.

gdb shows that the code execution loops inside:
#0  _h_indexbuild (...) at hashsort.c:151
#1  0x0000557f3671cbf3 in hashbuild (...)
    at hash.c:183
...
with tups_done increasing slowly (given the total number of tuples).

Reproduced on all supported versions.

Re: BUG #18616: Long-running hash index build can not be interrupted

From

Alexander Lakhin

Date:

13 September 2024, 21:00:00

Hi Pavel,

13.09.2024 17:51, Pavel Borisov wrote:

Reproduced on all supported versions.
I was unable to reproduce it on my machine with these settings. Tried statement timeouts 30-120s. Index build appears to be in interruptible phase on my system.
But could you check in your environment with the following patch?

Thank you for paying attention to this!

Yes, the patch works for me. The query is interrupted as expected.
But I wonder, why don't you the see same?

Could you please show the server log, with the following change applied and
log_statement = 'all'
backtrace_functions = 'ProcessInterrupts'
?

--- a/src/backend/access/hash/hashsort.c
+++ b/src/backend/access/hash/hashsort.c
@@ -125,6 +125,7 @@ _h_indexbuild(HSpool *hspool, Relation heapRel)
     uint32      hashkey = 0;
#endif

+elog(LOG, "_h_indexbuild() start");
     tuplesort_performsort(hspool->sortstate);

     while ((itup = tuplesort_getindextuple(hspool->sortstate, true)) != NULL)
@@ -151,4 +152,5 @@ _h_indexbuild(HSpool *hspool, Relation heapRel)
         pgstat_progress_update_param(PROGRESS_CREATEIDX_TUPLES_DONE,
                                      ++tups_done);
     }
+elog(LOG, "_h_indexbuild() end; tups_done: %ld", tups_done);
}

Best regards,
Alexander

Re: BUG #18616: Long-running hash index build can not be interrupted

From

Tom Lane

Date:

13 September 2024, 22:22:26

Alexander Lakhin <exclusion@gmail.com> writes:
> 13.09.2024 17:51, Pavel Borisov wrote:
>> I was unable to reproduce it on my machine with these settings. Tried statement timeouts 30-120s. Index build
appears 
>> to be in interruptible phase on my system.

> Yes, the patch works for me. The query is interrupted as expected.
> But I wonder, why don't you the see same?

I see the same result as Alexander: the query fails to time out
after the expected 90 seconds, and it's looping in _h_indexbuild.

Looking at hashbuild, the effective sort_threshold depends on
NBuffers, so maybe if you have that set to a high enough value
it fails to go into the sort path?  If I use

SET maintenance_work_mem = '128MB';

instead of the suggested 1GB, I don't see the problem.

            regards, tom lane

Re: BUG #18616: Long-running hash index build can not be interrupted

From

Tom Lane

Date:

13 September 2024, 22:45:11

I wrote:
> Looking at hashbuild, the effective sort_threshold depends on
> NBuffers, so maybe if you have that set to a high enough value
> it fails to go into the sort path?  If I use
> SET maintenance_work_mem = '128MB';
> instead of the suggested 1GB, I don't see the problem.

Oh, false alarm: that test in hashbuild takes basically the
min of maintenance_work_mem and NBuffers, so that with
default NBuffers of 128MB, there's no difference here between
those two settings.

The reason I see a difference in behavior seems to be that
with maintenance_work_mem = 1GB, the tuple sorting step
completes faster, allowing control to reach _h_indexbuild
before the 10sec timeout I was testing with.  With the
smaller maintenance_work_mem setting, we're still sorting
when it times out --- and there are CHECK_FOR_INTERRUPTS
calls in the sort code.

            regards, tom lane