Re: should check interrupts in BuildRelationExtStatistics ? - Mailing list pgsql-hackers
From | Justin Pryzby |
---|---|
Subject | Re: should check interrupts in BuildRelationExtStatistics ? |
Date | |
Msg-id | 20220605014233.GR29853@telsasoft.com Whole thread Raw |
In response to | Re: should check interrupts in BuildRelationExtStatistics ? (Justin Pryzby <pryzby@telsasoft.com>) |
Responses |
Re: should check interrupts in BuildRelationExtStatistics ?
|
List | pgsql-hackers |
On Fri, Jun 03, 2022 at 10:28:37AM -0500, Justin Pryzby wrote: > Maybe this should actually call vacuum_delay_point(), like > compute_scalar_stats(). I think vacuum_delay_point() would be wrong for these cases, since they don't call "fetchfunc()", like the other places which use vacuum_delay_point. > For MCV, there seems to be no issue, since those > functions are being called (but only for expressional stats). But maybe I've > just failed to make a large enough, non-expressional MCV list for the problem > to be apparent. I reproduced the issue with MCV like this: DROP TABLE IF EXISTS t; CREATE TABLE t AS SELECT a::text,b::text,c::text,d::text,e::text,f::text,g::text FROM generate_series(1000001,1000006)a,generate_series(1000001,1000006)b,generate_series(1000001,1000006)c,generate_series(1000001,1000006)d,generate_series(1000001,1000006)e,generate_series(1000001,1000006)f,generate_series(1000001,1000006)g,generate_series(1000001,1000006)h; VACUUMt; DROP STATISTICS IF EXISTS stxx; CREATE STATISTICS stxx (mcv) ON a,b,c,d,e,f FROM t; ALTER STATISTICS stxx SET STATISTICS9999; ANALYZE VERBOSE t; This is slow (25 seconds) inside qsort: (gdb) bt #0 __memcmp_sse4_1 () at ../sysdeps/x86_64/multiarch/memcmp-sse4.S:1020 #1 0x00005653d8686fac in varstrfastcmp_locale (a1p=0x5653dce67d54 "1000004~", len1=7, a2p=0x5653e895ffa4 "1000004~", len2=7,ssup=ssup@entry=0x5653d98c37b8) at varlena.c:2444 #2 0x00005653d8687161 in varlenafastcmp_locale (x=94918188367184, y=94918384418720, ssup=0x5653d98c37b8) at varlena.c:2270 #3 0x00005653d85134d8 in ApplySortComparator (ssup=0x5653d98c37b8, isNull2=<optimized out>, datum2=<optimized out>, isNull1=<optimizedout>, datum1=<optimized out>) at ../../../src/include/utils/sortsupport.h:224 #4 multi_sort_compare (a=0x7fa587b44e58, b=0x7fa5875f0dd0, arg=0x5653d98c37b0) at extended_stats.c:903 #5 0x00005653d8712eed in qsort_arg (data=data@entry=0x7fa5875f0050, n=<optimized out>, n@entry=1679616, element_size=element_size@entry=24,compare=compare@entry=0x5653d8513483 <multi_sort_compare>, arg=arg@entry=0x5653d98c37b0) at ../../src/include/lib/sort_template.h:349 #6 0x00005653d851415f in build_sorted_items (data=data@entry=0x7fa58f2e1050, nitems=nitems@entry=0x7ffe4f764e5c, mss=mss@entry=0x5653d98c37b0,numattrs=6, attnums=0x7fa58f2e1078) at extended_stats.c:1134 #7 0x00005653d8515d84 in statext_mcv_build (data=data@entry=0x7fa58f2e1050, totalrows=totalrows@entry=1679616, stattarget=stattarget@entry=9999)at mcv.c:204 #8 0x00005653d8513819 in BuildRelationExtStatistics (onerel=onerel@entry=0x7fa5b26ef658, inh=inh@entry=false, totalrows=1679616,numrows=numrows@entry=1679616, rows=rows@entry=0x7fa5a4103050, natts=natts@entry=7, vacattrstats=vacattrstats@entry=0x5653d98b76b0) at extended_stats.c:213 The fix seems to be to CHECK_FOR_INTERRUPTS() within multi_sort_compare(). That would supercede the other two CHECK_FOR_INTERRUPTS I'd proposed, and handle mcv, depends, and ndistinct all at once. Does that sound right ? For MCV, there's also ~0.6sec spent in build_column_frequencies(), which (if needed) would be addressed by adding CFI in sort_item_compare.
pgsql-hackers by date: