PG10.1 autovac killed building extended stats - Mailing list pgsql-hackers
From | Justin Pryzby |
---|---|
Subject | PG10.1 autovac killed building extended stats |
Date | |
Msg-id | 20171117152714.GA25754@telsasoft.com Whole thread Raw |
Responses |
Re: PG10.1 autovac killed building extended stats
Re: PG10.1 autovac killed building extended stats |
List | pgsql-hackers |
After adding extended/MV stats to a few of our tables a few days ago, it looks like I wasn't been paying attention and this first crashed 2 nights ago. Why at 1am? not sure. I have an "reindex" job which runs at 1am, and an vacuum/analyze job which runs at 2am, but I don't use cron to change autovac/analyze thresholds.. Nov 16 01:15:42 database kernel: postmaster[16144]: segfault at 0 ip 00000000006f3e3e sp 00007ffe2c8fc650 error 4 in postgres[400000+693000] [...] Nov 17 01:15:41 database kernel: postmaster[7145]: segfault at 0 ip 00000000006f3e3e sp 00007ffe2c8fc650 error 4 in postgres[400000+693000] [pryzbyj@database ~]$ ps -fu postgres UID PID PPID C STIME TTY TIME CMD postgres 1757 1 0 Nov09 ? 00:20:43 /usr/pgsql-10/bin/postmaster -D /var/lib/pgsql/10/data => no longer running Alvaro's original patch on this server, which was also first to experience the BRIN crash.. < 2017-11-16 01:15:43.103 -04 >LOG: server process (PID 16144) was terminated by signal 11: Segmentation fault < 2017-11-16 01:15:43.103 -04 >DETAIL: Failed process was running: autovacuum: ANALYZE public.daily_enodeb_baseband_view_201711 < 2017-11-17 01:15:41.673 -04 >LOG: server process (PID 7145) was terminated by signal 11: Segmentation fault < 2017-11-17 01:15:41.673 -04 >DETAIL: Failed process was running: autovacuum: ANALYZE public.daily_enodeb_baseband_view_201711 Here's a log of my reindex job for the last two nights' crashes. It's suspicious that the baseband table was reindexed ~15min before autovac crashed during its processing, but it's also unclear to me why it would matter. [...] |Thu Nov 16 01:02:54 -04 2017: daily_enodeb_baseband_view_201711: daily_enodeb_baseband_view_201711_unique_idx(repack current)... |Thu Nov 16 01:02:59 -04 2017: daily_enodeb_baseband_view_201711: pg_toast.pg_toast_691157026_index(reindex system)... ... |Thu Nov 16 01:15:22 -04 2017: eric_hss_l2if_metrics_cum: eric_hss_l2if_metrics_cum_start_time_idx(repack non-partitioned)... |Thu Nov 16 01:15:27 -04 2017: eric_hss_platform_metrics: eric_hss_platform_metrics_start_time_idx(repack non-partitioned)... |WARNING: terminating connection because of crash of another server process |DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because anotherserver process exited abnormally and possibly corrupted shared memory. |HINT: In a moment you should be able to reconnect to the database and repeat your command. |WARNING: Error creating index "public"."index_40552734": server closed the connection unexpectedly | This probably means the server terminated abnormally | before or while processing the request. |WARNING: Skipping index swapping for "eric_hss_platform_metrics", since no new indexes built |WARNING: repack failed for "eric_hss_platform_metrics_start_time_idx" |psql: FATAL: the database system is in recovery mode |Filesystem Size Used Avail Use% Mounted on |/dev/vdb 2.6T 2.1T 459G 83% /var/lib/pgsql [...] |Fri Nov 17 01:01:44 -04 2017: daily_enodeb_baseband_view_201711: daily_enodeb_baseband_view_201711_unique_idx(repack current)... |Fri Nov 17 01:01:54 -04 2017: daily_enodeb_baseband_view_201711: pg_toast.pg_toast_691157026_index(reindex system)... ... |Fri Nov 17 01:14:58 -04 2017: link_busy_hr: lbh_start_time_idx(repack non-partitioned)... |Fri Nov 17 01:14:58 -04 2017: loaded_cdr_files: loaded_cdr_files_filename(repack non-partitioned)... |WARNING: terminating connection because of crash of another server process |DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because anotherserver process exited abnormally and possibly corrupted shared memory. |HINT: In a moment you should be able to reconnect to the database and repeat your command. |WARNING: Error creating index "public"."index_30971": server closed the connection unexpectedly | This probably means the server terminated abnormally | before or while processing the request. |WARNING: Skipping index swapping for "loaded_cdr_files", since no new indexes built |WARNING: repack failed for "loaded_cdr_files_filename" |psql: FATAL: the database system is in recovery mode Core was generated by `postgres: autovacuum worker process gtt '. Program terminated with signal 11, Segmentation fault. #0 statext_ndistinct_build (totalrows=300, numrows=300, rows=0x21df3e8, attrs=<value optimized out>, stats=0x0) at mvdistinct.c:103 103 item->attrs = bms_add_member(item->attrs, (gdb) p stats $5 = (VacAttrStats **) 0x0 => This is an error, no ?? (gdb) p item $1 = (MVNDistinctItem *) 0x2090928 (gdb) p result $6 = (MVNDistinct *) 0x2090918 (gdb) p item->attrs $2 = (Bitmapset *) 0x0 (gdb) p rows $3 = (HeapTuple *) 0x21df3e8 (gdb) p j $8 = <value optimized out> (gdb) p attrs $4 = <value optimized out> Let me know if there's anything more I should send. Justin (gdb) bt f #0 statext_ndistinct_build (totalrows=300, numrows=300, rows=0x21df3e8, attrs=<value optimized out>, stats=0x0) at mvdistinct.c:103 item = 0x2090928 j = <value optimized out> combination = 0x20908c8 generator = 0x2090d08 result = 0x2090918 k = 2 itemcnt = 0 numattrs = 4 numcombs = <value optimized out> #1 0x00000000006f1a0e in BuildRelationExtStatistics (onerel=0x7f7895485fa0, totalrows=300, numrows=300, rows=0x21df3e8,natts=34147512, vacattrstats=0x0) at extended_stats.c:120 t = <value optimized out> stat = 0x2090b48 stats = <value optimized out> lc2 = 0x2090c48 ndistinct = <value optimized out> dependencies= <value optimized out> pg_stext = 0x7f7895495478 lc = 0x2090cb8 stats = 0x0 cxt = 0x2054cd8 oldcxt = 0x1fd0f58 __func__ = "BuildRelationExtStatistics" #2 0x000000000057aad6 in do_analyze_rel (onerel=0x7f7895485fa0, options=98, params=0x2094e90, va_cols=0x0, acquirefunc=0x57bb20<acquire_sample_rows>, relpages=80, inh=0 '\000', in_outer_xact=0 '\000', elevel=13) at analyze.c:593 col_context = 0x2054cd8 old_context = 0x1fd0f58 attr_cnt = 24 tcnt = <value optimizedout> i = <value optimized out> ind = <value optimized out> Irel = 0x2074838 nindexes = 1 hasindex = <value optimized out> vacattrstats = 0x2095868 indexdata = 0x2097810 targrows = <valueoptimized out> numrows = 300 totalrows = 300 totaldeadrows = 20 rows = 0x21df3e8 ru0= {tv = {tv_sec = 1510809341, tv_usec = 958864}, ru = {ru_utime = {tv_sec = 0, tv_usec = 436933}, ru_stime = {tv_sec =0, tv_usec = 18997}, ru_maxrss = 14988, ru_ixrss = 0, ru_idrss = 0, ru_isrss = 0, ru_minflt = 5090, ru_majflt= 0, ru_nswap = 0, ru_inblock = 2560, ru_oublock = 240, ru_msgsnd = 0, ru_msgrcv = 0, ru_nsignals = 0, ru_nvcsw= 163, ru_nivcsw = 26}} starttime = 0 caller_context = 0x2055338 save_userid = <value optimizedout> save_sec_context = 0 save_nestlevel = 2 __func__ = "do_analyze_rel" #3 0x000000000057b7a4 in analyze_rel (relid=<value optimized out>, relation=0x7ffe2c8fcfc0, options=98, params=0x2094e90,va_cols=0x0, in_outer_xact=0 '\000', bstrategy=0x21c1358) at analyze.c:276 onerel = 0x7f7895485fa0 elevel = 13 acquirefunc = 0x57bb20 <acquire_sample_rows> relpages = 80 __func__ = "analyze_rel" #4 0x00000000005e6edf in vacuum (options=98, relation=0x7ffe2c8fcfc0, relid=<value optimized out>, params=0x2094e90, va_cols=0x0,bstrategy=<value optimized out>, isTopLevel=1 '\001') at vacuum.c:323 relid = 691157026 cur = 0x208a8e8 save_exception_stack = 0x7ffe2c8fce30 save_context_stack = 0x0 local_sigjmp_buf = {{__jmpbuf ={33904488, 2571701076356583921, 9927477, 4294967295, 200, 20, -2572437610069143055, 2571700781410898417}, __mask_was_saved= 0, __saved_mask = {__val = {0, 0, 0, 0, 34861944, 0, 140155877187248, 140160420441688, 2,140729646043152, 257015007210, 140160420441688, 8282462, 1510809341, 958742, 140160420553984}}}} stmttype = <valueoptimized out> in_outer_xact = 0 '\000' use_own_xacts = 1 '\001' relations = 0x208a908 in_vacuum= 1 '\001' __func__ = "vacuum" #5 0x00000000006a7d22 in autovacuum_do_vac_analyze () at autovacuum.c:3096 rangevar = {type = T_Invalid, catalogname= 0x0, schemaname = 0x213f378 "public", relname = 0x21c18b8 "daily_enodeb_baseband_view_201711", inh = 0 '\000',relpersistence = 0 '\000', alias = 0x0, location = -1} #6 do_autovacuum () at autovacuum.c:2451 save_exception_stack = 0x7ffe2c8fd110 save_context_stack = 0x0 local_sigjmp_buf = {{__jmpbuf = {691157026, 2571696918420345329, 34163832, 4294967295, 200, 20, -2572437610046074383, 2571700883870929393},__mask_was_saved = 0, __saved_mask = {__val = {0, 0, 140729646043392, 15874306463571202545,2571702211223048689, 0, 6998306, 140160420441688, 747622240, 13239856, 13239680, 0, 6444974, 13241760,0, 0}}}} tab = 0x2094e88 skipit = 0 '\000' stdVacuumCostDelay = 0 stdVacuumCostLimit = 200 iter = <value optimized out> relid = 691157026 classRel = 0x7f7895453e58 tuple = <value optimizedout> relScan = <value optimized out> dbForm = <value optimized out> table_oids = <value optimizedout> orphan_oids = 0x0 ctl = {num_partitions = 0, ssize = 0, dsize = 0, max_dsize = 0, ffactor = 0,keysize = 4, entrysize = 80, hash = 0, match = 0, keycopy = 0, alloc = 0, hcxt = 0x0, hctl = 0x0} table_toast_map= 0x20a5338 cell = 0x2095020 shared = 0x206a208 dbentry = 0x206a490 bstrategy = 0x21c1358 key = {sk_flags = 0, sk_attno = 16, sk_strategy = 3, sk_subtype = 0, sk_collation = 100, sk_func = {fn_addr= 0x750cd0 <chareq>, fn_oid = 61, fn_nargs = 2, fn_strict = 1 '\001', fn_retset = 0 '\000', fn_stats =2 '\002', fn_extra = 0x0, fn_mcxt = 0x1fd0378, fn_expr = 0x0}, sk_argument = 116} pg_class_desc = 0x2093858 effective_multixact_freeze_max_age = 400000000 did_vacuum = <value optimized out> found_concurrent_worker = 0'\000' i = <value optimized out> __func__ = "do_autovacuum" #7 0x00000000006a84d8 in AutoVacWorkerMain (argc=<value optimized out>, argv=<value optimized out>) at autovacuum.c:1694 dbname = "gtt\000\000\000\000\000\325I\001\327;\000\000\000\020?\000\000\000\000\000\000E\352\200\327;\000\000\000\001\000\000\000\000\000\000\000\002\000\000\000\000\000\000\000\374$\316", '\000'<repeats 12 times> local_sigjmp_buf = {{__jmpbuf = {0, 2571701076356059633, 33315888, 1510809326, 16143, 1510808581,-2572437610981404175, 2571700891614532081}, __mask_was_saved = 1, __saved_mask = {__val = { 18446744066192964103,13188120, 1, 13186760, 0, 16143, 1510808581, 0, 257010229664, 1, 0, 140729646043600, 257014425560, 4,0, 4}}}} dbid = 16400 __func__ = "AutoVacWorkerMain" #8 0x00000000006a85a6 in StartAutoVacWorker () at autovacuum.c:1499 worker_pid = 0 __func__ = "StartAutoVacWorker" #9 0x00000000006b606a in StartAutovacuumWorker (postgres_signal_arg=<value optimized out>) at postmaster.c:5414 bn= 0x1fc5c30 #10 sigusr1_handler (postgres_signal_arg=<value optimized out>) at postmaster.c:5111 save_errno = 4 __func__= "sigusr1_handler"
pgsql-hackers by date: