missing ML message (Re: v12.0: segfault in reindex CONCURRENTLY) - Mailing list pgsql-www
From | Justin Pryzby |
---|---|
Subject | missing ML message (Re: v12.0: segfault in reindex CONCURRENTLY) |
Date | |
Msg-id | 20191013152052.GE4475@telsasoft.com Whole thread Raw |
Responses |
Re: missing ML message (Re: v12.0: segfault in reindex CONCURRENTLY)
|
List | pgsql-www |
I was going to add a link to this thread in a commit message for our internal project's repo, but found that (even 2.5 hours later) the HTTP archives don't show my last message; I'd expect it to be here: https://www.postgresql.org/message-id/flat/20191012004446.GT10470%40telsasoft.com Quoted message follows, sent with headers like so: From pryzby@telsasoft.com Sun Oct 13 08:03:21 2019 Date: Sun, 13 Oct 2019 08:03:21 -0500 From: Justin Pryzby <pryzby@telsasoft.com> To: Michael Paquier <michael@paquier.xyz> Cc: pgsql-hackers@lists.postgresql.org, Peter Eisentraut <peter.eisentraut@2ndquadrant.com>, Andreas Karlsson <andreas@proxel.se>, Alvaro Herrera <alvherre@alvh.no-ip.org> Subject: Re: v12.0: segfault in reindex CONCURRENTLY Message-ID: <20191013130321.GD4475@telsasoft.com> References: <20191012004446.GT10470@telsasoft.com> <20191013090643.GA1434@paquier.xyz> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20191013090643.GA1434@paquier.xyz> On Sun, Oct 13, 2019 at 08:03:21AM -0500, Justin Pryzby wrote: > On Sun, Oct 13, 2019 at 06:06:43PM +0900, Michael Paquier wrote: > > On Fri, Oct 11, 2019 at 07:44:46PM -0500, Justin Pryzby wrote: > > > Unfortunately, there was no core file, and I'm still trying to reproduce it. > > > > Forgot to set ulimit -c? Having a backtrace would surely help. > > Fortunately (?) another server hit crashed last night. > (Doesn't appear to be relevant, but this table has no inheritence/partition-ness). > > Looks like it's a race condition and dereferencing *holder=NULL. The first > crash was probably the same bug, due to report query running during "reindex > CONCURRENTLY", and probably finished at nearly the same time as another locker. > > Relevant code introduced here: > > commit ab0dfc961b6a821f23d9c40c723d11380ce195a6 > Author: Alvaro Herrera <alvherre@alvh.no-ip.org> > Date: Tue Apr 2 15:18:08 2019 -0300 > > Report progress of CREATE INDEX operations > > Needs to be conditionalized (as anticipated by the comment) > > + if (holder) > pgstat_progress_update_param(PROGRESS_WAITFOR_CURRENT_PID, > holder->pid); > > > Core was generated by `postgres: postgres ts [local] REINDEX '. > Program terminated with signal 11, Segmentation fault. > > #0 WaitForLockersMultiple (locktags=locktags@entry=0x1d30548, lockmode=lockmode@entry=5, progress=progress@entry=true)at lmgr.c:911 > #1 0x00000000005c2ac8 in ReindexRelationConcurrently (relationOid=relationOid@entry=17618, options=options@entry=0) atindexcmds.c:3090 > #2 0x00000000005c328a in ReindexIndex (indexRelation=<optimized out>, options=0, concurrent=<optimized out>) at indexcmds.c:2352 > #3 0x00000000007657fe in standard_ProcessUtility (pstmt=pstmt@entry=0x1d05468, queryString=queryString@entry=0x1d046e0"REINDEX INDEX CONCURRENTLY loaded_cdr_files_filename", > context=context@entry=PROCESS_UTILITY_TOPLEVEL, params=params@entry=0x0, queryEnv=queryEnv@entry=0x0, dest=dest@entry=0x1d05548, > completionTag=completionTag@entry=0x7ffc05e6c0a0 "") at utility.c:787 > #4 0x00007f21517204ef in pgss_ProcessUtility (pstmt=0x1d05468, queryString=0x1d046e0 "REINDEX INDEX CONCURRENTLY loaded_cdr_files_filename",context=PROCESS_UTILITY_TOPLEVEL, > params=0x0, queryEnv=0x0, dest=0x1d05548, completionTag=0x7ffc05e6c0a0 "") at pg_stat_statements.c:1006 > #5 0x0000000000762816 in PortalRunUtility (portal=0x1d7a4e0, pstmt=0x1d05468, isTopLevel=<optimized out>, setHoldSnapshot=<optimizedout>, dest=0x1d05548, > completionTag=0x7ffc05e6c0a0 "") at pquery.c:1175 > #6 0x0000000000763267 in PortalRunMulti (portal=portal@entry=0x1d7a4e0, isTopLevel=isTopLevel@entry=true, setHoldSnapshot=setHoldSnapshot@entry=false,dest=dest@entry=0x1d05548, > altdest=altdest@entry=0x1d05548, completionTag=completionTag@entry=0x7ffc05e6c0a0 "") at pquery.c:1328 > #7 0x0000000000763e45 in PortalRun (portal=<optimized out>, count=9223372036854775807, isTopLevel=<optimized out>, run_once=<optimizedout>, dest=0x1d05548, altdest=0x1d05548, > completionTag=0x7ffc05e6c0a0 "") at pquery.c:796 > #8 0x000000000075ff45 in exec_simple_query (query_string=<optimized out>) at postgres.c:1215 > #9 0x0000000000761212 in PostgresMain (argc=<optimized out>, argv=<optimized out>, dbname=<optimized out>, username=<optimizedout>) at postgres.c:4236 > #10 0x0000000000483d02 in BackendRun (port=<optimized out>, port=<optimized out>) at postmaster.c:4431 > #11 BackendStartup (port=0x1d2b340) at postmaster.c:4122 > #12 ServerLoop () at postmaster.c:1704 > #13 0x00000000006f0b1f in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x1cff280) at postmaster.c:1377 > #14 0x0000000000484c93 in main (argc=3, argv=0x1cff280) at main.c:228 > > bt f > > #0 WaitForLockersMultiple (locktags=locktags@entry=0x1d30548, lockmode=lockmode@entry=5, progress=progress@entry=true)at lmgr.c:911 > holder = 0x0 > lockholders = 0x1d9b778 > holders = <optimized out> > lc = 0x1d9bf80 > total = <optimized out> > done = 1 > #1 0x00000000005c2ac8 in ReindexRelationConcurrently (relationOid=relationOid@entry=17618, options=options@entry=0) atindexcmds.c:3090 > heapRelationIds = 0x1d30360 > indexIds = 0x1d303b0 > newIndexIds = <optimized out> > relationLocks = <optimized out> > lockTags = <optimized out> > lc = 0x0 > lc2 = 0x0 > private_context = <optimized out> > oldcontext = <optimized out> > relkind = 105 'i' > relationName = 0x0 > relationNamespace = 0x0 > ru0 = {tv = {tv_sec = 30592544, tv_usec = 7232025}, ru = {ru_utime = {tv_sec = 281483566645394, tv_usec = 75668733820930},ru_stime = {tv_sec = 0, tv_usec = 30592272}, { > ru_maxrss = 0, __ru_maxrss_word = 0}, {ru_ixrss = 0, __ru_ixrss_word = 0}, {ru_idrss = 105, __ru_idrss_word= 105}, {ru_isrss = -926385342574214656, > __ru_isrss_word = -926385342574214656}, {ru_minflt = 8924839, __ru_minflt_word = 8924839}, {ru_majflt = 0,__ru_majflt_word = 0}, {ru_nswap = 17618, > __ru_nswap_word = 17618}, {ru_inblock = 139781327898864, __ru_inblock_word = 139781327898864}, {ru_oublock= 30430312, __ru_oublock_word = 30430312}, { > ru_msgsnd = 139781327898864, __ru_msgsnd_word = 139781327898864}, {ru_msgrcv = 139781327898864, __ru_msgrcv_word= 139781327898864}, {ru_nsignals = 139781327898864, > __ru_nsignals_word = 139781327898864}, {ru_nvcsw = 139781327898864, __ru_nvcsw_word = 139781327898864}, {ru_nivcsw= 139781327898864, > __ru_nivcsw_word = 139781327898864}}} > __func__ = "ReindexRelationConcurrently" > #2 0x00000000005c328a in ReindexIndex (indexRelation=<optimized out>, options=0, concurrent=<optimized out>) at indexcmds.c:2352 > state = {concurrent = true, locked_table_oid = 17608} > indOid = 17618 > irel = <optimized out> > persistence = 112 'p' > ... -- Justin Pryzby System Administrator Telsasoft +1-952-707-8581