Re: BUG #13985: Segmentation fault on PREPARE TRANSACTION - Mailing list pgsql-bugs
From | Shulgin, Oleksandr |
---|---|
Subject | Re: BUG #13985: Segmentation fault on PREPARE TRANSACTION |
Date | |
Msg-id | CACACo5RdOOZLS2fRELPO77Ob0jK+2k8mhFVJ2UBUS157FTp7rA@mail.gmail.com Whole thread Raw |
In response to | Re: BUG #13985: Segmentation fault on PREPARE TRANSACTION (Andres Freund <andres@anarazel.de>) |
Responses |
Re: BUG #13985: Segmentation fault on PREPARE TRANSACTION
Re: BUG #13985: Segmentation fault on PREPARE TRANSACTION |
List | pgsql-bugs |
On Wed, Feb 24, 2016 at 10:52 PM, Andres Freund <andres@anarazel.de> wrote: > On 2016-02-24 17:52:37 -0300, Alvaro Herrera wrote: > > chris.tessels@inergy.nl wrote: > > > > > Core was generated by `postgres: mailinfo_ow mailinfo_ods > 10.50.6.6(4188'. > > > Program terminated with signal 11, Segmentation fault. > > > > > > #0 MinimumActiveBackends (min=50) at procarray.c:2472 > > > 2472 if (pgxact->xid == InvalidTransactionId) > > > > It's not surprising that you're not able to make this crash > > consistently, because it looks like the problem might be in concurrent > > modifications to the PGXACT array. This routine, MinimumActiveBackends, > > walks the PGPROC array explicitely without locks. There are comments > > indicating that this is safe, but evidently something has slipped in > > there. > > > > Apparently this code is trying to dereference an invalid pgxact, but > > it's not clear to me how this happens. Those structs are allocated in > > advance, and they are referenced in the code via array indexes, so even > > if the pgxact doesn't actually hold data about a valid transaction, > > dereferencing the XID shouldn't cause a crash. > > Well, that code is pretty, uh, questionable. E.g. for > int pgprocno = > arrayP->pgprocnos[index]; > volatile PGPROC *proc = &allProcs[pgprocno]; > volatile PGXACT *pgxact = &allPgXact[pgprocno]; > there's no guarantee that pgprocno is actually the same index for both > lookups and the following > if (pgprocno == -1) > continue; /* do not count > deleted entries */ > check. It's perfectly reasonable for a compiler to reload pgprocno from > memory, or just always reference it via memory. > Hm... this is against my understanding of what a compiler could (or should) do. Do you have a documentation reference or other evidence? A naive disassemble dump on-my-box(tm) suggests that pgprocno is stored on stack (with offset -0x1c) and is referenced from there: ... 0x00000000007f8011 <+53>: mov -0x18(%rbp),%rax 0x00000000007f8015 <+57>: mov -0x20(%rbp),%edx 0x00000000007f8018 <+60>: movslq %edx,%rdx 0x00000000007f801b <+63>: add $0x8,%rdx 0x00000000007f801f <+67>: mov 0x8(%rax,%rdx,4),%eax 0x00000000007f8023 <+71>: mov %eax,-0x1c(%rbp) 0x00000000007f8026 <+74>: mov 0x67b15b(%rip),%rdx # 0xe73188 <allProcs> 0x00000000007f802d <+81>: mov -0x1c(%rbp),%eax # <== here 0x00000000007f8030 <+84>: cltq 0x00000000007f8032 <+86>: imul $0x338,%rax,%rax 0x00000000007f8039 <+93>: add %rdx,%rax 0x00000000007f803c <+96>: mov %rax,-0x10(%rbp) 0x00000000007f8040 <+100>: mov 0x67b149(%rip),%rcx # 0xe73190 <allPgXact> 0x00000000007f8047 <+107>: mov -0x1c(%rbp),%eax # <== here 0x00000000007f804a <+110>: movslq %eax,%rdx 0x00000000007f804d <+113>: mov %rdx,%rax 0x00000000007f8050 <+116>: add %rax,%rax 0x00000000007f8053 <+119>: add %rdx,%rax 0x00000000007f8056 <+122>: shl $0x2,%rax 0x00000000007f805a <+126>: add %rcx,%rax 0x00000000007f805d <+129>: mov %rax,-0x8(%rbp) 0x00000000007f8061 <+133>: cmpl $0xffffffff,-0x1c(%rbp) # <== and here 0x00000000007f8065 <+137>: je 0x7f80a4 <MinimumActiveBackends+200> ... I presume what happened here is that initially arrayP->pgprocnos[index] > was -1, but by the time if (pgprocno == -1) is reached, it changed to a > different value. > > It's also really crummy that we're doing the PGPROC/PGXACT lookups > before checking whether pgprocno is -1. > No doubt here. At the very least ISTM that we have to make pgprocno volatile (or use a > memory barrier - but we don't have sufficient support for those in the > older branches), and move the PGPROC/PGXACT lookups after the == -1 > check. > Use of volatile doesn't change the resulting code dramatically for me. The above is when compiling without optimizations. With -Og I'm getting similar code for the variant with volatile and if no volatile, pgprocno is just loaded into a register as one would expect. -- Alex
pgsql-bugs by date: