Re: Segmentation fault with core dump - Mailing list pgsql-general
From | Hiroshi Inoue |
---|---|
Subject | Re: Segmentation fault with core dump |
Date | |
Msg-id | 51B73A00.1030206@tpf.co.jp |
In response to | Re: Segmentation fault with core dump (Joshua Berry <yoberi@gmail.com>) |
Responses | Re: Segmentation fault with core dump |
List | pgsql-general |
Hi,

(2013/05/09 1:39), Joshua Berry wrote:
> | I'm using PG 9.1.9 with a client application using various versions of the
> | pgsqlODBC driver on Windows. Cursors are used heavily, as well as some pretty
> | heavy trigger queries on db writes which update several materialized views.
> |
> | The server has 48GB RAM installed, PG is configured for 12GB shared buffers,
> | 8MB max_stack_depth, 32MB temp_buffers, and 2MB work_mem. Most of the other
> | settings are defaults.
> |
> | The server will seg fault from every few days to up to two weeks. Each time
> | one of the postgres server processes seg faults, the server gets terminated by
> | signal 11, restarts in recovery for up to 30 seconds, after which time it
> | accepts connections as if nothing ever happened. Unfortunately all the open
> | cursors and connections are lost, so the client apps are left in a bad state.
> |
> | Seg faults have also occurred with PG 8.4. ... I migrated the database to a
> | server running PG9.1 with the hopes that the problem would disappear, but it
> | has not. So now I'm starting to debug.
> |
> | # uname -a
> | Linux [hostname] 2.6.32-358.2.1.el6.x86_64 #1 SMP Tue Mar 12 14:18:09 CDT 2013
> | x86_64 x86_64 x86_64 GNU/Linux
> | # cat /etc/redhat-release
> | Scientific Linux release 6.3 (Carbon)
> |
> | # psql -U jberry
> | psql (9.1.9)
> | Type "help" for help.
> |
> | jberry=# select version();
> | version
> | -------------------------------------------------------------------------------
> | PostgreSQL 9.1.9 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7
> | 20120313 (Red Hat 4.4.7-3), 64-bit
> | (1 row)
>
> I've had another postmaster segfault on my production server. It appears
> to be the same failure as the last one nearly a month ago, but I wanted
> to post the gdb bt details in case it helps shed light on the issue.
> Please let me know if anyone would like to drill into the dumped core
> with greater detail. Both the OS and PG versions remain unchanged.
>
> Kind Regards,
> -Joshua
>
>
> On Fri, Apr 12, 2013 at 6:12 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>
> On 2013-04-10 19:06:12 -0400, Tom Lane wrote:
> > I wrote:
> > > (Wanders away wondering just how much the regression tests exercise
> > > holdable cursors.)
> >
> > And the answer is they're not testing this code path at all, because if
> > you do
> >     DECLARE c CURSOR WITH HOLD FOR ...
> >     FETCH ALL FROM c;
> > then the second query executes with a portal (and resource owner)
> > created to execute the FETCH command, not directly on the held portal.
> >
> > After a little bit of thought I'm not sure it's even possible to
> > reproduce this problem with libpq, because it doesn't expose any way to
> > issue a bare protocol Execute command against a pre-existing portal.
> > (I had thought psqlOBC went through libpq, but maybe it's playing some
> > games here.)
> >
> > Anyway, I'm thinking the appropriate fix might be like this
> >
> > -   CurrentResourceOwner = portal->resowner;
> > +   if (portal->resowner)
> > +       CurrentResourceOwner = portal->resowner;
> >
> > in several places in pquery.c; that is, keep using
> > TopTransactionResourceOwner if the portal doesn't have its own.
> >
> > A more general but probably much more invasive solution would be to fake
> > up an intermediate portal when pulling data from a held portal, to
> > more closely approximate the explicit-FETCH case.
>
> We could also allocate a new resowner for the duration of that
> transaction. That would get reassigned to the transactions resowner in
> PreCommit_Portals (after a slight change there).
> That actually seems simple enough?

I made some changes to the multi-thread handling of the psqlodbc driver.
It would also be better to fix the crash on the backend side.

I made 2 patches.
The 1st one temporarily changes CurrentResourceOwner to
CurTransactionResourceOwner during catalog cache handling.
The 2nd one allocates a new resource owner for held portals.
Both fix the crash in my test case.

regards,
Hiroshi Inoue
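
For readers following the thread, the guard Tom Lane proposes in the quoted text can be sketched in isolation. This is only an illustration under stated assumptions: the struct definitions and the two `enter_portal_*` functions below are hypothetical stand-ins written for this note, not PostgreSQL's actual declarations and not either of the attached patches. Only the identifier names CurrentResourceOwner, TopTransactionResourceOwner and portal->resowner are taken from the discussion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; NOT PostgreSQL's real types. */
typedef struct ResourceOwnerData
{
    const char *name;
} ResourceOwnerData;
typedef ResourceOwnerData *ResourceOwner;

typedef struct PortalData
{
    const char   *name;
    ResourceOwner resowner;   /* NULL models a portal with no owner of its own */
} PortalData;
typedef PortalData *Portal;

/* Mock globals mirroring the names used in the thread. */
static ResourceOwnerData top_owner = { "TopTransaction" };
ResourceOwner TopTransactionResourceOwner = &top_owner;
ResourceOwner CurrentResourceOwner = &top_owner;

/* The "-" line of the quoted diff: unconditional assignment. With a portal
 * that has no owner, CurrentResourceOwner becomes NULL, and any later
 * dereference of it would crash. */
void
enter_portal_unguarded(Portal portal)
{
    CurrentResourceOwner = portal->resowner;
}

/* The "+" lines of the quoted diff: switch owners only when the portal has
 * one; otherwise keep whatever owner is already current. */
void
enter_portal_guarded(Portal portal)
{
    if (portal->resowner)
        CurrentResourceOwner = portal->resowner;

    /* Invariant the guard preserves, given a non-NULL owner on entry. */
    assert(CurrentResourceOwner != NULL);
}
```

Calling `enter_portal_guarded` on a portal whose `resowner` is NULL leaves CurrentResourceOwner pointing at the transaction-level owner, which is the behaviour described as "keep using TopTransactionResourceOwner if the portal doesn't have its own". The sketch does not model the alternative of allocating a fresh resource owner for the held portal.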