Re: BUG #15367: Crash in pg_fe_scram_free when using foreign tables - Mailing list pgsql-bugs
From | Jeremy Evans |
---|---|
Subject | Re: BUG #15367: Crash in pg_fe_scram_free when using foreign tables |
Date | |
Msg-id | 20180907165103.GH17425@jeremyevans.local |
In response to | Re: BUG #15367: Crash in pg_fe_scram_free when using foreign tables (Jeremy Evans <code@jeremyevans.net>) |
Responses | Re: BUG #15367: Crash in pg_fe_scram_free when using foreign tables |
List | pgsql-bugs |
On 09/06 03:28, Jeremy Evans wrote:
> On 09/06 02:58, Michael Paquier wrote:
> > On Thu, Sep 06, 2018 at 08:35:39PM +0000, PG Bug reporting form wrote:
> > > If necessary I can build a debug version of PostgreSQL and try using that in
> > > production so I can get a better backtrace if it crashes again. However,
> > > considering that the crash is rare in my environment, it's unlikely I will
> > > be able to produce a better backtrace for the error quickly.
> >
> > That would be nice. From what I can see this would be a race condition,
> > which is not obvious by looking at the code. Testing with a two-node
> > deployment where the first node has a foreign table which connects to a
> > second node, using SCRAM authentication, holding the physical table,
> > then doing many foreign scans across many clients don't show any
> > problem. Did libpq complain at some point in the session where the
> > crash happened about any error?
>
> The PostgreSQL logfile only shows:
>
> postgres(64978) in free(): bogus pointer (double free?) 0x4a115aec398
> 2018-09-06 12:01:52.202 PDT [45953] LOG: server process (PID 64978) was terminated by signal 6: Abort trap
> 2018-09-06 12:01:52.202 PDT [45953] DETAIL: Failed process was running: ...
> 2018-09-06 12:01:52.202 PDT [45953] LOG: terminating any other active server processes
>
> If there is another place I should look, please let me know. The log
> files of the client process don't show anything during the crash,
> probably because the client libpq connection was just dropped when the
> server process crashed. After the crash, other client libpq connections
> show the following, which is probably expected:
>
> WARNING: terminating connection because of crash of another server process
> DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
> HINT: In a moment you should be able to reconnect to the database and repeat your command.
>
> I'll try to install a version with debug symbols on September 14, and
> if it crashes again I'll respond with a more complete and accurate
> backtrace.

We experienced an almost identical crash this morning. The query was
different, but the backtrace was almost the same, and the query was
using a foreign table with SCRAM authentication, just like the one
yesterday.

One thing that was similar between the two crashes is that shortly
before both crashes, we were testing database changes on different
databases in the same cluster, different from both the postgres process
that crashed (the client of the foreign table scan) and the postgres
process executing the foreign table scan. The database changes mostly
consisted of the following statement types:

  DROP TABLE
  DROP SCHEMA
  DROP FUNCTION
  CREATE FUNCTION
  CREATE SCHEMA
  CREATE TABLE
  CREATE INDEX
  CREATE TRIGGER
  INSERT
  GRANT

We only recently started testing these database changes in this cluster
yesterday. Based on the timing, I'm guessing this issue only occurs
when system table changes are being made.

I hadn't yet had time to install debug symbols on the production
server, but since I think I have a better idea on how to recreate this
issue, I will try recreating this on a test cluster with debug symbols.

Thanks,
Jeremy