Thread: postgres 9.5 DB corruption
(gdb) where #0 pglz_decompress (source=source@entry=0xa617904 "0", slen=8139, dest=dest@entry=0x4268e028 "", rawsize=808452096) at pg_lzcompress.c:745 #1 0x080f3079 in toast_decompress_datum (attr=0xa6178fc) at tuptoaster.c:2210 #2 0x080f3716 in heap_tuple_untoast_attr (attr=0xa6178fc) at tuptoaster.c:183 #3 0x08440955 in pg_detoast_datum_packed (datum=<optimized out>) at fmgr.c:2270 #4 0x084145bf in text_to_cstring (t=0x7592fd2a) at varlena.c:176 #5 0x0843e874 in FunctionCall1Coll (flinfo=flinfo@entry=0xa614738, collation=collation@entry=0, arg1=arg1@entry=1972567338) at fmgr.c:1297 #6 0x0843fef8 in OutputFunctionCall (flinfo=0xa614738, val=1972567338) at fmgr.c:1950 #7 0x080bf84b in printtup (slot=0xa613bf4, self=0xa60d714) at printtup.c:359 #8 0x08220f9a in ExecutePlan (dest=0xa60d714, direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, operation=CMD_SELECT, planstate=0xa613974, estate=0xa6138ec) at execMain.c:1574 #9 standard_ExecutorRun (queryDesc=0xa6134e4, direction=ForwardScanDirection, count=0) at execMain.c:337 #10 0x08332c1b in PortalRunSelect (portal=portal@entry=0xa6114dc, forward=forward@entry=1 '\001', count=0, count@entry=2147483647, dest=dest@entry=0xa60d714) at pquery.c:942 #11 0x08333fa7 in PortalRun (portal=portal@entry=0xa6114dc, count=count@entry=2147483647, isTopLevel=isTopLevel@entry=1 '\001', dest=dest@entry=0xa60d714, altdest=altdest@entry=0xa60d714, completionTag=completionTag@entry=0xffd5d71c "") at pquery.c:786 #12 0x08330ba8 in exec_simple_query (query_string=0xa5f1754 "select * from ams.alert_attribute_bak;") at postgres.c:1096 #13 PostgresMain (argc=1, argv=0xa53dbbc, dbname=0xa53daec "ams", username=0xa53dadc "akamai") at postgres.c:4049 #14 0x080b53af in BackendRun (port=0xa584b78) at postmaster.c:4312 #15 BackendStartup (port=0xa584b78) at postmaster.c:3986 #16 ServerLoop () at postmaster.c:1705 #17 0x082d0dd7 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0xa53d2a8) at postmaster.c:1313 #18 0x080b68eb in main (argc=3, argv=0xa53d2a8) at main.c:228 (gdb)
On 7/24/19 7:38 AM, Thomas Tignor wrote: > Hello postgres community, > > Writing again to see if there are insights on this issue. We have had > infrequent but recurring corruption since upgrading from postgres 9.1 to > postgres 9.5. We are presently on 9.5.16. Our DB-facing app continually > performs a mixture of DML, primarily inserts and updates on two specific > tables, with no single op being suspect. In the past, corruption events > have produced encoding errors on COPY operations (invalid byte sequence > for encoding "UTF8"). More recently, they have caused segmentation > faults. We were able to take a cold backup after a recent event. > SELECTing the corrupted data on our cold backup yields the following > stack. Any info on a solution or how to proceed towards a solution would > be much appreciated. More information would be useful: 1) Schema of the tables. 2) Source of the data. > > Thanks in advance. > > > (gdb) where > #0 pglz_decompress (source=source@entry=0xa617904 "0", slen=8139, dest=dest@entry=0x4268e028 "", rawsize=808452096) atpg_lzcompress.c:745 > #1 0x080f3079 in toast_decompress_datum (attr=0xa6178fc) at tuptoaster.c:2210 > #2 0x080f3716 in heap_tuple_untoast_attr (attr=0xa6178fc) at tuptoaster.c:183 > #3 0x08440955 in pg_detoast_datum_packed (datum=<optimized out>) at fmgr.c:2270 > #4 0x084145bf in text_to_cstring (t=0x7592fd2a) at varlena.c:176 > #5 0x0843e874 in FunctionCall1Coll (flinfo=flinfo@entry=0xa614738, collation=collation@entry=0, arg1=arg1@entry=1972567338)at fmgr.c:1297 > #6 0x0843fef8 in OutputFunctionCall (flinfo=0xa614738, val=1972567338) at fmgr.c:1950 > #7 0x080bf84b in printtup (slot=0xa613bf4, self=0xa60d714) at printtup.c:359 > #8 0x08220f9a in ExecutePlan (dest=0xa60d714, direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, operation=CMD_SELECT,planstate=0xa613974, estate=0xa6138ec) at execMain.c:1574 > #9 standard_ExecutorRun (queryDesc=0xa6134e4, direction=ForwardScanDirection, count=0) at execMain.c:337 > #10 0x08332c1b in PortalRunSelect (portal=portal@entry=0xa6114dc, forward=forward@entry=1 '\001', count=0, count@entry=2147483647,dest=dest@entry=0xa60d714) at pquery.c:942 > #11 0x08333fa7 in PortalRun (portal=portal@entry=0xa6114dc, count=count@entry=2147483647, isTopLevel=isTopLevel@entry=1'\001', dest=dest@entry=0xa60d714, altdest=altdest@entry=0xa60d714, completionTag=completionTag@entry=0xffd5d71c"") > at pquery.c:786 > #12 0x08330ba8 in exec_simple_query (query_string=0xa5f1754 "select * from ams.alert_attribute_bak;") at postgres.c:1096 > #13 PostgresMain (argc=1, argv=0xa53dbbc, dbname=0xa53daec "ams", username=0xa53dadc "akamai") at postgres.c:4049 > #14 0x080b53af in BackendRun (port=0xa584b78) at postmaster.c:4312 > #15 BackendStartup (port=0xa584b78) at postmaster.c:3986 > #16 ServerLoop () at postmaster.c:1705 > #17 0x082d0dd7 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0xa53d2a8) at postmaster.c:1313 > #18 0x080b68eb in main (argc=3, argv=0xa53d2a8) at main.c:228 > (gdb) > > > > Tom :-) -- Adrian Klaver adrian.klaver@aklaver.com
On 7/24/19 7:38 AM, Thomas Tignor wrote: > Hello postgres community, > > Writing again to see if there are insights on this issue. We have had > infrequent but recurring corruption since upgrading from postgres 9.1 to > postgres 9.5. We are presently on 9.5.16. Our DB-facing app continually > performs a mixture of DML, primarily inserts and updates on two specific > tables, with no single op being suspect. In the past, corruption events > have produced encoding errors on COPY operations (invalid byte sequence > for encoding "UTF8"). More recently, they have caused segmentation > faults. We were able to take a cold backup after a recent event. > SELECTing the corrupted data on our cold backup yields the following > stack. Any info on a solution or how to proceed towards a solution would > be much appreciated. > > Thanks in advance. > In my previous post when I referred to table schema I mean that to include associated schema like triggers, constraints, etc. Basically what is returned by \d in psql. > Tom :-) -- Adrian Klaver adrian.klaver@aklaver.com
ams=# \d ams.alert_instance
Table "ams.alert_instance"
Column | Type | Modifiers
---------------------+--------------------------------+-----------
alert_instance_id | integer | not null
alert_definition_id | integer | not null
alert_instance_key | character varying(500) | not null
start_active_date | timestamp(0) without time zone | not null
stop_active_date | timestamp(0) without time zone |
active | smallint | not null
acknowledged | smallint | not null
ack_clear_time | timestamp(0) without time zone |
user_set_clear_time | smallint |
category_id | integer | not null
condition_start | timestamp(0) without time zone | not null
unack_reason | character varying(1) |
viewer_visible | smallint | not null
Indexes:
"pk_alert_instance" PRIMARY KEY, btree (alert_instance_id), tablespace "tbls5"
"idx_alert_inst_1" btree (alert_instance_key, alert_definition_id, alert_instance_id, active, acknowledged, ack_clear_time), tablespace "tbls5"
"idx_alert_inst_cat_id" btree (category_id), tablespace "tbls5"
"idx_alert_inst_def_id" btree (alert_definition_id), tablespace "tbls5"
Check constraints:
"ck_alert_inst_acked" CHECK (acknowledged = 0 OR acknowledged = 1)
"ck_alert_inst_active" CHECK (active = 0 OR active = 1)
"ck_alert_inst_set_cl_tm" CHECK (user_set_clear_time = 0 OR user_set_clear_time = 1)
"ck_alert_inst_viewer_vis" CHECK (viewer_visible = 0 OR viewer_visible = 1)
Foreign-key constraints:
"fk_alert_inst_cat_id" FOREIGN KEY (category_id) REFERENCES ams.category(category_id)
"fk_alert_inst_def_id" FOREIGN KEY (alert_definition_id) REFERENCES ams.alert_definition(alert_definition_id)
"fk_alert_inst_unack_reason" FOREIGN KEY (unack_reason) REFERENCES ams.unack_reason(unack_reason)
Referenced by:
TABLE "ams.alert_attribute" CONSTRAINT "fk_alert_attr_instance_id" FOREIGN KEY (alert_instance_id) REFERENCES ams.alert_instance(alert_instance_id) ON DELETE CASCADE
Triggers:
_ams_cluster_logtrigger AFTER INSERT OR DELETE OR UPDATE ON ams.alert_instance FOR EACH ROW EXECUTE PROCEDURE _ams_cluster.logtrigger('_ams_cluster', '1', 'k')
_ams_cluster_truncatetrigger BEFORE TRUNCATE ON ams.alert_instance FOR EACH STATEMENT EXECUTE PROCEDURE _ams_cluster.log_truncate('1')
Disabled user triggers:
_ams_cluster_denyaccess BEFORE INSERT OR DELETE OR UPDATE ON ams.alert_instance FOR EACH ROW EXECUTE PROCEDURE _ams_cluster.denyaccess('_ams_cluster')
_ams_cluster_truncatedeny BEFORE TRUNCATE ON ams.alert_instance FOR EACH STATEMENT EXECUTE PROCEDURE _ams_cluster.deny_truncate()
ams=#
ams=# \d ams.alert_attribute
Table "ams.alert_attribute"
Column | Type | Modifiers
-------------------+-------------------------+-----------
alert_instance_id | integer | not null
name | character varying(200) | not null
data_type | smallint | not null
value | character varying(2000) |
Indexes:
"pk_alert_attributes" PRIMARY KEY, btree (alert_instance_id, name), tablespace "tbls5"
"idx_alert_attr_name" btree (name)
Foreign-key constraints:
"fk_alert_attr_instance_id" FOREIGN KEY (alert_instance_id) REFERENCES ams.alert_instance(alert_instance_id) ON DELETE CASCADE
Triggers:
_ams_cluster_logtrigger AFTER INSERT OR DELETE OR UPDATE ON ams.alert_attribute FOR EACH ROW EXECUTE PROCEDURE _ams_cluster.logtrigger('_ams_cluster', '2', 'kk')
_ams_cluster_truncatetrigger BEFORE TRUNCATE ON ams.alert_attribute FOR EACH STATEMENT EXECUTE PROCEDURE _ams_cluster.log_truncate('2')
Disabled user triggers:
_ams_cluster_denyaccess BEFORE INSERT OR DELETE OR UPDATE ON ams.alert_attribute FOR EACH ROW EXECUTE PROCEDURE _ams_cluster.denyaccess('_ams_cluster')
_ams_cluster_truncatedeny BEFORE TRUNCATE ON ams.alert_attribute FOR EACH STATEMENT EXECUTE PROCEDURE _ams_cluster.deny_truncate()
ams=#
> Hello postgres community,
>
> Writing again to see if there are insights on this issue. We have had
> infrequent but recurring corruption since upgrading from postgres 9.1 to
> postgres 9.5. We are presently on 9.5.16. Our DB-facing app continually
> performs a mixture of DML, primarily inserts and updates on two specific
> tables, with no single op being suspect. In the past, corruption events
> have produced encoding errors on COPY operations (invalid byte sequence
> for encoding "UTF8"). More recently, they have caused segmentation
> faults. We were able to take a cold backup after a recent event.
> SELECTing the corrupted data on our cold backup yields the following
> stack. Any info on a solution or how to proceed towards a solution would
> be much appreciated.
>
> Thanks in advance.
>
In my previous post when I referred to table schema I mean that to
include associated schema like triggers, constraints, etc. Basically
what is returned by \d in psql.
> Tom :-)
--
Adrian Klaver
adrian.klaver@aklaver.com
On 7/25/19 10:24 AM, Thomas Tignor wrote: > Hi Adrian, > Thanks for responding. Below is the schema data for the tables where we > always see corruption. You'll notice they have triggers for a postgres > extension called Slony-I which provides replication service. It's not > clear if/how that's a factor, though. What specific version of Slony? Did you upgrade Slony when you moved from 9.1 to 9.5? Trace you showed in your first post was for: ams.alert_attribute_bak I do not see that below. Are the errors on any specific field? The errors are occurring on the primary, correct? Where is the data coming from? > > ams=# \d ams.alert_instance > > Table "ams.alert_instance" > > Column|Type| Modifiers > > ---------------------+--------------------------------+----------- > > alert_instance_id| integer| not null > > alert_definition_id | integer| not null > > alert_instance_key| character varying(500)| not null > > start_active_date| timestamp(0) without time zone | not null > > stop_active_date| timestamp(0) without time zone | > > active| smallint| not null > > acknowledged| smallint| not null > > ack_clear_time| timestamp(0) without time zone | > > user_set_clear_time | smallint| > > category_id| integer| not null > > condition_start| timestamp(0) without time zone | not null > > unack_reason| character varying(1)| > > viewer_visible| smallint| not null > > Indexes: > > "pk_alert_instance" PRIMARY KEY, btree (alert_instance_id), tablespace > "tbls5" > > "idx_alert_inst_1" btree (alert_instance_key, alert_definition_id, > alert_instance_id, active, acknowledged, ack_clear_time), tablespace "tbls5" > > "idx_alert_inst_cat_id" btree (category_id), tablespace "tbls5" > > "idx_alert_inst_def_id" btree (alert_definition_id), tablespace "tbls5" > > Check constraints: > > "ck_alert_inst_acked" CHECK (acknowledged = 0 OR acknowledged = 1) > > "ck_alert_inst_active" CHECK (active = 0 OR active = 1) > > "ck_alert_inst_set_cl_tm" CHECK (user_set_clear_time = 0 OR > user_set_clear_time = 1) > > "ck_alert_inst_viewer_vis" CHECK (viewer_visible = 0 OR viewer_visible = 1) > > Foreign-key constraints: > > "fk_alert_inst_cat_id" FOREIGN KEY (category_id) REFERENCES > ams.category(category_id) > > "fk_alert_inst_def_id" FOREIGN KEY (alert_definition_id) REFERENCES > ams.alert_definition(alert_definition_id) > > "fk_alert_inst_unack_reason" FOREIGN KEY (unack_reason) REFERENCES > ams.unack_reason(unack_reason) > > Referenced by: > > TABLE "ams.alert_attribute" CONSTRAINT "fk_alert_attr_instance_id" > FOREIGN KEY (alert_instance_id) REFERENCES > ams.alert_instance(alert_instance_id) ON DELETE CASCADE > > Triggers: > > _ams_cluster_logtrigger AFTER INSERT OR DELETE OR UPDATE ON > ams.alert_instance FOR EACH ROW EXECUTE PROCEDURE > _ams_cluster.logtrigger('_ams_cluster', '1', 'k') > > _ams_cluster_truncatetrigger BEFORE TRUNCATE ON ams.alert_instance FOR > EACH STATEMENT EXECUTE PROCEDURE _ams_cluster.log_truncate('1') > > Disabled user triggers: > > _ams_cluster_denyaccess BEFORE INSERT OR DELETE OR UPDATE ON > ams.alert_instance FOR EACH ROW EXECUTE PROCEDURE > _ams_cluster.denyaccess('_ams_cluster') > > _ams_cluster_truncatedeny BEFORE TRUNCATE ON ams.alert_instance FOR EACH > STATEMENT EXECUTE PROCEDURE _ams_cluster.deny_truncate() > > ams=# > > ams=# \d ams.alert_attribute > > Table "ams.alert_attribute" > > Column|Type| Modifiers > > -------------------+-------------------------+----------- > > alert_instance_id | integer| not null > > name| character varying(200)| not null > > data_type| smallint| not null > > value| character varying(2000) | > > Indexes: > > "pk_alert_attributes" PRIMARY KEY, btree (alert_instance_id, name), > tablespace "tbls5" > > "idx_alert_attr_name" btree (name) > > Foreign-key constraints: > > "fk_alert_attr_instance_id" FOREIGN KEY (alert_instance_id) REFERENCES > ams.alert_instance(alert_instance_id) ON DELETE CASCADE > > Triggers: > > _ams_cluster_logtrigger AFTER INSERT OR DELETE OR UPDATE ON > ams.alert_attribute FOR EACH ROW EXECUTE PROCEDURE > _ams_cluster.logtrigger('_ams_cluster', '2', 'kk') > > _ams_cluster_truncatetrigger BEFORE TRUNCATE ON ams.alert_attribute FOR > EACH STATEMENT EXECUTE PROCEDURE _ams_cluster.log_truncate('2') > > Disabled user triggers: > > _ams_cluster_denyaccess BEFORE INSERT OR DELETE OR UPDATE ON > ams.alert_attribute FOR EACH ROW EXECUTE PROCEDURE > _ams_cluster.denyaccess('_ams_cluster') > > _ams_cluster_truncatedeny BEFORE TRUNCATE ON ams.alert_attribute FOR > EACH STATEMENT EXECUTE PROCEDURE _ams_cluster.deny_truncate() > > ams=# > > > > Tom :-) > > > On Wednesday, July 24, 2019, 11:15:04 AM EDT, Adrian Klaver > <adrian.klaver@aklaver.com> wrote: > > > On 7/24/19 7:38 AM, Thomas Tignor wrote: > > Hello postgres community, > > > > Writing again to see if there are insights on this issue. We have had > > infrequent but recurring corruption since upgrading from postgres 9.1 to > > postgres 9.5. We are presently on 9.5.16. Our DB-facing app continually > > performs a mixture of DML, primarily inserts and updates on two specific > > tables, with no single op being suspect. In the past, corruption events > > have produced encoding errors on COPY operations (invalid byte sequence > > for encoding "UTF8"). More recently, they have caused segmentation > > faults. We were able to take a cold backup after a recent event. > > SELECTing the corrupted data on our cold backup yields the following > > stack. Any info on a solution or how to proceed towards a solution would > > be much appreciated. > > > > Thanks in advance. > > > > In my previous post when I referred to table schema I mean that to > include associated schema like triggers, constraints, etc. Basically > what is returned by \d in psql. > > > > Tom :-) > > > > -- > Adrian Klaver > adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com> > > > -- Adrian Klaver adrian.klaver@aklaver.com