Re: logical decoding - reading a user catalog table - Mailing list pgsql-hackers
From | Steve Singer |
---|---|
Subject | Re: logical decoding - reading a user catalog table |
Date | |
Msg-id | BLU437-SMTP22260383F3D4568DF9860DDC9C0@phx.gbl |
In response to | Re: logical decoding - reading a user catalog table (Andres Freund <andres@2ndquadrant.com>) |
Responses | Re: logical decoding - reading a user catalog table |
List | pgsql-hackers |
On 10/28/2014 01:31 PM, Andres Freund wrote:
> On 2014-10-25 18:18:07 -0400, Steve Singer wrote:
>> My logical decoding plugin is occasionally getting this error
>>
>> "could not resolve cmin/cmax of catalog tuple"
>>
>> I get this when my output plugin is trying to read one of the user defined
>> catalog tables (user_catalog_table=true)
> Hm. That should obviously not happen.
>
> Could you describe how that table is modified? Does that bug happen
> initially, or only after a while?

It doesn't happen right away; in this case it was maybe 4 minutes after creating the slot. The error also doesn't always happen when I run this test workload, but it is reproducible with some trying.

I don't do anything special to that table: it gets created, then I do inserts on it. I don't do an alter table or anything fancy like that. I was running the slony failover test (all nodes under the same postmaster), which involves the occasional dropping and recreating of databases along with normal query load + replication.

I'll send you a tar of the data directory off list with things in this state.

> Do you have a testcase that would allow me to easily reproduce the
> problem?

I don't have an isolated test case that does this. The test that I'm hitting this with does lots of stuff and doesn't even always hit this.

>> I am not sure if this is a bug in the time-travel support in the logical
>> decoding support of if I'm just using it wrong (ie not getting a sufficient
>> lock on the relation or something).
> I don't know yet...
>
>> This is the interesting part of the stack trace
>>
>> #4 0x000000000091bbc8 in HeapTupleSatisfiesHistoricMVCC
>> (htup=0x7fffcf42a900,
>> snapshot=0x7f786ffe92d8, buffer=10568) at tqual.c:1631
>> #5 0x00000000004aedf3 in heapgetpage (scan=0x28d7080, page=0) at
>> heapam.c:399
>> #6 0x00000000004b0182 in heapgettup_pagemode (scan=0x28d7080,
>> dir=ForwardScanDirection, nkeys=0, key=0x0) at heapam.c:747
>> #7 0x00000000004b1ba6 in heap_getnext (scan=0x28d7080,
>> direction=ForwardScanDirection) at heapam.c:1475
>> #8 0x00007f787002dbfb in lookupSlonyInfo (tableOid=91754, ctx=0x2826118,
>> origin_id=0x7fffcf42ab8c, table_id=0x7fffcf42ab88,
>> set_id=0x7fffcf42ab84)
>> at slony_logical.c:663
>> #9 0x00007f787002b7a3 in pg_decode_change (ctx=0x2826118, txn=0x28cbec0,
>> relation=0x7f787a3446a8, change=0x7f786ffe3268) at slony_logical.c:237
>> #10 0x00000000007497d4 in change_cb_wrapper (cache=0x28cbda8, txn=0x28cbec0,
>> relation=0x7f787a3446a8, change=0x7f786ffe3268) at logical.c:704
>>
>>
>>
>> Here is what the code in lookupSlonyInfo is doing
>> ------------------
>>
>> sltable_oid = get_relname_relid("sl_table",slony_namespace);
>>
>> sltable_rel = relation_open(sltable_oid,AccessShareLock);
>> tupdesc=RelationGetDescr(sltable_rel);
>> scandesc=heap_beginscan(sltable_rel,
>>     GetCatalogSnapshot(sltable_oid),0,NULL);
>> reloid_attnum = get_attnum(sltable_oid,"tab_reloid");
>>
>> if(reloid_attnum == InvalidAttrNumber)
>>     elog(ERROR,"sl_table does not have a tab_reloid column");
>> set_attnum = get_attnum(sltable_oid,"tab_set");
>>
>> if(set_attnum == InvalidAttrNumber)
>>     elog(ERROR,"sl_table does not have a tab_set column");
>> tableid_attnum = get_attnum(sltable_oid, "tab_id");
>>
>> if(tableid_attnum == InvalidAttrNumber)
>>     elog(ERROR,"sl_table does not have a tab_id column");
>>
>> while( (tuple = heap_getnext(scandesc,ForwardScanDirection) ))
> (Except missing spaces ;)) I don't see anything obviously wrong with
> this.
>
> Greetings,
>
> Andres Freund
>