Re: Proposal: Conflict log history table for Logical Replication - Mailing list pgsql-hackers
From: shveta malik
Subject: Re: Proposal: Conflict log history table for Logical Replication
Msg-id: CAJpy0uB-ScuYXzKPuhvnZ2Dk0n5_sPHe9bDgSHMYswrkCH1iqg@mail.gmail.com
In response to: Re: Proposal: Conflict log history table for Logical Replication (Dilip Kumar <dilipbalaut@gmail.com>)
List: pgsql-hackers
On Thu, Aug 7, 2025 at 3:08 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, Aug 7, 2025 at 1:43 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Thu, Aug 7, 2025 at 12:25 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> Thanks Shveta for your opinion on the design.
>
> > > On Tue, Aug 5, 2025 at 5:54 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > >
> > > > This proposal aims to address these limitations by introducing a
> > > > conflict log history table, providing a structured and queryable
> > > > record of all logical replication conflicts. This should be a
> > > > configurable option: whether to log into the conflict log history
> > > > table, the server logs, or both.
> > >
> > > +1 for the idea.
>
> Thanks.
>
> > > > This proposal has two main design questions:
> > > > ===================================
> > > >
> > > > 1. How do we store conflicting tuples from different tables?
> > > > Using a JSON column to store the row data seems like the most
> > > > flexible solution, as it can accommodate different table schemas.
> > >
> > > Yes, that is one option. I have not looked into the details myself,
> > > but you can also explore 'anyarray', used in pg_statistic to store
> > > 'Column data values of the appropriate kind'.
>
> I think conversion from row to JSON and JSON to row is convenient, and
> other extensions like pgactive/BDR also provide the conflicting rows as
> JSON.

Okay. Agreed.

> But we can explore these alternative options as well, thanks.
>
> > > > 2. Should this be a system table or a user table?
> > > > a) System Table: Storing this in a system catalog is simple, but
> > > > catalogs aren't designed for ever-growing data. While
> > > > pg_largeobject is an exception, this is not what we generally do,
> > > > IMHO.
> > > > b) User Table: This offers more flexibility. We could allow a user
> > > > to specify the table name during CREATE SUBSCRIPTION. Then we
> > > > choose either to create the table internally or to let the user
> > > > create the table with a predefined schema.
> > > >
> > > > A potential drawback is that a user might drop or alter the table.
> > > > However, we could mitigate this risk by simply logging a WARNING if
> > > > the table is configured but an insertion fails.
> > >
> > > I believe it makes more sense for this to be a catalog table rather
> > > than a user table. I wanted to check if we already have a large
> > > catalog table of this kind, and I think pg_statistic could be an
> > > example of a sizable catalog table. To get a rough idea of how size
> > > scales with data, I ran a quick experiment: I created 1000 tables,
> > > each with 2 JSON columns, 1 text column, and 2 integer columns.
> > > Then I inserted 1000 rows into each table and ran ANALYZE to collect
> > > statistics. Here's what I observed on a fresh database before and
> > > after:
> > >
> > > Before:
> > > pg_statistic row count: 412
> > > Table size: ~256 kB
> > >
> > > After:
> > > pg_statistic row count: 6,412
> > > Table size: ~5.3 MB
> > >
> > > Although it isn't an exact comparison, this gives us some insight
> > > into how the statistics catalog's size grows with the number of
> > > rows. It doesn't seem excessively large with 6k rows, given that
> > > pg_statistic itself is a complex table having many 'anyarray'-type
> > > columns.
>
> Yeah, that's a good analysis. Apart from this, pg_largeobject is also a
> catalog which grows with each large object, and its growth rate will be
> very high because it stores the large object data itself in the catalog.
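
To make the JSON option a bit more concrete, here is a rough sketch of
what such a history table could look like. Every name below is mine and
purely illustrative, nothing here is a settled design:

-- Illustrative only: one possible shape for the conflict log history
-- table, with the conflicting rows stored as jsonb.
CREATE TABLE conflict_log_history (
    subid          oid          NOT NULL,  -- subscription OID
    schemaname     name         NOT NULL,  -- target table's schema
    relname        name         NOT NULL,  -- target table's name
    conflict_type  text         NOT NULL,  -- e.g. 'insert_exists'
    local_tuple    jsonb,                  -- existing local row, if any
    remote_tuple   jsonb,                  -- incoming remote row
    conflict_time  timestamptz  NOT NULL DEFAULT now()
);

-- A row of any table can be flattened into jsonb with to_jsonb(), e.g.
--   SELECT to_jsonb(t) FROM some_table t;
-- and expanded back with jsonb_populate_record() when needed.

Storing both the local and the remote tuple as jsonb keeps resolution
queries possible even though every subscriber table has a different
schema.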
> > > That said, irrespective of what we decide, it would be ideal to
> > > offer users an option for automatic purging, perhaps via a
> > > retention-period parameter like conflict_stats_retention_period
> > > (say, defaulting to 30 days), or a manual purge API such as
> > > purge_conflict_stats('older than date'). I wasn't able to find any
> > > such purge mechanism for PostgreSQL stats tables, but Oracle does
> > > provide such purging options for some of their statistics tables
> > > (not related to conflicts), see [1], [2].
> > > And to manage it better, it could be range-partitioned on timestamp.
>
> Yeah, that's an interesting suggestion, timestamp-based partitioning to
> make the purging easier.

Yes. I have put a rough sketch of how this could look at the end of this
mail.

> > It seems BDR also has one such conflict-log table, which is a catalog
> > table and is also partitioned on time. It has a default retention
> > period of 30 days. See 'bdr.conflict_history' mentioned under
> > 'catalogs' in [1].
> >
> > [1]: https://www.enterprisedb.com/docs/pgd/latest/reference/tables-views-functions/#user-visible-catalogs-and-views
>
> Actually, bdr is an extension and this table is under the extension's
> namespace (bdr.conflict_history), so it is not really a catalog but an
> extension-managed table.

Yes, right. Sorry for the confusion.

> So logically, for PostgreSQL, it is a user table, but yeah, it is
> created and managed by the extension.
> Any idea if the user can alter/drop or perform any DML on it?

I could not find any details on this part.

> --
> Regards,
> Dilip Kumar
> Google
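
PS: Here is the sketch mentioned above, a rough illustration of the
timestamp-based partitioning and purge ideas, reusing the hypothetical
conflict_log_history table from earlier in this mail. Again, all names
are made up and nothing here is a settled design:

-- Illustrative only: if the history table were declared as
--   CREATE TABLE conflict_log_history ( ... )
--       PARTITION BY RANGE (conflict_time);
-- then partitions could be created per retention unit, e.g. monthly
-- (either internally by the system or by the user):

CREATE TABLE conflict_log_history_2025_08
    PARTITION OF conflict_log_history
    FOR VALUES FROM ('2025-08-01') TO ('2025-09-01');

-- A manual purge API such as the suggested
-- purge_conflict_stats('older than date') could then simply drop every
-- partition that falls entirely before the cutoff:

DROP TABLE conflict_log_history_2025_08;

-- A simple row-level variant, for comparison (dropping whole
-- partitions would be much cheaper than deleting rows):

CREATE FUNCTION purge_conflict_stats(cutoff timestamptz)
RETURNS void LANGUAGE sql AS
$$ DELETE FROM conflict_log_history WHERE conflict_time < cutoff $$;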