Re: Proposal: Conflict log history table for Logical Replication - Mailing list pgsql-hackers
From: Dilip Kumar
Subject: Re: Proposal: Conflict log history table for Logical Replication
Msg-id: CAFiTN-v2N256FZAMcgG9f2s7mENC2r0ME9HoKMb+tER7yxUCjQ@mail.gmail.com
In response to: Re: Proposal: Conflict log history table for Logical Replication (shveta malik <shveta.malik@gmail.com>)
List: pgsql-hackers
On Thu, Aug 7, 2025 at 1:43 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Thu, Aug 7, 2025 at 12:25 PM shveta malik <shveta.malik@gmail.com> wrote:

Thanks, Shveta, for your opinion on the design.

> > On Tue, Aug 5, 2025 at 5:54 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > This proposal aims to address these limitations by introducing a
> > > conflict log history table, providing a structured and queryable
> > > record of all logical replication conflicts. This should be a
> > > configurable option whether to log into the conflict log history
> > > table, server logs, or both.
> >
> > +1 for the idea.

Thanks.

> > > This proposal has two main design questions:
> > > ===================================
> > >
> > > 1. How do we store conflicting tuples from different tables?
> > > Using a JSON column to store the row data seems like the most flexible
> > > solution, as it can accommodate different table schemas.
> >
> > Yes, that is one option. I have not looked into details myself, but
> > you can also explore 'anyarray' used in pg_statistics to store 'Column
> > data values of the appropriate kind'.

I think conversion from row to JSON and JSON back to row is convenient,
and other solutions such as pgactive/BDR also expose the conflicting
rows as JSON. But we can explore this alternative option as well,
thanks.

> > > 2. Should this be a system table or a user table?
> > > a) System Table: Storing this in a system catalog is simple, but
> > > catalogs aren't designed for ever-growing data. While pg_largeobject
> > > is an exception, this is not what we generally do IMHO.
> > > b) User Table: This offers more flexibility. We could allow a user to
> > > specify the table name during CREATE SUBSCRIPTION. Then we choose to
> > > either create the table internally or let the user create the table
> > > with a predefined schema.
> > >
> > > A potential drawback is that a user might drop or alter the table.
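As a side note on question 1: round-tripping a row through JSON needs
only core functions. A minimal sketch, assuming a hypothetical user
table mytable(id int, val text):

```sql
-- Capture a conflicting tuple as jsonb (schema-independent storage).
SELECT to_jsonb(t) AS row_data FROM mytable t WHERE id = 42;

-- Rebuild a typed row from the stored jsonb when inspecting a conflict.
SELECT *
FROM jsonb_populate_record(NULL::mytable,
                           '{"id": 42, "val": "conflicting"}'::jsonb);
```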
> > > However, we could mitigate this risk by simply logging a WARNING if
> > > the table is configured but an insertion fails.
> >
> > I believe it makes more sense for this to be a catalog table rather
> > than a user table. I wanted to check if we already have a large
> > catalog table of this kind, and I think pg_statistic could be an
> > example of a sizable catalog table. To get a rough idea of how size
> > scales with data, I ran a quick experiment: I created 1000 tables,
> > each with 2 JSON columns, 1 text column, and 2 integer columns. Then,
> > I inserted 1000 rows into each table and ran ANALYZE to collect
> > statistics. Here's what I observed on a fresh database before and
> > after:
> >
> > Before:
> > pg_statistic row count: 412
> > Table size: ~256 kB
> >
> > After:
> > pg_statistic row count: 6,412
> > Table size: ~5.3 MB
> >
> > Although it isn't an exact comparison, this gives us some insight into
> > how the statistics catalog table size grows with the number of rows.
> > It doesn't seem excessively large with 6k rows, given the fact that
> > pg_statistic itself is a complex table having many 'anyarray'-type
> > columns.

Yeah, that's a good analysis. Apart from this, pg_largeobject is also a
catalog that grows with each large object, and its growth rate will be
very high because it stores the large object data itself in the
catalog.

> > That said, irrespective of what we decide, it would be ideal to offer
> > users an option for automatic purging, perhaps via a retention-period
> > parameter like conflict_stats_retention_period (say, defaulting to 30
> > days), or a manual purge API such as purge_conflict_stats('older than
> > date'). I wasn't able to find any such purge mechanism for PostgreSQL
> > stats tables, but Oracle does provide such purging options for some of
> > their statistics tables (not related to conflicts), see [1], [2].
>
> And to manage it better, it could be range partitioned on timestamp.
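To illustrate the partitioning-plus-purging idea, here is a rough
sketch; every name below is hypothetical, since the proposal has not
settled on a schema:

```sql
-- Hypothetical conflict history table, range-partitioned by month so
-- that old conflicts can be purged cheaply by dropping whole partitions
-- instead of running row-by-row DELETEs.
CREATE TABLE conflict_log_history (
    subid         oid         NOT NULL,  -- subscription that hit the conflict
    relid         oid         NOT NULL,  -- table of the conflicting tuple
    conflict_type text        NOT NULL,  -- e.g. insert_exists, update_missing
    local_tuple   jsonb,                 -- existing local row, if any
    remote_tuple  jsonb,                 -- incoming remote row
    log_time      timestamptz NOT NULL DEFAULT now()
) PARTITION BY RANGE (log_time);

CREATE TABLE conflict_log_history_2025_08
    PARTITION OF conflict_log_history
    FOR VALUES FROM ('2025-08-01') TO ('2025-09-01');

-- Retention purge once a partition falls outside the retention window:
DROP TABLE conflict_log_history_2025_08;
```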
Yeah, that's an interesting suggestion, to range partition it on
timestamp for purging.

> It seems BDR also has one such conflict-log table which is a catalog
> table and is also partitioned on time. It has a default retention
> period of 30 days. See 'bdr.conflict_history' mentioned under
> 'catalogs' in [1]
>
> [1]: https://www.enterprisedb.com/docs/pgd/latest/reference/tables-views-functions/#user-visible-catalogs-and-views

Actually, BDR is an extension, and this table lives in the extension's
namespace (bdr.conflict_history), so it is not really a catalog but an
extension-managed table. Logically, for PostgreSQL it is a user table,
although it is created and managed by the extension.

--
Regards,
Dilip Kumar
Google