Re: table versioning approach (not auditing) - Mailing list pgsql-general
From: Jim Nasby
Subject: Re: table versioning approach (not auditing)
Date:
Msg-id: 54330DEE.6090207@BlueTreble.com
In response to: Re: table versioning approach (not auditing) (Adam Brusselback <adambrusselback@gmail.com>)
Responses: Re: table versioning approach (not auditing)
List: pgsql-general
On 10/2/14, 9:27 AM, Adam Brusselback wrote:
> I've also tried to implement database versioning using JSON to log changes in tables. Here it is: https://github.com/fxku/audit
> I've got two versioning tables: one storing information about all transactions that happened, and one where I put the JSON logs of row changes of each table. I'm only logging old values and not complete rows.
>
> Then I got a function that recreates a database state at a given time into a separate schema - either to VIEWs, MVIEWs or TABLEs. This database state could then be indexed in order to work with it. You can also reset the production state to the recreated past state.
>
> Unfortunately I've got no time to further work on it at the moment + I have not done tests with many changes in the database, so I can't say if the recreation process scales well. One downside I've realised is that using the json_agg function has limits when I've got binary data. It gets too long. So I'm really looking forward to using JSONB.
>
> There are more plans in my mind. By having a transaction log table it should be possible to revert only certain transactions. I'm also thinking of parallel versioning, e.g. different users all working with their own version of the database and committing their changes to the production state. As I've got a unique history ID for each table and each row, I should be able to map the affected records.

Sorry I'm coming late to this thread. I agree that getting interested people together would be a good idea. Is there another mailing list we can do that with?

Versioning is also something I'm interested in, and have put a lot of thought into (if not much actual code :( ). I'll also make some general comments, if I may...
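For readers following along, the approach described in the quoted text - logging only the OLD values of changed rows as JSON, keyed by transaction - can be sketched roughly like this. All table, column, and function names here are made up for illustration; this is not the actual schema from the fxku/audit repository:

```sql
-- Two versioning tables: one row per transaction, one row per changed tuple.
CREATE TABLE transaction_log (
    txid       bigint PRIMARY KEY,
    stmt_time  timestamptz NOT NULL DEFAULT statement_timestamp()
);

CREATE TABLE row_log (
    id         bigserial PRIMARY KEY,
    txid       bigint NOT NULL,
    table_name text   NOT NULL,
    old_row    json            -- NULL for INSERTs; only old values are kept
);

-- Generic trigger function: record the pre-change image of the row.
CREATE OR REPLACE FUNCTION log_old_row() RETURNS trigger AS $$
BEGIN
    INSERT INTO transaction_log (txid)
    VALUES (txid_current())
    ON CONFLICT DO NOTHING;  -- requires 9.5+; older versions need an EXISTS check

    INSERT INTO row_log (txid, table_name, old_row)
    VALUES (txid_current(), TG_TABLE_NAME,
            CASE WHEN TG_OP = 'INSERT' THEN NULL ELSE row_to_json(OLD) END);
    RETURN NULL;  -- AFTER trigger; return value is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER customers_log
    AFTER INSERT OR UPDATE OR DELETE ON customers
    FOR EACH ROW EXECUTE PROCEDURE log_old_row();
```

Recreating a past state then means walking row_log backwards from the current data, applying the stored old values for every transaction newer than the target time.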
I think timestamps should be *heavily avoided* in versioning, because they are frequently the wrong way to solve a problem. There are many use cases where you're trying to answer "What values were in place when X happened?", and the simplest, most fool-proof way to answer that is that when you create a record for X, part of that record is a "history ID" that shows you the exact data used. For example, if you're creating an invoicing system that has versioning of customer addresses, you would not try to join an invoice with its address using a timestamp; you would put an actual address_history_id in the invoice table.

I thought I saw a reference to versioning sets of information. This is perhaps the trickiest part. You first have to think about the non-versioned sets (i.e. a customer may have many phone numbers) before you think about versioning the set. In this example, you want the history of the *set* of phone numbers, not of each individual number. Design it with full duplication of data first; don't think about normalizing until you have the full set-versioning design.

I understand the generic appeal of using something like JSON, but in reality I don't see it working terribly well. It's likely to be on the slow side, and it'll also be difficult to query. Instead, I think it makes more sense to create actual history tables that derive their definition from the base table. I've got code that extracts information (column name, data type, nullability) from a table (or even a table definition), and it's not that complex. With the work that's been done on capturing DDL changes, it shouldn't be too hard to handle that automatically.
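The "history ID instead of timestamp" pattern above can be sketched in a few lines of DDL. Table and column names are illustrative assumptions, not from any existing schema:

```sql
-- Every address change appends a new versioned row.
CREATE TABLE customer_address_history (
    address_history_id bigserial PRIMARY KEY,
    customer_id        bigint NOT NULL,
    address            text   NOT NULL,
    valid_from         timestamptz NOT NULL DEFAULT now()
);

-- The invoice pins the exact address version it was created against,
-- so no timestamp-range lookup is ever needed.
CREATE TABLE invoice (
    invoice_id         bigserial PRIMARY KEY,
    customer_id        bigint NOT NULL,
    address_history_id bigint NOT NULL REFERENCES customer_address_history
);

-- Reconstructing the address used on an invoice is then a plain join:
SELECT i.invoice_id, h.address
FROM invoice i
JOIN customer_address_history h USING (address_history_id);
```

The design choice is that correctness no longer depends on clock precision or on getting BETWEEN-style range predicates right; the foreign key alone answers "what data was in effect when X happened".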
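As one way to do the extraction mentioned in the last paragraph - deriving a history table's definition from its base table - the column name, data type, and nullability can be read from information_schema; a generator would wrap the result in CREATE TABLE DDL. The schema and table names below are assumptions for the example:

```sql
-- Column metadata for a hypothetical base table 'customers',
-- in the order the columns were defined.
SELECT column_name,
       data_type,
       is_nullable
FROM information_schema.columns
WHERE table_schema = 'public'
  AND table_name   = 'customers'
ORDER BY ordinal_position;
```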