Thread: Online DW
On 6/10/2016 1:11 AM, Sridhar N Bamandlapally wrote: > Is there any feature in PostgreSQL where online DW (Dataware housing) > is possible ? > > am looking for scenario like > > 1. Production DB will have CURRENT + LAST 7 DAYS data only > > 2. Archive/DW DB will have CURRENT + COMPLETE HISTORY > > expecting something like streaming, but not ETL you'd need to manage that yourself. I'd probably use a custom ETL job that pulls data from the production database and sends it to the EDW using a message queue (EMS, JMS, AMQ, etc). the production database would be partitioned by day, and once a day drop the 7+ day old partition, while the EDW wouldn't ever drop, and if its partitioned, maybe by week or month. It would probably be easier if the the EDW didn't need the latest transactions, only data an hour to a day old. If the EDW needed near-live data, then I'd want to be using the same sort of messaging queue platform (EMS, JMS, AMQ, etc) to send the data to the production database, so that the EDW feed process could subscribe to the same events. this sort of pattern works easier if your data is inserted once and not updated, but it can be made to handle updates, too. the message queue paradigm ('publish/subscribe') is a very powerful way of implementing complex distributed systems. -- john r pierce, recycling bits in santa cruz
HiIs there any feature in PostgreSQL where online DW (Dataware housing) is possible ?am looking for scenario like1. Production DB will have CURRENT + LAST 7 DAYS data only2. Archive/DW DB will have CURRENT + COMPLETE HISTORYexpecting something like streaming, but not ETL
On 10 June 2016 at 16:11, Sridhar N Bamandlapally <sridhar.bn1@gmail.com> wrote:HiIs there any feature in PostgreSQL where online DW (Dataware housing) is possible ?am looking for scenario like1. Production DB will have CURRENT + LAST 7 DAYS data only2. Archive/DW DB will have CURRENT + COMPLETE HISTORYexpecting something like streaming, but not ETLThere's nothing built-in, but that's exactly the sort of thing pglogical is intended for. You can also build something along those lines with Londiste fairly easily.Hopefully this is the sort of thing we can move toward with built-in logical replication in coming releases.--
On 6/10/2016 2:18 AM, Sridhar N Bamandlapally wrote: > This/These will be performed in Production to clean-up archive which > will not be sync with Archive/DW DB only > > one heads-up is Archive/DW DB may need to build WITHOUT CONSTRAINTS > > May need to introduce ARCHIVE system/tag in pg_hba.conf there's a whole lot of implied magic here unless you want to get way more specific what these features do, exactly, under all possible conditions. -- john r pierce, recycling bits in santa cruz
On 6/10/2016 2:18 AM, Sridhar N Bamandlapally wrote:This/These will be performed in Production to clean-up archive which will not be sync with Archive/DW DB only
one heads-up is Archive/DW DB may need to build WITHOUT CONSTRAINTS
May need to introduce ARCHIVE system/tag in pg_hba.conf
there's a whole lot of implied magic here unless you want to get way more specific what these features do, exactly, under all possible conditions.
This is what I feel will give me solution to maintain production (current+7days) and archive(current+history) without any etl/scheduler
But there is no feature available in any database
Sridhar
Opentext
On 10 June 2016 at 18:56, John R Pierce <pierce@hogranch.com> wrote:On 6/10/2016 2:18 AM, Sridhar N Bamandlapally wrote:This/These will be performed in Production to clean-up archive which will not be sync with Archive/DW DB only
one heads-up is Archive/DW DB may need to build WITHOUT CONSTRAINTS
May need to introduce ARCHIVE system/tag in pg_hba.conf
there's a whole lot of implied magic here unless you want to get way more specific what these features do, exactly, under all possible conditions.I'd go further than that and say I can't see how something like this could possibly work with physical (block based) replication. It's total hand-waving.--
One thing we can restrict to "begin noarchive" transaction block are DELETE and SELECT only
Sridhar
Opentext
This is what I feel will give me solution to maintain production (current+7days) and archive(current+history) without any etl/scheduler
But there is no feature available in any database
Sridhar
OpentextOn 10 Jun 2016 19:03, "Craig Ringer" <craig@2ndquadrant.com> wrote:On 10 June 2016 at 18:56, John R Pierce <pierce@hogranch.com> wrote:On 6/10/2016 2:18 AM, Sridhar N Bamandlapally wrote:This/These will be performed in Production to clean-up archive which will not be sync with Archive/DW DB only
one heads-up is Archive/DW DB may need to build WITHOUT CONSTRAINTS
May need to introduce ARCHIVE system/tag in pg_hba.conf
there's a whole lot of implied magic here unless you want to get way more specific what these features do, exactly, under all possible conditions.I'd go further than that and say I can't see how something like this could possibly work with physical (block based) replication. It's total hand-waving.--
I may be wrong but ... On Fri, Jun 10, 2016 at 6:33 PM, Sridhar N Bamandlapally <sridhar.bn1@gmail.com> wrote: > One thing we can restrict to "begin noarchive" transaction block are DELETE > and SELECT only > On 10 Jun 2016 21:57, "Sridhar N Bamandlapally" <sridhar.bn1@gmail.com> > wrote: >> This is what I feel will give me solution to maintain production >> (current+7days) and archive(current+history) without any etl/scheduler It seems ( to me ) you have a grossly underspecificied ( in the list ) problem and you have invented an even more underspecified keyword to magically solve it, and expect someone to develop it. And you haven't even bothered to avoid top posting, which is frowned upon on this list and makes infering your problems solution even more difficult. If you want to have something like this you'll need to post much more details on what you are proposing, what are the use cases for the general public, etc.. Just eyeballing it I would estimate this will need many pages just to state the problems and the intended semantics of your proposal. Regards. Francisco Olarte.
I may be wrong but ...
On Fri, Jun 10, 2016 at 6:33 PM, Sridhar N Bamandlapally
<sridhar.bn1@gmail.com> wrote:
> One thing we can restrict to "begin noarchive" transaction block are DELETE
> and SELECT only
> On 10 Jun 2016 21:57, "Sridhar N Bamandlapally" <sridhar.bn1@gmail.com>
> wrote:
>> This is what I feel will give me solution to maintain production
>> (current+7days) and archive(current+history) without any etl/scheduler
It seems ( to me ) you have a grossly underspecificied ( in the list )
problem and you have invented an even more underspecified keyword to
magically solve it, and expect someone to develop it.
And you haven't even bothered to avoid top posting, which is frowned
upon on this list and makes infering your problems solution even more
difficult.
If you want to have something like this you'll need to post much more
details on what you are proposing, what are the use cases for the
general public, etc.. Just eyeballing it I would estimate this will
need many pages just to state the problems and the intended semantics
of your proposal.
Regards.
Francisco Olarte.
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Please, you should specify your PostgreSQL version and O/S for questions like this.
--
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.

HiIs there any feature in PostgreSQL where online DW (Dataware housing) is possible ?am looking for scenario like1. Production DB will have CURRENT + LAST 7 DAYS data only2. Archive/DW DB will have CURRENT + COMPLETE HISTORYexpecting something like streaming, but not ETL
This is what I feel will give me solution to maintain production (current+7days) and archive(current+history) without any etl/scheduler
But there is no feature available in any database
Sridhar
Opentext
Ok, let me put this way,
I need every transaction coming from application sync with both production and archive db,
but the transactions I do to clean old data(before 7 days) on production db in daily maintenance window should not sync with archive db,
Archive db need read-only, used for maintaining integrity with other business applications
Issue here is,
1. etl is scheduler, cannot run on every transaction, even if it does, its expensive
2. Materialize view(refresh on commit) or slony, will also sync clean-up transactions
3. Replication is not archive, definitely not option
I say, every online archive db is use case for this.
Thanks
Sridhar
Opentext
HiIs there any feature in PostgreSQL where online DW (Dataware housing) is possible ?am looking for scenario like1. Production DB will have CURRENT + LAST 7 DAYS data only2. Archive/DW DB will have CURRENT + COMPLETE HISTORYexpecting something like streaming, but not ETLThe entire DB couldn't operate this way since not every record has a concept of time and if you use any kind of physical time you are going to have issues as well.First impression is you want to horizontally partition your "time-impacted" tables so that each partition contains only data having the same ISO Week number in the same ISO Year.Remove older tables from the inheritance and stick them on a separate tablespace and/or stream them to another database.As has been mentioned there are various tools out there today that can likely be used to fulfill whatever fundamental need you have. "Not ETL" is not a need though, its at best a "nice-to-have" unless you are willing to forgo any solution to your larger problem just because the implementation is not optimal.Unless you define your true goals and constraints its going to be hard to make recommendations.David J.
On Sat, 11 Jun 2016 09:59:59 +0530 Sridhar N Bamandlapally <sridhar.bn1@gmail.com> wrote: > Ok, let me put this way, > > I need every transaction coming from application sync with both > production and archive db, > but the transactions I do to clean old data(before 7 days) on > production db in daily maintenance window should not sync with > archive db, > > Archive db need read-only, used for maintaining integrity with other > business applications > > Issue here is, > 1. etl is scheduler, cannot run on every transaction, even if it > does, its expensive > > 2. Materialize view(refresh on commit) or slony, will also sync > clean-up transactions > > 3. Replication is not archive, definitely not option > > I say, every online archive db is use case for this. As I see, you have 2 options (A, and B) A) With FDW 1) Create 2 databases on production: a) first, a normal database for production, with 7 last days data, b) second, as postgres_fdw remote database on archive server. https://www.postgresql.org/docs/current/static/postgres-fdw.html 2) Make a trigger on each table in production database to replicate inserts, updates and deletes to fdw remote tables. 3) At your scheduled time, truncate tables (don't delete content or the trigger will fire up) on your production db. Note, make date part of your primary keys, if not, the same pk may be reused on production tables and the insert on archive fdw tables will fail. B) Use pgpoolII, use replication to store the same data on both databases. On production db, delete old data. Don't connect both databases with streaming replication nor slony or anything else, pgpoolII will do the work. http://pgpool.net/mediawiki/index.php/Main_Page http://www.pgpool.net/docs/latest/tutorial-en.html#replication-config Read full documentation, above url cites point directly to the replication part but read full documentation. HTH > Thanks > Sridhar > Opentext --- --- Eduardo Morras <emorrasg@yahoo.es>
Ok, let me put this way,
I need every transaction coming from application sync with both production and archive db,
but the transactions I do to clean old data(before 7 days) on production db in daily maintenance window should not sync with archive db,
I say, every online archive db is use case for this.
On Sat, 11 Jun 2016 09:59:59 +0530 Sridhar N Bamandlapally <sridhar.bn1@gmail.com> wrote: > Ok, let me put this way, > > I need every transaction coming from application sync with both > production and archive db, > but the transactions I do to clean old data(before 7 days) on > production db in daily maintenance window should not sync with > archive db, > > Archive db need read-only, used for maintaining integrity with other > business applications > > Issue here is, > 1. etl is scheduler, cannot run on every transaction, even if it > does, its expensive > > 2. Materialize view(refresh on commit) or slony, will also sync > clean-up transactions > > 3. Replication is not archive, definitely not option > > I say, every online archive db is use case for this. As I see, you have 2 options (A, and B) A) With FDW 1) Create 2 databases on production: a) first, a normal database for production, with 7 last days data, b) second, as postgres_fdw remote database on archive server. https://www.postgresql.org/docs/current/static/postgres-fdw.html 2) Make a trigger on each table in production database to replicate inserts, updates and deletes to fdw remote tables. 3) At your scheduled time, truncate tables (don't delete content or the trigger will fire up) on your production db. Note, make date part of your primary keys, if not, the same pk may be reused on production tables and the insert on archive fdw tables will fail. B) Use pgpoolII, use replication to store the same data on both databases. On production db, delete old data. Don't connect both databases with streaming replication nor slony or anything else, pgpoolII will do the work. http://pgpool.net/mediawiki/index.php/Main_Page http://www.pgpool.net/docs/latest/tutorial-en.html#replication-config Read full documentation, above url cites point directly to the replication part but read full documentation. HTH > Thanks > Sridhar > Opentext --- --- Eduardo Morras <emorrasg@yahoo.es>