Thread: Populating missing dates in postgresql data
Alright everyone, this is a doozy of a problem. I am new to Postgres so I appreciate patience/understanding. I have a database of hardware objects, each of which has several different “channels”. Once per day, these channels are supposed to check in with a central server, generating an event log table (TABLE A) like the following:
object_id channel check-in date
****************************************
990 1 2014-12-01
990 1 2014-12-02
990 2 2014-12-01
990 2 2014-12-02
286 2 2014-12-01
286 2 2014-12-02
286 5 2014-12-01
286 5 2014-12-02
4507 1 2014-12-01
4507 1 2014-12-02
4507 2 2014-12-01
4507 2 2014-12-02
And so on. Occasionally, communications will break down to the hardware, such that no reporting occurs. For example, let’s say that object 286 loses communications on 12/1/2014. Then the table might look like:
object_id channel check-in date
****************************************
990 1 2014-12-01
990 1 2014-12-02
990 2 2014-12-01
990 2 2014-12-02
286 2 2014-12-02
286 5 2014-12-02
4507 1 2014-12-01
4507 1 2014-12-02
4507 2 2014-12-01
4507 2 2014-12-02
Or let’s say that for some reason, just channel 2 loses reporting for a day. Then we would have:
object_id channel check-in date
****************************************
990 1 2014-12-01
990 1 2014-12-02
990 2 2014-12-01
990 2 2014-12-02
286 2 2014-12-02
286 5 2014-12-01
286 5 2014-12-02
4507 1 2014-12-01
4507 1 2014-12-02
4507 2 2014-12-01
4507 2 2014-12-02
I have a second table (TABLE B) with all of the object_ids and channels that are supposed to be reporting in each day. For cases where a certain channel does not check in, I want to add a column that indicates the comm failure. So, for the example where all channels on object 286 do not check in, I would like to get is something like this:
object_id channel check-in date comm failure
**********************************************************
990 1 2014-12-01 No
990 1 2014-12-02 No
990 2 2014-12-01 No
990 2 2014-12-02 No
286 2 2014-12-01 Yes
286 2 2014-12-02 No
286 5 2014-12-01 Yes
286 5 2014-12-02 No
4507 1 2014-12-01 No
4507 1 2014-12-02 No
4507 2 2014-12-01 No
4507 2 2014-12-02 No
I have been racking my mind for the better part of a day on how to do this. The thing is that I can do a right join of TABLE B on TABLE A, and this will populate the missing object ids and channels. However, this only works for a single day, and it gives me something like:
object_id channel check-in date comm failure
**********************************************************
990 1 2014-12-01 No
990 1 2014-12-02 No
990 2 2014-12-01 No
990 2 2014-12-02 No
286 2 Yes
286 2 2014-12-02 No
286 5 Yes
286 5 2014-12-02 No
4507 1 2014-12-01 No
4507 1 2014-12-02 No
4507 2 2014-12-01 No
4507 2 2014-12-02 No
I need to do a count of comm failures by day, so I need to populate the check-in date field. Please help!
Best Regards,
Steve
*************************************************
Steven Lavrenz, MS, EIT
Doctoral Research Fellow, Ph.D. Candidate
Purdue University | Transportation Engineering
Hampton Hall of Civil Engineering, Room 1122
550 Stadium Mall Drive
West Lafayette, IN 47907
765-775-6423
On 03/25/2015 05:25 PM, Lavrenz, Steven M wrote: > Alright everyone, this is a doozy of a problem. I am new to Postgres so > I appreciate patience/understanding. I have a database of hardware > objects, each of which has several different “channels”. Once per day, > these channels are supposed to check in with a central server, > generating an event log table (TABLE A) like the following: > > object_id channel check-in date > > **************************************** > > 990 1 2014-12-01 > > 990 1 2014-12-02 > > 990 2 2014-12-01 > > 990 2 2014-12-02 > > 286 2 2014-12-01 > > 286 2 2014-12-02 > > 286 5 2014-12-01 > > 286 5 2014-12-02 > > 4507 1 2014-12-01 > > 4507 1 2014-12-02 > > 4507 2 2014-12-01 > > 4507 2 2014-12-02 > > And so on. Occasionally, communications will break down to the hardware, > such that no reporting occurs. For example, let’s say that object 286 > loses communications on 12/1/2014. Then the table might look like: > > object_id channel check-in date > > **************************************** > > 990 1 2014-12-01 > > 990 1 2014-12-02 > > 990 2 2014-12-01 > > 990 2 2014-12-02 > > 286 2 2014-12-02 > > 286 5 2014-12-02 > > 4507 1 2014-12-01 > > 4507 1 2014-12-02 > > 4507 2 2014-12-01 > > 4507 2 2014-12-02 > > Or let’s say that for some reason, just channel 2 loses reporting for a > day. Then we would have: > > object_id channel check-in date > > **************************************** > > 990 1 2014-12-01 > > 990 1 2014-12-02 > > 990 2 2014-12-01 > > 990 2 2014-12-02 > > 286 2 2014-12-02 > > 286 5 2014-12-01 > > 286 5 2014-12-02 > > 4507 1 2014-12-01 > > 4507 1 2014-12-02 > > 4507 2 2014-12-01 > > 4507 2 2014-12-02 > > I have a second table (TABLE B) with all of the object_ids and channels > that are supposed to be reporting in each day. For cases where a certain > channel does not check in, I want to add a column that indicates the > comm failure. So, for the example where all channels on object 286 do > not check in, I would like to get is something like this: > > object_id channel check-in > date comm failure > > ********************************************************** > > 990 1 > 2014-12-01 No > > 990 1 > 2014-12-02 No > > 990 2 > 2014-12-01 No > > 990 2 > 2014-12-02 No > > 286 2 > 2014-12-01 Yes > > 286 2 > 2014-12-02 No > > 286 5 > 2014-12-01 Yes > > 286 5 > 2014-12-02 No > > 4507 1 > 2014-12-01 No > > 4507 1 > 2014-12-02 No > > 4507 2 > 2014-12-01 No > > 4507 2 > 2014-12-02 No > > I have been racking my mind for the better part of a day on how to do > this. The thing is that I can do a right join of TABLE B on TABLE A, and > this will populate the missing object ids and channels. However, this > only works for a single day, and it gives me something like: > > object_id channel check-in > date comm failure > > ********************************************************** > > 990 1 > 2014-12-01 No > > 990 1 > 2014-12-02 No > > 990 2 > 2014-12-01 No > > 990 2 > 2014-12-02 No > > 286 2 > Yes > > 286 2 > 2014-12-02 No > > 286 5 > Yes > > 286 5 > 2014-12-02 No > > 4507 1 > 2014-12-01 No > > 4507 1 > 2014-12-02 No > > 4507 2 > 2014-12-01 No > > 4507 2 > 2014-12-02 No > > I need to do a count of comm failures by day, so I need to populate the > check-in date field. Please help! > Without seeing the actual query this is just a suggestion. I would say use CASE: http://www.postgresql.org/docs/9.3/interactive/functions-conditional.html#FUNCTIONS-CASE Where if the date was not available from table A use the one from table B. > Best Regards, > > Steve > -- Adrian Klaver adrian.klaver@aklaver.com
Alright everyone, this is a doozy of a problem. I am new to Postgres so I appreciate patience/understanding. I have a database of hardware objects, each of which has several different “channels”. Once per day, these channels are supposed to check in with a central server, generating an event log table (TABLE A) like the following:
object_id channel check-in date
****************************************
990 1 2014-12-01
990 1 2014-12-02
990 2 2014-12-01
990 2 2014-12-02
286 2 2014-12-01
286 2 2014-12-02
286 5 2014-12-01
286 5 2014-12-02
4507 1 2014-12-01
4507 1 2014-12-02
4507 2 2014-12-01
4507 2 2014-12-02
And so on. Occasionally, communications will break down to the hardware, such that no reporting occurs. For example, let’s say that object 286 loses communications on 12/1/2014. Then the table might look like:
object_id channel check-in date
****************************************
990 1 2014-12-01
990 1 2014-12-02
990 2 2014-12-01
990 2 2014-12-02
286 2 2014-12-02
286 5 2014-12-02
4507 1 2014-12-01
4507 1 2014-12-02
4507 2 2014-12-01
4507 2 2014-12-02
Or let’s say that for some reason, just channel 2 loses reporting for a day. Then we would have:
object_id channel check-in date
****************************************
990 1 2014-12-01
990 1 2014-12-02
990 2 2014-12-01
990 2 2014-12-02
286 2 2014-12-02
286 5 2014-12-01
286 5 2014-12-02
4507 1 2014-12-01
4507 1 2014-12-02
4507 2 2014-12-01
4507 2 2014-12-02
I have a second table (TABLE B) with all of the object_ids and channels that are supposed to be reporting in each day. For cases where a certain channel does not check in, I want to add a column that indicates the comm failure. So, for the example where all channels on object 286 do not check in, I would like to get is something like this:
object_id channel check-in date comm failure
**********************************************************
990 1 2014-12-01 No
990 1 2014-12-02 No
990 2 2014-12-01 No
990 2 2014-12-02 No
286 2 2014-12-01 Yes
286 2 2014-12-02 No
286 5 2014-12-01 Yes
286 5 2014-12-02 No
4507 1 2014-12-01 No
4507 1 2014-12-02 No
4507 2 2014-12-01 No
4507 2 2014-12-02 No
I have been racking my mind for the better part of a day on how to do this. The thing is that I can do a right join of TABLE B on TABLE A, and this will populate the missing object ids and channels. However, this only works for a single day, and it gives me something like:
object_id channel check-in date comm failure
**********************************************************
990 1 2014-12-01 No
990 1 2014-12-02 No
990 2 2014-12-01 No
990 2 2014-12-02 No
286 2 Yes
286 2 2014-12-02 No
286 5 Yes
286 5 2014-12-02 No
4507 1 2014-12-01 No
4507 1 2014-12-02 No
4507 2 2014-12-01 No
4507 2 2014-12-02 No
I need to do a count of comm failures by day, so I need to populate the check-in date field. Please help!
Hi,
We have a customer complaining about the time taken by one of the application scripts while deleting older data from the log tables.
During the deletion, customer reported that he often sees the below error and because of which table size doesn’t reduce.
ERROR: canceling autovacuum task
Date: 2015-03-14 04:29:19
Context: automatic analyze of table "fm_db_Server3.mmsuper.audittraillogentry"
We have the following queries in this regard:
- How often is the autovacuum task invoked by postgres
- If the task gets cancelled (while we were deleting data from the database) would this task be re-invoked at a later time?
- If insertion of data into a table also impact this task?
- If we can manually schedule this task to a particular time (like off peak hours)?
Regards
Mitu
Hi,
We have a customer complaining about the time taken by one of the application scripts while deleting older data from the log tables.
During the deletion, customer reported that he often sees the below error and because of which table size doesn’t reduce.
ERROR: canceling autovacuum task
Date: 2015-03-14 04:29:19
Context: automatic analyze of table "fm_db_Server3.mmsuper.audittraillogentry"
We have the following queries in this regard:
- How often is the autovacuum task invoked by postgres
- If the task gets cancelled (while we were deleting data from the database) would this task be re-invoked at a later time?
- If insertion of data into a table also impact this task?
- If we can manually schedule this task to a particular time (like off peak hours)?
> On 26 Mar 2015, at 1:25, Lavrenz, Steven M <slavrenz@purdue.edu> wrote: > > Alright everyone, this is a doozy of a problem. I am new to Postgres so I appreciate patience/understanding. I have a databaseof hardware objects, each of which has several different “channels”. Once per day, these channels are supposed > > > > object_id channel check-in date > > **************************************** > > 990 1 2014-12-01 > > 990 1 2014-12-02 > > 990 2 2014-12-01 > > 990 2 2014-12-02 > > 286 2 2014-12-01 > > 286 2 2014-12-02 > > 286 5 2014-12-01 > > 286 5 2014-12-02 > > 4507 1 2014-12-01 > > 4507 1 2014-12-02 > > 4507 2 2014-12-01 > > 4507 2 2014-12-02 > > > > And so on. Occasionally, communications will break down to the hardware, such that no reporting occurs. object 286 losescommunications on 12/1/2014. Then the table might look like: > > > > object_id channel check-in date > > **************************************** > > 990 1 2014-12-01 > > 990 1 2014-12-02 > > 990 2 2014-12-01 > > 990 2 2014-12-02 > > > > 286 2 2014-12-02 > > > > 286 5 2014-12-02 > > 4507 1 2014-12-01 > > 4507 1 2014-12-02 > > 4507 2 2014-12-01 > > 4507 2 2014-12-02 > > > > Or let’s say that for some reason, just channel 2 loses reporting for a day. Then we would have: > > > > object_id channel check-in date > > **************************************** > > 990 1 2014-12-01 > > 990 1 2014-12-02 > > 990 2 2014-12-01 > > 990 2 2014-12-02 > > > > 286 2 2014-12-02 > > 286 5 2014-12-01 > > 286 5 2014-12-02 > > 4507 1 2014-12-01 > > 4507 1 2014-12-02 > > 4507 2 2014-12-01 > > 4507 2 2014-12-02 > > > > I have a second table (TABLE B) with all of the object_ids and channels that are supposed to be reporting in each day.For cases where a certain channel does not check in, I want to add a column that indicates the comm failure. So, for > > > > object_id channel check-in date comm failure > > ********************************************************** > > 990 1 2014-12-01 No > > 990 1 2014-12-02 No > > 990 2 2014-12-01 No > > 990 2 2014-12-02 No > > 286 2 2014-12-01 Yes > > 286 2 2014-12-02 No > > 286 5 2014-12-01 Yes > > 286 5 2014-12-02 No > > 4507 1 2014-12-01 No > > 4507 1 2014-12-02 No > > 4507 2 2014-12-01 No > > 4507 2 2014-12-02 No > > > > > > I have been racking my mind for the better part of a day on how to do this. The thing is that I can do a right join ofTABLE B on TABLE A, and this will populate the missing object ids and channels. However, this only works for a single > > > > object_id channel check-in date comm failure > > ********************************************************** > > 990 1 2014-12-01 No > > 990 1 2014-12-02 No > > 990 2 2014-12-01 No > > 990 2 2014-12-02 No > > 286 2 Yes > > 286 2 2014-12-02 No > > 286 5 Yes > > 286 5 2014-12-02 No > > 4507 1 2014-12-01 No > > 4507 1 2014-12-02 No > > 4507 2 2014-12-01 No > > 4507 2 2014-12-02 No > > > > I need to do a count of comm failures by day, so I need to populate the check-in date field. Please help! Easiest would be to insert the missing values in your table, something like: WITH RECURSIVE calendar (missing_date) AS ( SELECT MAX(check_in_date) FROM table_a WHERE comm_failure = 'YES' UNION ALL SELECT missing_date + interval '1 day' FROM calendar WHERE missing_date < CURRENT_DATE ) INSERT INTO table_a (object_id, channel, check_in_date, comm_failure) SELECT b.object_id, b.channel. c.missing_date, 'YES' FROM table_b b, calendar c WHERE NOT EXISTS ( SELECT 1 FROM table_a a WHERE a.object_id = b.object_id AND a.channel = b.channel AND a.check_in_date = c.missing_date ); That's off the top of my head, untested, etc, but I think I got that mostly right. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
On Thu, 26 Mar 2015 00:25:09 +0000 "Lavrenz, Steven M" <slavrenz@purdue.edu> wrote: > I have a second table (TABLE B) with all of the object_ids and channels that are supposed to be reporting in each day.For cases where a certain channel does not check in, I want to add a column that indicates the comm failure. Not sure if your context allows it, but if you can change your program's logic, it might be easier to : -add a boolean field (e.g. 'checked') to table B, set to false -whenever an object checks in do 'update table B set checked = true where object_id = X and channel = Y' -run a cron job once a day that -runs 'select * from B where checked = false' and stores results somewhere -resets B with 'update B set checked = false' -- Salutations, Vincent Veyron https://libremen.com/ Legal case, contract and insurance claim management software
On Thu, 26 Mar 2015 00:25:09 +0000
"Lavrenz, Steven M" <slavrenz@purdue.edu> wrote:
> I have a second table (TABLE B) with all of the object_ids and channels that are supposed to be reporting in each day. For cases where a certain channel does not check in, I want to add a column that indicates the comm failure.
Not sure if your context allows it, but if you can change your program's logic, it might be easier to :
-add a boolean field (e.g. 'checked') to table B, set to false
-whenever an object checks in do 'update table B set checked = true where object_id = X and channel = Y'
-run a cron job once a day that
-runs 'select * from B where checked = false' and stores results somewhere
-resets B with 'update B set checked = false'
On Fri, 27 Mar 2015 08:33:36 -0700 "David G. Johnston" <david.g.johnston@gmail.com> wrote: > This is a performance optimization that I would avoid at nearly any cost, > and there are likely better ways to limit the processing scope without > having to trust the a cron job runs daily mmmhh... it's not so much performance optimization as query simplification, but I agree it adds a maintenance problem. I never heard of problems with cron not running though, is this really a concern? -- Salutations, Vincent Veyron https://legalcase.libremen.com/ Legal case, contract and insurance claim management software