Re: efficient data reduction (and deduping) - Mailing list pgsql-performance

From Alessandro Gagliardi
Subject Re: efficient data reduction (and deduping)
Date
Msg-id CAAB3BBKS2ncPZROej1OtE_k-DAEsRpJ2JP8w7zhoVJ8+i3EfDA@mail.gmail.com
Whole thread Raw
In response to Re: efficient data reduction (and deduping)  (Peter van Hardenberg <pvh@pvh.ca>)
List pgsql-performance
I was thinking of adding an index, but thought it would be pointless since I would only be using the index once before dropping the table (after its loaded into hourly_activity). I assumed it would take longer to create the index and then use it than to just seq scan once or twice. Am I wrong in that assumption?

On Thu, Mar 1, 2012 at 10:40 AM, Peter van Hardenberg <pvh@pvh.ca> wrote:
On Thu, Mar 1, 2012 at 10:27 AM, Alessandro Gagliardi
<alessandro@path.com> wrote:
> Now, I want to reduce that data to get the last activity that was performed
> by each user in any given hour. It should fit into a table like this:
>

How about:

1) Create an expression based index on date_trunc('hour', hr_timestamp)
2) Create a view on that showing the last value
3) If you want to throw away the data use CREATE TABLE AS on the
results of the view.

You may also want to investigate window functions.

-p

--
Peter van Hardenberg
San Francisco, California
"Everything was beautiful, and nothing hurt." -- Kurt Vonnegut

pgsql-performance by date:

Previous
From: Kääriäinen Anssi
Date:
Subject: Re: Large insert and delete batches
Next
From: Alessandro Gagliardi
Date:
Subject: Re: efficient data reduction (and deduping)