Thread: Approach to extract top records from table based upon aggregate

Approach to extract top records from table based upon aggregate

From

droberts

Date:

02 November 2015, 22:14:19

Hi, I have a table that contains call records.  I'm looking to get only
records for users who made the most calls over a particular time duration in
an efficient way.

calls()

time,  duration,   caller_number, dialed_number



-- query to get top 10 callers
  select caller_number, count(1) from calls group by caller_number order by
calls desc limit 10

--my current query to get those callers

select * from call where caller_number in (above query)


It works but I was hoping for something a little more efficient if anyone
has an idea.

Tahnks



--
View this message in context:
http://postgresql.nabble.com/Approach-to-extract-top-records-from-table-based-upon-aggregate-tp5872427.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

Re: Approach to extract top records from table based upon aggregate

From

"David G. Johnston"

Date:

02 November 2015, 22:22:35

On Mon, Nov 2, 2015 at 3:14 PM, droberts <david.roberts@riverbed.com> wrote:

Hi, I have a table that contains call records. I'm looking to get only
records for users who made the most calls over a particular time duration in
an efficient way.

calls()

time, duration, caller_number, dialed_number

-- query to get top 10 callers
select caller_number, count(1) from calls group by caller_number order by
calls desc limit 10

--my current query to get those callers

select * from call where caller_number in (above query)

It works but I was hoping for something a little more efficient if anyone
has an idea.

I don't think there is anything that is "a little more efficient" (implying, only a bit harder to implement).

You can probably get significantly faster by combining various forms of pre-computation and caching. It is likewise significantly more complex to implement.

David J.

Re: Approach to extract top records from table based upon aggregate

From

Emanuel Calvo

Date:

03 November 2015, 01:00:25

2015-11-02 19:14 GMT-03:00 droberts <david.roberts@riverbed.com>:

Hi, I have a table that contains call records. I'm looking to get only
records for users who made the most calls over a particular time duration in
an efficient way.

calls()

time, duration, caller_number, dialed_number

-- query to get top 10 callers
select caller_number, count(1) from calls group by caller_number order by
calls desc limit 10

--my current query to get those callers

select * from call where caller_number in (above query)

It works but I was hoping for something a little more efficient if anyone
has an idea.

I think that almost every time based tables, should be partitioned. Also, depending on your workload you can create lazy views over the last entries in calls table during a particular time frame.

Probably in this particular case, you will want to dig into more underneath design in order to get the best performance.

Doing a lazy view with that query, you can use the top n of it and get less callers if you need to (or more if you want to expand the feature).

Hope it helps,

--

Emanuel Calvo

Twitter: 3manuek
LinkedIn: http://ar.linkedin.com/in/ecbcbcb/

Re: Approach to extract top records from table based upon aggregate

From

Merlin Moncure

Date:

03 November 2015, 16:33:05

On Mon, Nov 2, 2015 at 4:14 PM, droberts <david.roberts@riverbed.com> wrote:
> Hi, I have a table that contains call records.  I'm looking to get only
> records for users who made the most calls over a particular time duration in
> an efficient way.
>
> calls()
>
> time,  duration,   caller_number, dialed_number
>
>
>
> -- query to get top 10 callers
>   select caller_number, count(1) from calls group by caller_number order by
> calls desc limit 10
>
> --my current query to get those callers
>
> select * from call where caller_number in (above query)
>
>
> It works but I was hoping for something a little more efficient if anyone
> has an idea.

How fast is it running, and how fast do you expect it to run?  To make
that faster than that, you're going to have to rethink things a little
bit.  For example, you could narrow the search down to a time range,
or maybe you could keep a running internalization of the count.

This query looks suspicious:
select caller_number, count(1) from calls group by caller_number order
by calls desc limit 10

you're ordering by the entire table, which is almost certainly a
mistake.  It probably needs to look like:

select *
from
(
  select
    caller_number,
    count(1) as count_calls
  from calls
  group by caller_number
) q order by count_calls desc limit 10;

merlin