Thread: Beginner Question...

Beginner Question...

From

James David Smith

Date:

09 July 2011, 13:42:03

Hi everyone,

Could someone help me with this question please? I have a table with
four columns:

- gps_id (primary key)
- date_time
- crimes_link (foreign key)
- osgb36_geom

The data in the table involves lots of gps tracks of different
journeys. They are grouped into journeys by the 'crimes_link' field.
So for example the first 50 rows have an identical 'crimes_link'
field, then the next 50 rows a different value in 'crimes_link', then
the next 50 rows a different one, etc. What I would like to do is to
select the beginning location of each journey. This query gives me the
date_time of the beginning of the journey:

SELECT crimes_link, MIN(date_time)
FROM camdengps3
GROUP BY crimes_link;

However I need to add the osgb36_geom column into the query and am unable too.

Any ideas how to do this please?

Thank you

James Smith

Re: Beginner Question...

From

Michael Wood

Date:

09 July 2011, 14:50:40

On 9 July 2011 18:41, James David Smith <james.david.smith@gmail.com> wrote:
> Hi everyone,
>
> Could someone help me with this question please? I have a table with
> four columns:
>
> - gps_id (primary key)
> - date_time
> - crimes_link (foreign key)
> - osgb36_geom
>
> The data in the table involves lots of gps tracks of different
> journeys. They are grouped into journeys by the 'crimes_link' field.
> So for example the first 50 rows have an identical 'crimes_link'
> field, then the next 50 rows a different value in 'crimes_link', then
> the next 50 rows a different one, etc. What I would like to do is to
> select the beginning location of each journey. This query gives me the
> date_time of the beginning of the journey:
>
> SELECT crimes_link, MIN(date_time)
> FROM camdengps3
> GROUP BY crimes_link;
>
> However I need to add the osgb36_geom column into the query and am unable too.
>
> Any ideas how to do this please?

Sorry, I don't have time to test this now and maybe someone else has a
better way, but maybe it will give you and idea:

SELECT l.date_time, l.crimes_link, l.osgb36_geom
FROM camdengps3 AS l INNER JOIN (
    SELECT crimes_link, MIN(date_time) AS start
    FROM camdengps3
    GROUP BY crimes_link) AS r
        ON l.date_time = s.start AND l.crimes_link = s.crimes_link;

--
Michael Wood <esiotrot@gmail.com>

Re: Beginner Question...

From

Tom Lane

Date:

09 July 2011, 14:53:56

James David Smith <james.david.smith@gmail.com> writes:
> ... What I would like to do is to
> select the beginning location of each journey. This query gives me the
> date_time of the beginning of the journey:

> SELECT crimes_link, MIN(date_time)
> FROM camdengps3
> GROUP BY crimes_link;

> However I need to add the osgb36_geom column into the query and am unable too.

You could do it with SELECT DISTINCT ON; see the "weather reports"
example in the SELECT reference page in the PG manual.

The more SQL-standard way is to use a subselect, viz

SELECT whatever
FROM camdengps3 upper
WHERE date_time = (SELECT MIN(date_time) FROM camdengps3 lower
                   WHERE lower.crimes_link = upper.crimes_link);

However this is generally a lot slower, and it also outputs
multiple rows if there are multiple rows meeting the MIN date_time
in any particular group, which might not be what you want.

            regards, tom lane

Re: Beginner Question...

From

James David Smith

Date:

09 July 2011, 19:31:45

Dear Michael, Tom ( et al),

Thanks for your help. Once I had edited Michael's code slightlyit
worked perfectly! Thank you! I had just this minute managed to achieve
a similar result, but by using two temp tables and about 10 more lines
of code..! Haha.

Unfortunately I am presented with 3425 rows of starting locations,
when I know that there are only 2233 journeys in the table. It seems
that some of the journeys have more than one starting point according
to the GPS. Bad data. I think what I now need to do are to find the
journeys that have more than one starting location, and to take some
sort of average of their GEOM column. This code identifies how many
starting points there are for each journey:

________________________
DROP TABLE IF EXISTS temp;

SELECT l.camdencrimes_link, l.osgb36_geom, date_time INTO TEMP TABLE temp
FROM camdengps3 AS l INNER JOIN (
   SELECT camdencrimes_link, MIN(date_time) AS start
   FROM camdengps3
   GROUP BY camdencrimes_link) AS r
       ON l.date_time = r.start AND l.camdencrimes_link = r.camdencrimes_link;

SELECT a.*, b
FROM temp AS a
INNER JOIN
( SELECT  camdencrimes_link, COUNT(camdencrimes_link) AS b
FROM temp
GROUP BY camdencrimes_link) AS c
ON a.camdencrimes_link = c.camdencrimes_link
WHERE b >1;
________________________

But I now need to take an average of the starting points for each
journey where there are more than one starting point? Any ideas guys?

Thank you so much for your help...

James






On 9 July 2011 18:50, Michael Wood <esiotrot@gmail.com> wrote:
> On 9 July 2011 18:41, James David Smith <james.david.smith@gmail.com> wrote:
>> Hi everyone,
>>
>> Could someone help me with this question please? I have a table with
>> four columns:
>>
>> - gps_id (primary key)
>> - date_time
>> - crimes_link (foreign key)
>> - osgb36_geom
>>
>> The data in the table involves lots of gps tracks of different
>> journeys. They are grouped into journeys by the 'crimes_link' field.
>> So for example the first 50 rows have an identical 'crimes_link'
>> field, then the next 50 rows a different value in 'crimes_link', then
>> the next 50 rows a different one, etc. What I would like to do is to
>> select the beginning location of each journey. This query gives me the
>> date_time of the beginning of the journey:
>>
>> SELECT crimes_link, MIN(date_time)
>> FROM camdengps3
>> GROUP BY crimes_link;
>>
>> However I need to add the osgb36_geom column into the query and am unable too.
>>
>> Any ideas how to do this please?
>
> Sorry, I don't have time to test this now and maybe someone else has a
> better way, but maybe it will give you and idea:
>
> SELECT l.date_time, l.crimes_link, l.osgb36_geom
> FROM camdengps3 AS l INNER JOIN (
>    SELECT crimes_link, MIN(date_time) AS start
>    FROM camdengps3
>    GROUP BY crimes_link) AS r
>        ON l.date_time = s.start AND l.crimes_link = s.crimes_link;
>
> --
> Michael Wood <esiotrot@gmail.com>
>

Re: Beginner Question...

From

Gavin Flower

Date:

10 July 2011, 05:12:16

On 10/07/11 04:41, James David Smith wrote:

- gps_id (primary key)
- date_time
- crimes_link (foreign key)
- osgb36_geom

The data in the table involves lots of gps tracks of different
journeys. They are grouped into journeys by the 'crimes_link' field.
So for example the first 50 rows have an identical 'crimes_link'
field, then the next 50 rows a different value in 'crimes_link', then
the next 50 rows a different one, etc. What I would like to do is to
select the beginning location of each journey. This query gives me the
date_time of the beginning of the journey:

SELECT crimes_link, MIN(date_time)
FROM camdengps3
GROUP BY crimes_link;

However I need to add the osgb36_geom column into the query and am unable too.

Any ideas how to do this please?

Thank you

James Smith

Hi ,

I would recommend adding an index, as follows:

CREATE INDEX ON location(crimes_link, date_time);

I tested the followiunmg query, I and observed that the above index improves performance. I used pg 9.1beta2, but this should also work for any pg veresion that imoplements the 'WITH' clause on 'SELECT'.

WITH
    start (crimes_link, date_time) AS
    (
        SELECT crimes_link, min(date_time)
        FROM location
        GROUP BY crimes_link
    )
SELECT
    l.gps_id,
    l.date_time,
    l.crimes_link,
    l.osgb36_geom
FROM
    location l,
    start    s
WHERE
    l.crimes_link = s.crimes_link AND
    l.date_time = s.date_time
ORDER BY
    l.gps_id;


Cheers,
Gavin