Thread: Beginner Question...
Hi everyone, Could someone help me with this question please? I have a table with four columns: - gps_id (primary key) - date_time - crimes_link (foreign key) - osgb36_geom The data in the table involves lots of gps tracks of different journeys. They are grouped into journeys by the 'crimes_link' field. So for example the first 50 rows have an identical 'crimes_link' field, then the next 50 rows a different value in 'crimes_link', then the next 50 rows a different one, etc. What I would like to do is to select the beginning location of each journey. This query gives me the date_time of the beginning of the journey: SELECT crimes_link, MIN(date_time) FROM camdengps3 GROUP BY crimes_link; However I need to add the osgb36_geom column into the query and am unable too. Any ideas how to do this please? Thank you James Smith
On 9 July 2011 18:41, James David Smith <james.david.smith@gmail.com> wrote: > Hi everyone, > > Could someone help me with this question please? I have a table with > four columns: > > - gps_id (primary key) > - date_time > - crimes_link (foreign key) > - osgb36_geom > > The data in the table involves lots of gps tracks of different > journeys. They are grouped into journeys by the 'crimes_link' field. > So for example the first 50 rows have an identical 'crimes_link' > field, then the next 50 rows a different value in 'crimes_link', then > the next 50 rows a different one, etc. What I would like to do is to > select the beginning location of each journey. This query gives me the > date_time of the beginning of the journey: > > SELECT crimes_link, MIN(date_time) > FROM camdengps3 > GROUP BY crimes_link; > > However I need to add the osgb36_geom column into the query and am unable too. > > Any ideas how to do this please? Sorry, I don't have time to test this now and maybe someone else has a better way, but maybe it will give you and idea: SELECT l.date_time, l.crimes_link, l.osgb36_geom FROM camdengps3 AS l INNER JOIN ( SELECT crimes_link, MIN(date_time) AS start FROM camdengps3 GROUP BY crimes_link) AS r ON l.date_time = s.start AND l.crimes_link = s.crimes_link; -- Michael Wood <esiotrot@gmail.com>
James David Smith <james.david.smith@gmail.com> writes: > ... What I would like to do is to > select the beginning location of each journey. This query gives me the > date_time of the beginning of the journey: > SELECT crimes_link, MIN(date_time) > FROM camdengps3 > GROUP BY crimes_link; > However I need to add the osgb36_geom column into the query and am unable too. You could do it with SELECT DISTINCT ON; see the "weather reports" example in the SELECT reference page in the PG manual. The more SQL-standard way is to use a subselect, viz SELECT whatever FROM camdengps3 upper WHERE date_time = (SELECT MIN(date_time) FROM camdengps3 lower WHERE lower.crimes_link = upper.crimes_link); However this is generally a lot slower, and it also outputs multiple rows if there are multiple rows meeting the MIN date_time in any particular group, which might not be what you want. regards, tom lane
Dear Michael, Tom ( et al), Thanks for your help. Once I had edited Michael's code slightlyit worked perfectly! Thank you! I had just this minute managed to achieve a similar result, but by using two temp tables and about 10 more lines of code..! Haha. Unfortunately I am presented with 3425 rows of starting locations, when I know that there are only 2233 journeys in the table. It seems that some of the journeys have more than one starting point according to the GPS. Bad data. I think what I now need to do are to find the journeys that have more than one starting location, and to take some sort of average of their GEOM column. This code identifies how many starting points there are for each journey: ________________________ DROP TABLE IF EXISTS temp; SELECT l.camdencrimes_link, l.osgb36_geom, date_time INTO TEMP TABLE temp FROM camdengps3 AS l INNER JOIN ( SELECT camdencrimes_link, MIN(date_time) AS start FROM camdengps3 GROUP BY camdencrimes_link) AS r ON l.date_time = r.start AND l.camdencrimes_link = r.camdencrimes_link; SELECT a.*, b FROM temp AS a INNER JOIN ( SELECT camdencrimes_link, COUNT(camdencrimes_link) AS b FROM temp GROUP BY camdencrimes_link) AS c ON a.camdencrimes_link = c.camdencrimes_link WHERE b >1; ________________________ But I now need to take an average of the starting points for each journey where there are more than one starting point? Any ideas guys? Thank you so much for your help... James On 9 July 2011 18:50, Michael Wood <esiotrot@gmail.com> wrote: > On 9 July 2011 18:41, James David Smith <james.david.smith@gmail.com> wrote: >> Hi everyone, >> >> Could someone help me with this question please? I have a table with >> four columns: >> >> - gps_id (primary key) >> - date_time >> - crimes_link (foreign key) >> - osgb36_geom >> >> The data in the table involves lots of gps tracks of different >> journeys. They are grouped into journeys by the 'crimes_link' field. >> So for example the first 50 rows have an identical 'crimes_link' >> field, then the next 50 rows a different value in 'crimes_link', then >> the next 50 rows a different one, etc. What I would like to do is to >> select the beginning location of each journey. This query gives me the >> date_time of the beginning of the journey: >> >> SELECT crimes_link, MIN(date_time) >> FROM camdengps3 >> GROUP BY crimes_link; >> >> However I need to add the osgb36_geom column into the query and am unable too. >> >> Any ideas how to do this please? > > Sorry, I don't have time to test this now and maybe someone else has a > better way, but maybe it will give you and idea: > > SELECT l.date_time, l.crimes_link, l.osgb36_geom > FROM camdengps3 AS l INNER JOIN ( > SELECT crimes_link, MIN(date_time) AS start > FROM camdengps3 > GROUP BY crimes_link) AS r > ON l.date_time = s.start AND l.crimes_link = s.crimes_link; > > -- > Michael Wood <esiotrot@gmail.com> >
On 10/07/11 04:41, James David Smith wrote:
I would recommend adding an index, as follows:
CREATE INDEX ON location(crimes_link, date_time);
I tested the followiunmg query, I and observed that the above index improves performance. I used pg 9.1beta2, but this should also work for any pg veresion that imoplements the 'WITH' clause on 'SELECT'.
WITH
start (crimes_link, date_time) AS
(
SELECT crimes_link, min(date_time)
FROM location
GROUP BY crimes_link
)
SELECT
l.gps_id,
l.date_time,
l.crimes_link,
l.osgb36_geom
FROM
location l,
start s
WHERE
l.crimes_link = s.crimes_link AND
l.date_time = s.date_time
ORDER BY
l.gps_id;
Cheers,
Gavin
Hi ,- gps_id (primary key) - date_time - crimes_link (foreign key) - osgb36_geom The data in the table involves lots of gps tracks of different journeys. They are grouped into journeys by the 'crimes_link' field. So for example the first 50 rows have an identical 'crimes_link' field, then the next 50 rows a different value in 'crimes_link', then the next 50 rows a different one, etc. What I would like to do is to select the beginning location of each journey. This query gives me the date_time of the beginning of the journey: SELECT crimes_link, MIN(date_time) FROM camdengps3 GROUP BY crimes_link; However I need to add the osgb36_geom column into the query and am unable too. Any ideas how to do this please? Thank you James Smith
I would recommend adding an index, as follows:
CREATE INDEX ON location(crimes_link, date_time);
I tested the followiunmg query, I and observed that the above index improves performance. I used pg 9.1beta2, but this should also work for any pg veresion that imoplements the 'WITH' clause on 'SELECT'.
WITH
start (crimes_link, date_time) AS
(
SELECT crimes_link, min(date_time)
FROM location
GROUP BY crimes_link
)
SELECT
l.gps_id,
l.date_time,
l.crimes_link,
l.osgb36_geom
FROM
location l,
start s
WHERE
l.crimes_link = s.crimes_link AND
l.date_time = s.date_time
ORDER BY
l.gps_id;
Cheers,
Gavin