Thread: Merging lines with NULLs (with example data)
Good afternoon, I have a simple problem, and I feel stupid not finding myself what's the solution... I try to explain shortly, but as I'm not really confident with my explanation, I provided a complete example with data below. How can I "merge" this gday,count_udp,count_tcp '2005-10-20','','2' '2005-10-20','3','' '2005-10-21','','1' '2005-10-21','5','' into that: gday,count_udp,count_tcp '2005-10-20','3','2' '2005-10-21','5','1' in a single query??? Thanks in advance, MaXX Here's all the details: I have a table CREATE TABLE test ( id serial NOT NULL, tstamp timestamptz, host varchar(80), rulenr int4, act varchar(10), proto varchar(4), src_ip inet,src_port int4, dst_ip inet, dst_port int4, dir varchar(3), if varchar(5), reported bool, protected bool, CONSTRAINTpk_ipfw_id PRIMARY KEY (id) ) WITH OIDS; Data: INSERT INTO test VALUES (453639,'2005-10-21 09:39:19+02','akar',600,'Deny','UDP','10.182.144.167',52616,'10.0.0.1',2290,'in','rl0',TRUE,NULL); INSERT INTO test VALUES (453634,'2005-10-21 09:36:21+02','akar',600,'Deny','UDP','10.182.144.167',52616,'10.0.0.1',2290,'in','rl0',TRUE,NULL); INSERT INTO test VALUES (453633,'2005-10-21 09:36:20+02','akar',600,'Deny','UDP','10.182.144.167',52616,'10.0.0.1',2290,'in','rl0',TRUE,NULL); INSERT INTO test VALUES (452610,'2005-10-21 03:33:28+02','akar',600,'Deny','TCP','10.182.174.7',4310,'10.0.0.1',2290,'in','rl0',TRUE,NULL); INSERT INTO test VALUES (451735,'2005-10-21 00:11:52+02','akar',600,'Deny','UDP','10.216.48.231',6778,'10.0.0.1',2290,'in','rl0',TRUE,NULL); INSERT INTO test VALUES (451734,'2005-10-21 00:11:50+02','akar',600,'Deny','UDP','10.216.48.231',6778,'10.0.0.1',2290,'in','rl0',TRUE,NULL); INSERT INTO test VALUES (448277,'2005-10-20 16:31:17+02','akar',600,'Deny','UDP','10.244.165.152',31262,'10.0.0.1',2290,'in','rl0',TRUE,NULL); INSERT INTO test VALUES (448276,'2005-10-20 16:31:15+02','akar',600,'Deny','UDP','10.244.165.152',31262,'10.0.0.1',2290,'in','rl0',TRUE,NULL); INSERT INTO test VALUES (448266,'2005-10-20 16:29:08+02','akar',600,'Deny','TCP','10.224.32.85',63891,'10.0.0.1',2290,'in','rl0',TRUE,NULL); INSERT INTO test VALUES (448265,'2005-10-20 16:29:05+02','akar',600,'Deny','TCP','10.224.32.85',63891,'10.0.0.1',2290,'in','rl0',TRUE,NULL); INSERT INTO test VALUES (448258,'2005-10-20 16:28:16+02','akar',600,'Deny','UDP','10.244.165.152',31262,'10.0.0.1',2290,'in','rl0',TRUE,NULL); when I execute a query like this, select to_date(tstamp,'YYYY-MM-DD')as gday, case when proto='UDP' then count(id) elseNULL end as count_udp, case when proto='TCP' then count(id) else NULL end as count_tcp from test where tstamp >= (now() - interval '$days days' ) and dst_port = $port group by gday, proto order by gday; I get: gday,count_udp,count_tcp '2005-10-20','','2' '2005-10-20','3','' '2005-10-21','','1' '2005-10-21','5','' This is not what I want, I want that: gday,count_udp,count_tcp '2005-10-20','3','2' '2005-10-21','5','1' -- MaXX
In article <djdp5l$1l4f$1@talisker.lacave.net>, MaXX <bs139412@skynet.be> writes: > How can I "merge" this > gday,count_udp,count_tcp > '2005-10-20','','2' > '2005-10-20','3','' > '2005-10-21','','1' > '2005-10-21','5','' > into that: > gday,count_udp,count_tcp > '2005-10-20','3','2' > '2005-10-21','5','1' > in a single query??? Try something like that: SELECT to_date (tstamp,'YYYY-MM-DD') AS gday, sum(CASE WHEN proto = 'UDP' THEN 1 ELSE 0 END) AS count_udp, sum(CASE WHEN proto = 'TCP' THEN 1 ELSE 0 END) AS count_tcp FROM test WHERE tstamp >= now() - INTERVAL '$days DAYS' ANDdst_port = $port GROUP BY gday ORDER BY gday
Harald Fuchs wrote: > In article <djdp5l$1l4f$1@talisker.lacave.net>, > MaXX <bs139412@skynet.be> writes: > > >>How can I "merge" this >>gday,count_udp,count_tcp >>'2005-10-20','','2' >>'2005-10-20','3','' >>'2005-10-21','','1' >>'2005-10-21','5','' > > >>into that: >>gday,count_udp,count_tcp >>'2005-10-20','3','2' >>'2005-10-21','5','1' > > >>in a single query??? > > > Try something like that: > > SELECT to_date (tstamp,'YYYY-MM-DD') AS gday, > sum(CASE WHEN proto = 'UDP' THEN 1 ELSE 0 END) AS count_udp, > sum(CASE WHEN proto = 'TCP' THEN 1 ELSE 0 END) AS count_tcp > FROM test > WHERE tstamp >= now() - INTERVAL '$days DAYS' > AND dst_port = $port > GROUP BY gday > ORDER BY gday > Or, via a subquery: select distinct to_date(tstamp,'YYYY-MM-DD') as gday, ( select count(id) from test t1 where proto='UDP' and to_date(t1.tstamp,'YYYY-MM-DD') = to_date(test.tstamp,'YYYY-MM-DD') ) as count_udp, ( select count(id) from test t1 where proto='TCP' and to_date(t1.tstamp,'YYYY-MM-DD') = to_date(test.tstamp,'YYYY-MM-DD') ) as count_tcp from test where tstamp >= (now() - interval '6 days' ) and dst_port = 2290 order by gday; Harald's solution is better for your particular case and will almost certainly be faster, but subqueries are good to know how to do. :) > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Have you searched our list archives? > > http://archives.postgresql.org -- Daryl Richter Platform Author & Director of Technology (( Brandywine Asset Management ) ( "Expanding the Science of Global Investing" ) ( http://www.brandywine.com ))
Thank you, and sorry for the late answer, I was far away from a decent internet connection... I'll try both your solutions, EXPLAIN ANALYSE will elect the winner... In any case that will be cleaner than my dirty hack (2 distinct queries) which generate a lot of garbage... Thanks again, MaXX Daryl Richter wrote: > Harald Fuchs wrote: >> Try something like that: >> SELECT to_date (tstamp,'YYYY-MM-DD') AS gday, >> sum(CASE WHEN proto = 'UDP' THEN 1 ELSE 0 END) AS count_udp, >> sum(CASE WHEN proto = 'TCP' THEN 1 ELSE 0 END) AS count_tcp >> FROM test >> WHERE tstamp >= now() - INTERVAL '$days DAYS' >> AND dst_port = $port >> GROUP BY gday >> ORDER BY gday > Or, via a subquery: > select distinct to_date(tstamp,'YYYY-MM-DD') as gday, > ( select count(id) from test t1 where proto='UDP' and > to_date(t1.tstamp,'YYYY-MM-DD') = to_date(test.tstamp,'YYYY-MM-DD') ) as > count_udp, > ( select count(id) from test t1 where proto='TCP' and > to_date(t1.tstamp,'YYYY-MM-DD') = to_date(test.tstamp,'YYYY-MM-DD') ) as > count_tcp > from test > where tstamp >= (now() - interval '6 days' ) > and dst_port = 2290 > order by gday; > > Harald's solution is better for your particular case and will almost > certainly be faster, but subqueries are good to know how to do. :) -- MaXX