Re: grouping subsets - Mailing list pgsql-sql
From | Oliveiros d'Azevedo Cristina |
---|---|
Subject | Re: grouping subsets |
Date | |
Msg-id | CBEBC41CEAD8438789E3B491787C49A1@marktestcr.marktest.pt Whole thread Raw |
In response to | grouping subsets (Rainer Stengele <rainer.stengele@diplan.de>) |
Responses |
Re: grouping subsets
|
List | pgsql-sql |
Fine. Please advice me, How long can your table be? Thousands? Millions of records? Do you really need it in pure SQL ? It seems to me that it might be possible, I'm just affraid that the query would become too complex and thus slow... Best, Oliveiros ----- Original Message ----- From: "Rainer Stengele" <rainer.stengele@diplan.de> To: "Oliveiros d'Azevedo Cristina" <oliveiros.cristina@marktest.pt> Cc: <pgsql-sql@postgresql.org> Sent: Thursday, July 29, 2010 1:10 PM Subject: Re: grouping subsets > No. This is by accident. > We have to assume that the combinations do change anytime, and many times > per day. > > So > > "Or is it possible to have the same combination on one day with several > sets?" > > YES! > > Rainer > > > > Am 29.07.2010 13:47, schrieb Oliveiros d'Azevedo Cristina: >> Yes. This is somewhat more complicated because it has more constraints. >> I've noticed that a given combination doesn't appear with holes on a >> certain day. >> >> For ex, on a daily basis, we have every three key combinations together. >> >> We dont have things like >> 2010-7-01 1726 3212 1428 >> 2010-7-01 1726 3212 1428 >> ... 318 1846 1012 >> 2010-7-01 1726 3212 1428 >> >> Can I assume that, for a certain day , the records for the same three >> combination are all together? There is just one set per day for a given >> combination? >> >> Or is it possible to have the same combination on one day with several >> sets? >> >> Best, >> Oliveiros >> >> >> ----- Original Message ----- From: "Rainer Stengele" >> <rainer.stengele@diplan.de> >> Newsgroups: gmane.comp.db.postgresql.sql >> To: "Oliveiros d'Azevedo Cristina" <oliveiros.cristina@marktest.pt> >> Cc: <> >> Sent: Thursday, July 29, 2010 10:41 AM >> Subject: Re: grouping subsets >> >> >>> Howdy Cristina, >>> >>> unfortunately things are more complicated. I have inserted an excerpt of >>> the real data here: >>> >>> ================================================================================ >>> TableID MasterID dtBegin dtEnd idR >>> idL idB consumption >>> 4057312 295530 2010-07-01 00:59:21.077 2010-07-01 >>> 01:32:59.670 1726 3212 1428 279 >>> 4061043 295574 2010-07-01 01:59:31.137 2010-07-01 >>> 02:32:09.373 1726 3212 1428 183 >>> 4083397 295838 2010-07-01 07:57:51.327 2010-07-01 >>> 08:28:28.117 318 1846 1012 30 >>> 4090858 295920 2010-07-01 09:52:33.777 2010-07-01 >>> 10:31:34.393 318 1846 1012 487 >>> 4094589 295961 2010-07-01 10:47:59.370 2010-07-01 >>> 11:32:20.903 318 1846 1012 472 >>> 4098330 296013 2010-07-01 11:58:53.890 2010-07-01 >>> 12:31:35.730 318 1846 1012 195 >>> 4102069 296058 2010-07-01 12:36:19.170 2010-07-01 >>> 13:32:13.950 318 1846 1012 338 >>> 4105809 296102 2010-07-01 13:58:53.170 2010-07-01 >>> 14:02:57.710 318 1846 1012 105 >>> 4109555 296150 2010-07-01 14:59:11.663 2010-07-01 >>> 15:32:33.810 318 1846 1012 187 >>> 4113305 296194 2010-07-01 15:59:01.797 2010-07-01 >>> 16:02:27.260 318 1846 1012 108 >>> 4117048 296238 2010-07-01 16:20:47.997 2010-07-01 >>> 17:32:49.367 318 1846 1012 179 >>> 4120791 296282 2010-07-01 17:58:27.657 2010-07-01 >>> 18:29:01.733 318 1846 1012 256 >>> 4128291 296370 2010-07-01 19:54:17.687 2010-07-01 >>> 20:32:53.850 318 1846 1012 239 >>> 4132044 296413 2010-07-01 20:31:37.653 2010-07-01 >>> 21:29:13.497 318 1846 1012 39 >>> 4135797 296458 2010-07-01 21:59:13.983 2010-07-01 >>> 22:32:46.503 318 1846 1012 157 >>> 4139572 296506 2010-07-01 22:58:49.530 2010-07-01 >>> 23:32:22.543 318 1846 1012 218 >>> 4142941 296554 2010-07-01 23:59:13.857 2010-07-02 >>> 00:32:30.390 318 1846 1012 248 >>> 4146289 296598 2010-07-02 00:58:55.763 2010-07-02 >>> 01:32:41.983 318 1846 1012 204 >>> 4149616 296642 2010-07-02 01:46:57.357 2010-07-02 >>> 02:32:56.983 318 1846 1012 42 >>> 4152952 296686 2010-07-02 02:55:19.653 2010-07-02 >>> 03:32:28.013 318 1846 1012 135 >>> 4156289 296730 2010-07-02 03:43:52.777 2010-07-02 >>> 04:32:55.250 318 1846 1012 743 >>> 4159624 296774 2010-07-02 04:43:15.310 2010-07-02 >>> 05:32:44.547 318 1846 1012 277 >>> 4162961 296817 2010-07-02 05:58:59.483 2010-07-02 >>> 06:32:37.340 318 1846 1012 121 >>> 4166303 296862 2010-07-02 06:58:50.733 2010-07-02 >>> 07:32:39.113 318 1846 1012 239 >>> 4172981 296950 2010-07-02 07:28:55.293 2010-07-02 >>> 09:33:01.200 318 1846 1012 512 >>> 4176322 296993 2010-07-02 09:59:04.607 2010-07-02 >>> 10:33:01.903 318 1846 1012 139 >>> 4179667 297038 2010-07-02 10:55:27.760 2010-07-02 >>> 11:32:56.560 318 1846 1012 722 >>> 4183012 297082 2010-07-02 11:59:33.650 2010-07-02 >>> 12:32:14.700 318 1846 1012 163 >>> 4186351 297126 2010-07-02 12:23:45.997 2010-07-02 >>> 13:32:59.500 318 1846 1012 284 >>> 4189689 297169 2010-07-02 13:44:21.253 2010-07-02 >>> 14:18:05.080 318 1846 1012 254 >>> 4196371 297258 2010-07-02 16:16:19.123 2010-07-02 >>> 16:32:53.437 1706 3541 1511 161 >>> 4199720 297301 2010-07-02 16:59:35.127 2010-07-02 >>> 17:32:57.950 1706 3541 1511 250 >>> 4203068 297346 2010-07-02 17:59:34.027 2010-07-02 >>> 18:32:54.337 1706 3541 1511 302 >>> 4206413 297389 2010-07-02 18:59:28.730 2010-07-02 >>> 19:32:37.950 1706 3541 1511 276 >>> 4209758 297434 2010-07-02 19:54:00.243 2010-07-02 >>> 20:32:57.433 1706 3541 1511 209 >>> 4213102 297473 2010-07-02 20:49:10.963 2010-07-02 >>> 21:30:44.540 1706 3541 1511 76 >>> 4216447 297511 2010-07-02 21:59:34.810 2010-07-02 >>> 22:33:00.603 1706 3541 1511 287 >>> 4219818 297569 2010-07-02 22:56:52.750 2010-07-02 >>> 23:59:31.607 1706 3541 1511 1877 >>> 4219819 297570 2010-07-02 23:59:21.577 2010-07-03 >>> 00:54:40.153 1706 3541 1511 1798 >>> 4219821 297572 2010-07-03 00:48:03.310 2010-07-03 >>> 01:59:37.920 1706 3541 1511 1125 >>> 4219823 297574 2010-07-03 01:51:01.057 2010-07-03 >>> 02:59:45.433 1706 3541 1511 1629 >>> 4219820 297571 2010-07-03 02:59:29.393 2010-07-03 >>> 03:59:54.920 1706 3541 1511 2462 >>> 4219822 297573 2010-07-03 03:59:18.663 2010-07-03 >>> 04:01:48.810 1706 3541 1511 70 >>> 4225738 297656 2010-07-03 06:13:34.980 2010-07-03 >>> 06:28:09.697 1726 3212 1428 46 >>> 4228694 297695 2010-07-03 06:59:15.560 2010-07-03 >>> 07:32:45.653 1726 3212 1428 251 >>> 4231649 297733 2010-07-03 07:59:11.937 2010-07-03 >>> 08:32:57.217 1726 3212 1428 284 >>> 4234604 297771 2010-07-03 08:57:00.357 2010-07-03 >>> 09:32:47.903 1726 3212 1428 227 >>> 4237559 297809 2010-07-03 09:59:19.813 2010-07-03 >>> 10:33:02.063 1726 3212 1428 285 >>> 4261156 298596 2010-07-04 22:59:09.863 2010-07-04 >>> 23:33:45.530 1726 3212 1428 1286 >>> 4264114 298646 2010-07-04 23:59:16.967 2010-07-05 >>> 00:33:08.107 1726 3212 1428 297 >>> 4267067 298690 2010-07-05 00:59:15.187 2010-07-05 >>> 01:32:48.300 1726 3212 1428 333 >>> 4270023 298734 2010-07-05 01:59:02.497 2010-07-05 >>> 02:32:48.780 1726 3212 1428 270 >>> 4272977 298778 2010-07-05 02:41:43.737 2010-07-05 >>> 03:32:56.043 1726 3212 1428 317 >>> 4275927 298822 2010-07-05 03:59:17.027 2010-07-05 >>> 04:33:14.947 1726 3212 1428 1623 >>> ================================================================================ >>> >>> Description: >>> 1. Column: some ID >>> 2. Column: reference to another table >>> 3. and 4. column: timestamp from/to of the item >>> 5. Column: ID R >>> 6. Column: ID L >>> 7. Column: ID B >>> 8. Column: Sum of components >>> >>> Requirement: >>> Sum over all components (from column 8) for each combination of ID R, ID >>> L, ID B, but (!) >>> rows with same keys (R,L,B) should be summed up only until the keys >>> change. >>> Do not sum up the components for identical keys, if there are other keys >>> between them. >>> >>> Example result: >>> >>> idR idL idB SUM >>> 1726 3212 1428 462 >>> 318 1846 1012 ... >>> 1706 3541 1511 ... >>> 1726 3212 1428 ... >>> >>> >>> Note that the first and last entry here has the same keys >>> >>> Maybe you find a similar monster SQL solving such a requirement. >>> Thanks for considering! >>> >>> Rainer >>> >>> >>> >>> Am 27.07.2010 12:37, schrieb Oliveiros d'Azevedo Cristina: >>>> Howdy, Rainer. >>>> >>>> It's been a while, so I don't know if you are still interested in this >>>> problem or if you, in the meantime, found yourself a solution, >>>> but I've tried this on a local copy of the example you provided and >>>> it seems to work. >>>> >>>> The problem is that I suspect that if you have several thousands of >>>> records on your table it will become slow... >>>> >>>> Best, >>>> Oliveiros >>>> >>>> SELECT SUM(tudo.parcela),tudo.a >>>> FROM >>>> ( >>>> SELECT fo.parcela,fo.a,fo.b,fo.c,MIN(th.c) as d >>>> FROM >>>> ( >>>> SELECT se.a as parcela,se.b as a,se.c as b, MAX(pr.c) as c >>>> FROM >>>> yourTable se >>>> LEFT JOIN >>>> ( >>>> SELECT a.* >>>> FROM yourTable a >>>> JOIN yourTable b >>>> ON (b.b <> a.b) >>>> AND ((age(a.c,b.c) = '1 day'::interval) >>>> >>>> ) >>>> ) pr >>>> ON pr.b = se.b >>>> AND se.c >= pr.c >>>> GROUP BY se.a,se.b,se.c >>>> ) fo >>>> LEFT JOIN >>>> ( >>>> SELECT a.* >>>> FROM yourTable a >>>> JOIN yourTable b >>>> ON (b.b <> a.b) >>>> AND ((age(a.c,b.c) = '-1 day'::interval) >>>> ) >>>> ) th >>>> ON fo.a = th.b >>>> AND fo.b <= th.c >>>> GROUP BY fo.parcela,fo.a,fo.b,fo.c >>>> ) tudo >>>> GROUP BY tudo.a,tudo.c,tudo.d >>>> >>>> >>>> >>>> >>>> To: <pgsql-sql@postgresql.org> >>>> Sent: Thursday, July 22, 2010 9:09 AM >>>> Subject: [SQL] grouping subsets >>>> >>>> >>>>> Hi, >>>>> >>>>> having a table similar to >>>>> >>>>> | 1 | B | [2010-07-15 Do] | >>>>> | 1 | B | [2010-07-16 Fr] | >>>>> |---+---+-----------------| >>>>> | 2 | C | [2010-07-17 Sa] | >>>>> | 2 | C | [2010-07-18 So] | >>>>> |---+---+-----------------| >>>>> | 1 | B | [2010-07-19 Mo] | >>>>> | 1 | B | [2010-07-20 Di] | >>>>> | 1 | B | [2010-07-21 Mi] | >>>>> | 1 | B | [2010-07-22 Do] | >>>>> |---+---+-----------------| >>>>> | 3 | D | [2010-07-23 Fr] | >>>>> >>>>> a simple group by gives me: >>>>> >>>>> | 6 | B | >>>>> | 4 | C | >>>>> | 3 | D | >>>>> >>>>> >>>>> What I want to get is the values grouped by "subset", where a subset >>>>> is a set of rows with identical column until the colum changes. >>>>> Is there a way to get >>>>> >>>>> | 2 | B | >>>>> | 4 | C | >>>>> | 4 | B | >>>>> | 3 | D | >>>>> >>>>> by SQL only? >>>>> >>>>> - Rainer >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) >>>>> To make changes to your subscription: >>>>> http://www.postgresql.org/mailpref/pgsql-sql >>>> >>>> >>