Thread: how to write an optimized sql with two same subsql?
We have a table A:
CREATE TABLE A(
uid integer,
groupid integer
)
Now we use this subsql to get each group's count:
SELECT count(*) as count
FROM A
GROUP BY groupid
ORDER BY groupid
Then we try to find the group pair with following conditions:
SELECT c.groupid as groupid1,d.groupid as groupid2
FROM subsql as c, subsql as d
WHERE d.groupid > c.groupid
and d.count > c.count;
Does that mean subsql will be executed twice? or how to write the optimized sql?
CREATE TABLE A(
uid integer,
groupid integer
)
Now we use this subsql to get each group's count:
SELECT count(*) as count
FROM A
GROUP BY groupid
ORDER BY groupid
Then we try to find the group pair with following conditions:
SELECT c.groupid as groupid1,d.groupid as groupid2
FROM subsql as c, subsql as d
WHERE d.groupid > c.groupid
and d.count > c.count;
Does that mean subsql will be executed twice? or how to write the optimized sql?
On 10/14/2010 05:34 PM, sunpeng wrote: > We have a table A: > CREATE TABLE A( > uid integer, > groupid integer > ) > Now we use this subsql to get each group's count: > SELECT count(*) as count > FROM A > GROUP BY groupid > ORDER BY groupid > > Then we try to find the group pair with following conditions: > SELECT c.groupid as groupid1,d.groupid as groupid2 > FROM subsql as c, subsql as d > WHERE d.groupid > c.groupid > and d.count > c.count; > > Does that mean subsql will be executed twice? or how to write the > optimized sql? > > What are you trying to discover about groups and their frequency in tablea? Does the numberical value of groupid have any meaning in your system?
Actually I've simplied my original sql to the previous version, since it's simple yet reveals the same problem.
My original sql is to get two instersected cluster(as same concept as group ) and its commonarea:
SELECT a.clusterid AS clusterida,
b.clusterid AS clusteridb,
St_astext(St_intersection(a.bufferbox, b.bufferbox)) AS commonarea
FROM (SELECT St_buffer(St_convexhull(St_collect(c.a0)), 2100.000000) AS
bufferbox,
d.clusterid AS
clusterid
FROM _mcir_2347694 c,
_mcir_2347694_clusterid2 d
WHERE c.uid = d.uid
GROUP BY d.clusterid) a,
(SELECT St_buffer(St_convexhull(St_collect(c.a0)), 2100.000000) AS
bufferbox,
d.clusterid AS
clusterid
FROM _mcir_2347694 c,
_mcir_2347694_clusterid2 d
WHERE c.uid = d.uid
GROUP BY d.clusterid) b
WHERE b.clusterid > a.clusterid
AND St_intersects(a.bufferbox, b.bufferbox)
ORDER BY a.clusterid;
The DDL for _mcir_2347694 and _mcir_2347694_clusterid2 is:
CREATE TABLE _mcir_2347579
(
a0 geometry,
uid integer
)
CREATE TABLE _mcir_2347579_clusterid2
(
uid integer NOT NULL,
clusterid integer
)
In these two tables, _mcir_2347579_clusterid2.uid = mcir_2347579.uid, just like a forign key.
The same question is how to avoid the following subquery be executed twice:
SELECT St_buffer(St_convexhull(St_collect(c.a0)), 2100.000000) AS
bufferbox,
d.clusterid AS
clusterid
FROM _mcir_2347694 c,
_mcir_2347694_clusterid2 d
WHERE c.uid = d.uid
GROUP BY d.clusterid
My original sql is to get two instersected cluster(as same concept as group ) and its commonarea:
SELECT a.clusterid AS clusterida,
b.clusterid AS clusteridb,
St_astext(St_intersection(a.bufferbox, b.bufferbox)) AS commonarea
FROM (SELECT St_buffer(St_convexhull(St_collect(c.a0)), 2100.000000) AS
bufferbox,
d.clusterid AS
clusterid
FROM _mcir_2347694 c,
_mcir_2347694_clusterid2 d
WHERE c.uid = d.uid
GROUP BY d.clusterid) a,
(SELECT St_buffer(St_convexhull(St_collect(c.a0)), 2100.000000) AS
bufferbox,
d.clusterid AS
clusterid
FROM _mcir_2347694 c,
_mcir_2347694_clusterid2 d
WHERE c.uid = d.uid
GROUP BY d.clusterid) b
WHERE b.clusterid > a.clusterid
AND St_intersects(a.bufferbox, b.bufferbox)
ORDER BY a.clusterid;
The DDL for _mcir_2347694 and _mcir_2347694_clusterid2 is:
CREATE TABLE _mcir_2347579
(
a0 geometry,
uid integer
)
CREATE TABLE _mcir_2347579_clusterid2
(
uid integer NOT NULL,
clusterid integer
)
In these two tables, _mcir_2347579_clusterid2.uid = mcir_2347579.uid, just like a forign key.
The same question is how to avoid the following subquery be executed twice:
SELECT St_buffer(St_convexhull(St_collect(c.a0)), 2100.000000) AS
bufferbox,
d.clusterid AS
clusterid
FROM _mcir_2347694 c,
_mcir_2347694_clusterid2 d
WHERE c.uid = d.uid
GROUP BY d.clusterid
2010/10/14 Rob Sargent <robjsargent@gmail.com>
What are you trying to discover about groups and their frequency in
On 10/14/2010 05:34 PM, sunpeng wrote:
> We have a table A:
> CREATE TABLE A(
> uid integer,
> groupid integer
> )
> Now we use this subsql to get each group's count:
> SELECT count(*) as count
> FROM A
> GROUP BY groupid
> ORDER BY groupid
>
> Then we try to find the group pair with following conditions:
> SELECT c.groupid as groupid1,d.groupid as groupid2
> FROM subsql as c, subsql as d
> WHERE d.groupid > c.groupid
> and d.count > c.count;
>
> Does that mean subsql will be executed twice? or how to write the
> optimized sql?
>
>
tablea? Does the numberical value of groupid have any meaning in your
system?
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
> -----Original Message----- > From: sunpeng [mailto:bluevaley@gmail.com] > Sent: Thursday, October 14, 2010 7:34 PM > To: pgsql-general@postgresql.org > Subject: how to write an optimized sql with two same subsql? > > We have a table A: > CREATE TABLE A( > uid integer, > groupid integer > ) > Now we use this subsql to get each group's count: > SELECT count(*) as count > FROM A > GROUP BY groupid > ORDER BY groupid > > Then we try to find the group pair with following conditions: > SELECT c.groupid as groupid1,d.groupid as groupid2 FROM > subsql as c, subsql as d WHERE d.groupid > c.groupid > and d.count > c.count; > > Does that mean subsql will be executed twice? or how to write > the optimized sql? > Is that what you want: WITH gr_counts AS ( SELECT groupid, COUNT(*) AS CNT FROM A GROUP BY groupid) SELECT C.groupid AS groupid1, D.groupid AS groupid2 FROM gr_counts C, gr_counts D WHERE D.groupid > C.groupid AND D.count > C.count; This will execute: SELECT groupid, COUNT(*) AS CNT FROM A GROUP BY groupid only once. Regards, Igor Neyman