Thread: how to write an optimized sql with two same subsql?

how to write an optimized sql with two same subsql?

From

sunpeng

Date:

14 October 2010, 20:34:19

We have a table A:
CREATE TABLE A(
   uid integer,
   groupid integer
)
Now we use this subsql to get each group's count:
SELECT count(*) as count
FROM A
GROUP BY groupid
ORDER BY groupid

Then we try to find the group pair with following conditions:
SELECT c.groupid as groupid1,d.groupid as groupid2
FROM subsql as c, subsql as d
WHERE d.groupid > c.groupid
              and d.count > c.count;

Does that mean subsql will be executed twice? or how to write the optimized sql?

Re: how to write an optimized sql with two same subsql?

From

Rob Sargent

Date:

14 October 2010, 21:06:23


On 10/14/2010 05:34 PM, sunpeng wrote:
> We have a table A:
> CREATE TABLE A(
>    uid integer,
>    groupid integer
> )
> Now we use this subsql to get each group's count:
> SELECT count(*) as count
> FROM A
> GROUP BY groupid
> ORDER BY groupid
>
> Then we try to find the group pair with following conditions:
> SELECT c.groupid as groupid1,d.groupid as groupid2
> FROM subsql as c, subsql as d
> WHERE d.groupid > c.groupid
>               and d.count > c.count;
>
> Does that mean subsql will be executed twice? or how to write the
> optimized sql?
>
>
What are you trying to discover about groups and their frequency in
tablea?  Does the numberical value of groupid have any meaning in your
system?

Re: how to write an optimized sql with two same subsql?

From

sunpeng

Date:

14 October 2010, 21:18:26

Actually I've simplied my original sql to the previous version, since it's simple yet reveals the same problem.
My original sql is to get two instersected cluster(as same concept as group ) and its commonarea:

SELECT a.clusterid                                          AS clusterida,
       b.clusterid                                          AS clusteridb,
       St_astext(St_intersection(a.bufferbox, b.bufferbox)) AS commonarea
FROM   (SELECT St_buffer(St_convexhull(St_collect(c.a0)), 2100.000000) AS
               bufferbox,
               d.clusterid                                             AS
               clusterid
        FROM   _mcir_2347694 c,
               _mcir_2347694_clusterid2 d
        WHERE   c.uid = d.uid
        GROUP  BY d.clusterid) a,
       (SELECT St_buffer(St_convexhull(St_collect(c.a0)), 2100.000000) AS
               bufferbox,
               d.clusterid                                             AS
               clusterid
        FROM   _mcir_2347694 c,
               _mcir_2347694_clusterid2 d
        WHERE  c.uid = d.uid
        GROUP  BY d.clusterid) b
WHERE  b.clusterid > a.clusterid
       AND St_intersects(a.bufferbox, b.bufferbox)
ORDER  BY a.clusterid;

The DDL for _mcir_2347694 and _mcir_2347694_clusterid2 is:
CREATE TABLE _mcir_2347579
(
a0 geometry,
uid integer
)
CREATE TABLE _mcir_2347579_clusterid2
(
uid integer NOT NULL,
clusterid integer
)
In these two tables, _mcir_2347579_clusterid2.uid = mcir_2347579.uid, just like a forign key.

The same question is how to avoid the following subquery be executed twice:
SELECT St_buffer(St_convexhull(St_collect(c.a0)), 2100.000000) AS
               bufferbox,
               d.clusterid                                             AS
               clusterid
        FROM   _mcir_2347694 c,
               _mcir_2347694_clusterid2 d
        WHERE c.uid = d.uid
        GROUP  BY d.clusterid

2010/10/14 Rob Sargent <robjsargent@gmail.com>

On 10/14/2010 05:34 PM, sunpeng wrote:
> We have a table A:
> CREATE TABLE A(
> uid integer,
> groupid integer
> )
> Now we use this subsql to get each group's count:
> SELECT count(*) as count
> FROM A
> GROUP BY groupid
> ORDER BY groupid
>
> Then we try to find the group pair with following conditions:
> SELECT c.groupid as groupid1,d.groupid as groupid2
> FROM subsql as c, subsql as d
> WHERE d.groupid > c.groupid
> and d.count > c.count;
>
> Does that mean subsql will be executed twice? or how to write the
> optimized sql?
>
>
What are you trying to discover about groups and their frequency in
tablea? Does the numberical value of groupid have any meaning in your
system?

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: how to write an optimized sql with two same subsql?

From

"Igor Neyman"

Date:

15 October 2010, 14:56:34


> -----Original Message-----
> From: sunpeng [mailto:bluevaley@gmail.com]
> Sent: Thursday, October 14, 2010 7:34 PM
> To: pgsql-general@postgresql.org
> Subject: how to write an optimized sql with two same subsql?
>
> We have a table A:
> CREATE TABLE A(
>    uid integer,
>    groupid integer
> )
> Now we use this subsql to get each group's count:
> SELECT count(*) as count
> FROM A
> GROUP BY groupid
> ORDER BY groupid
>
> Then we try to find the group pair with following conditions:
> SELECT c.groupid as groupid1,d.groupid as groupid2 FROM
> subsql as c, subsql as d WHERE d.groupid > c.groupid
>               and d.count > c.count;
>
> Does that mean subsql will be executed twice? or how to write
> the optimized sql?
>

Is that what you want:

WITH gr_counts AS (
    SELECT groupid, COUNT(*) AS CNT
      FROM A
      GROUP BY groupid)
SELECT C.groupid AS groupid1, D.groupid AS groupid2
  FROM gr_counts C, gr_counts D
  WHERE D.groupid > C.groupid
    AND D.count > C.count;

This will execute:

SELECT groupid, COUNT(*) AS CNT
      FROM A
      GROUP BY groupid

only once.

Regards,
Igor Neyman