Thread: Optimization on JOIN

Optimization on JOIN

From
Yan Cheng Cheok
Date:
I create 1 lot.
every lot is having 10000 unit
every unit is having 100 measurement.

hence :

lot - 1 row entry
unit - 10000 row entries
measurement - 1000000 row entries

Currently, I am having JOIN statement as follow (1st case)

SELECT measurement_type.value, measurement.value, measurement_unit.value
    FROM
    measurement_type INNER JOIN
        (measurement_unit INNER JOIN
            (measurement INNER JOIN
                (lot INNER JOIN unit ON (lot_id = fk_lot_id))
            ON (fk_unit_id = unit_id))
        ON (fk_measurement_unit_id = measurement_unit_id))
    ON (fk_measurement_type_id = measurement_type_id) WHERE lot_id = 7;

I thought, I may optimized it using : (2nd case, Take note on the WHERE statement)


SELECT measurement_type.value, measurement.value, measurement_unit.value
    FROM
    measurement_type INNER JOIN
        (measurement_unit INNER JOIN
            (measurement INNER JOIN
                (lot INNER JOIN unit ON (lot_id = fk_lot_id) WHERE lot_id = 7)
            ON (fk_unit_id = unit_id))
        ON (fk_measurement_unit_id = measurement_unit_id))
    ON (fk_measurement_type_id = measurement_type_id);


My thought is as follow :

For 1st case, my visualization is :

(lot join unit)

lot_id  unit_id  -> 6 rows
===============
1        1
1        2
1        3
2     4
2     5
2     6


measurement join (lot join unit)

lot_id  unit_id     measurement_id   -> 18 rows
========================
1        1     1
1        1     2
1        1     3
1        2       4
1        2     5
1        2     6
1        3       7
1        3       8
1        3       9
2     4       10
2     4       11
2     4       12
2     5       13
2     5       14
2     5       15
2     6     16
2     6       17
2     6       18


measurement join (lot join unit) where lot_id = 1

lot_id  unit_id     measurement_id   -> 9 rows
========================
1        1     1
1        1     2
1        1     3
1        2       4
1        2     5
1        2     6
1        3       7
1        3       8
1        3       9



For 2nd case, my visualization is :

(lot join unit where lot_id = 1)

lot_id  unit_id  -> 3 rows
===============
1        1
1        2
1        3


measurement join (lot join unit where lot_id = 1)

lot_id  unit_id     measurement_id   -> 9 rows
========================
1        1     1
1        1     2
1        1     3
1        2       4
1        2     5
1        2     6
1        3       7
1        3       8
1        3       9


During the process, 2nd case only need maximum 9 rows, compare to 1st case 18 rows.

However, the 2nd case syntax is incorrect :(

ERROR:  syntax error at or near "WHERE"
LINE 6: ...     (lot INNER JOIN unit ON (lot_id = fk_lot_id) WHERE lot_...
                                                             ^

Is there any way I may first perform filter on the small table, then only I use the filtered result for sub-sequence
join?

Instead of I first join into a very large table, only I perform filtering (which I assume will be slower)

Thanks

Thanks and Regards
Yan Cheng CHEOK





Re: Optimization on JOIN

From
Alban Hertroys
Date:
On 22 Jan 2010, at 7:59, Yan Cheng Cheok wrote:

> SELECT measurement_type.value, measurement.value, measurement_unit.value
>    FROM
>    measurement_type INNER JOIN
>        (measurement_unit INNER JOIN
>            (measurement INNER JOIN
>                (lot INNER JOIN unit ON (lot_id = fk_lot_id) WHERE lot_id = 7)
>            ON (fk_unit_id = unit_id))
>        ON (fk_measurement_unit_id = measurement_unit_id))
>    ON (fk_measurement_type_id = measurement_type_id);


Yeah, you can't use WHERE in the middle of an expression. You can just add your extra constraint to the ON-clause here
though:ON (lot_id = fk_lot_id AND lot_id = 7) 

You don't need to nest all those joins btw, and it doesn't help to read your query. I turned it inside out into this to
makeit a bit more readable, I hope I got it right: 

SELECT measurement_type.value, measurement.value, measurement_unit.value
FROM lot INNER JOIN unit ON (lot_id = fk_lot_id)
INNER JOIN measurement ON (fk_unit_id = unit_id)
INNER JOIN measurement_unit ON (fk_measurement_unit_id = measurement_unit_id)
INNER JOIN measurement_type ON (fk_measurement_type_id = measurement_type_id)
WHERE lot_id = 7

Since you don't seem to need any data from lot in your results you can remove that table from your query like this:

SELECT measurement_type.value, measurement.value, measurement_unit.value
FROM measurement ON (fk_unit_id = unit_id)
INNER JOIN measurement_unit ON (fk_measurement_unit_id = measurement_unit_id)
INNER JOIN measurement_type ON (fk_measurement_type_id = measurement_type_id)
WHERE fk_lot_id = 7

Notice that we now use a different column in the WHERE clause, namely measurement.fk_lot_id instead of lot.lot_id.

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.


!DSPAM:737,4b5995db10606279919623!



Re: Optimization on JOIN

From
Sam Mason
Date:
On Thu, Jan 21, 2010 at 10:59:42PM -0800, Yan Cheng Cheok wrote:
> Currently, I am having JOIN statement as follow (1st case)
>
> SELECT measurement_type.value, measurement.value, measurement_unit.value
>     FROM
>     measurement_type INNER JOIN
>         (measurement_unit INNER JOIN
>             (measurement INNER JOIN
>                 (lot INNER JOIN unit ON (lot_id = fk_lot_id))
>             ON (fk_unit_id = unit_id))
>         ON (fk_measurement_unit_id = measurement_unit_id))
>     ON (fk_measurement_type_id = measurement_type_id) WHERE lot_id = 7;

As you're only using INNER JOINs and equality conditions it's not going
to matter where you put the WHERE clause.  PG can rewrite this any way
it wants and will put the constraints where ever the stats say it's best
placed.  Just write the query however it is easiest to read and trust PG
to do the rest.

The syntax you want is to put more in the ON cause though, i.e:

  SELECT *
  FROM foo f
    INNER JOIN bar b ON f.id = b.id AND b.other = 7

is the same as:

  SELECT *
  FROM foo f
    INNER JOIN bar b ON f.id = b.id
  WHERE b.other = 7;

is the same as:

  SELECT *
  FROM foo f, bar b
  WHERE f.id = b.id
    AND b.other = 7;

there are a few other ways of writing this as well.  All are the same
and PG is able to rewrite them all to each other depending on which ever
it thinks will be the most efficient.

Try EXPLAINing the queries to see how PG is interpreting your queries.

--
  Sam  http://samason.me.uk/