Re: Problems pushing down WHERE-clause to underlying view - Mailing list pgsql-general
From | Adrian Klaver |
---|---|
Subject | Re: Problems pushing down WHERE-clause to underlying view |
Date | |
Msg-id | 473540ab-5933-04e8-ce55-074f4faaab82@aklaver.com Whole thread Raw |
In response to | Re: Problems pushing down WHERE-clause to underlying view (Nicklas Avén <nicklas.aven@jordogskog.no>) |
Responses |
Re: Problems pushing down WHERE-clause to underlying view
|
List | pgsql-general |
On 2/15/19 9:27 AM, Nicklas Avén wrote: > > On 2/15/19 5:06 PM, Adrian Klaver wrote: > > On 2/15/19 7:28 AM, Nicklas Avén wrote: > >> Hi > >> > >> The problem is that it always calculates all those 22000 rows even > if the user id I use only gives 250 rows. > >> > >> So, the query uses 4 seconds instead of under 100 ms. > > > > https://www.postgresql.org/docs/10/sql-createview.html > > > > "CREATE VIEW defines a view of a query. The view is not physically > materialized. Instead, the query is run every time the view is > referenced in a query." > > > > > > > Sorry, I must have expressed what I mean bad. Of course a view is not > materialized. > I will explain without views what I mean here below > > > > > Might want to look at materialized view: > > https://www.postgresql.org/docs/10/sql-creatematerializedview.html > > > > "CREATE MATERIALIZED VIEW is similar to CREATE TABLE AS, except that > it also remembers the query used to initialize the view, so that it can > be refreshed later upon demand. A materialized view has many of the same > properties as a table, but there is no support for temporary > materialized views or automatic generation of OIDs." > > > > > > No, materialized views is not an option. We get some data into those > tables daily. Recalculating the full dataset on a lot of views like this > doesn't make sense. > Instead we have tables maintained with processed new data. But I want > this last part of logic on top as views for flexibility, > to not need cached tables for each possible type of grouping that we need. > Started out with materialized views and it didn't work out well. > > > > > > > > I would also suggest running the EXPLAIN below with ANALYZE so actual > timings are returned. Also try: > > > > SELECT > > * > > FROM > > underlying_view AS b > > JOIN > > contractor_access AS b > > ON > > a.machine_key = b.machine_key > > WHERE > > user_id = 'name@email.address' > >> > > Sorry again, I didn't mention. This I have tried this since this is what > the top level view do. > So first step when trying to understand this was (of course) to apply > the where-clause directly to the query > > > > So, let's do that also on the underlying query (view) . > > > Here I have 2 queries, where I apply the where clause directly to the > query in the underlying view > > (joining the contractor_access table directly on that query). > > The first takes 30-40 ms ms and returns the same 250 rows as the second. > In the first I use the machine_key in the where clause. > In the second query that takes about 16 seconds to return the same 250 > rows I use the user_id in the contractor_access table. > > I have also cleaned up the contractor_access table. So there is only 1 > row now, with my email as user_id and the same machine_key as used in > the first query. > I have not had chance to fully go through all of below. Some questions/suggestions: 1) Thanks for the formatted queries. If I could make a suggestion, when aliasing could you include AS. It would make finding what l.* refers to easier for those of us with old eyes:) 2) t4e_contractor_id is in the shiny_adm.contractor_access table? If not where? 3) What is the schema for shiny_adm.contractor_access? In particular what indexes are on it? > > > Query 1: > > > EXPLAIN ANALYZE > SELECT > l.machine_key, > o.object_name, > o.sub_object_name, > o.object_user_id, > o.sub_object_user_id, > o.start_date AS object_start_date, > s.species_group_name, > p.product_group_name, > l.m3_sub AS volume_m3sub, > l.number_of_logs, > mi.basemachine_manufacturer, > mi.basemachine_model > FROM ( SELECT hl.contractor_id, > hl.machine_key, > hl.operator_key, > hl.object_key, > hl.sub_object_key, > date(hl.harvest_date) AS harvest_date, > hl.species_group_key, > hl.product_key, > sum(hl.m3_sub) AS m3_sub, > count(*) AS number_of_logs > FROM version_union_tables_r02.harvester_logs hl > GROUP BY hl.machine_key, hl.contractor_id, hl.operator_key, > hl.object_key, hl.sub_object_key, (date(hl.harvest_date)), > hl.species_group_key, hl.product_key) l > LEFT JOIN version_union_tables_r02.machine_info mi ON > l.machine_key::text = mi.machine_key::text > LEFT JOIN version_union_tables_r02.objects o ON > l.machine_key::text = o.machine_key::text AND l.object_key = > o.object_key AND l.sub_object_key = o.sub_object_key > LEFT JOIN version_union_tables_r02.products p ON > l.machine_key::text = p.machine_key::text AND l.product_key = p.product_key > LEFT JOIN version_union_tables_r02.species s ON > l.machine_key::text = s.machine_key::text AND l.species_group_key = > s.species_group_key > join shiny_adm.contractor_access ci on l.machine_key=ci.machine_key > where l.machine_key = '887655635442600' > ; > > which results in this query plan > > Nested Loop (cost=61865.25..65302.20 rows=22624 width=122) (actual > time=25.804..27.134 rows=250 loops=1) > -> Seq Scan on contractor_access ci (cost=0.00..1.01 rows=1 > width=19) (actual time=0.009..0.010 rows=1 loops=1) > Filter: (machine_key = '887655635442600'::text) > -> Hash Left Join (cost=61865.25..65074.95 rows=22624 width=122) > (actual time=25.793..26.959 rows=250 loops=1) > Hash Cond: (((hl.machine_key)::text = (s.machine_key)::text) > AND (hl.species_group_key = s.species_group_key)) > -> Hash Left Join (cost=61854.55..64263.92 rows=22624 > width=120) (actual time=25.755..26.763 rows=250 loops=1) > Hash Cond: (((hl.machine_key)::text = > (p.machine_key)::text) AND (hl.product_key = p.product_key)) > -> Hash Left Join (cost=61815.97..63145.14 rows=22624 > width=118) (actual time=25.706..26.543 rows=250 loops=1) > Hash Cond: (((hl.machine_key)::text = > (o.machine_key)::text) AND (hl.object_key = o.object_key) AND > (hl.sub_object_key = o.sub_object_key)) > -> Hash Left Join (cost=61799.78..62619.90 > rows=22624 width=65) (actual time=25.668..26.327 rows=250 loops=1) > Hash Cond: ((hl.machine_key)::text = > (mi.machine_key)::text) > -> HashAggregate (cost=61796.99..62079.79 > rows=22624 width=69) (actual time=25.627..26.132 rows=250 loops=1) > Group Key: hl.machine_key, > hl.contractor_id, hl.operator_key, hl.object_key, hl.sub_object_key, > date(hl.harvest_date), hl.species_group_key, hl.product_key > -> Bitmap Heap Scan on harvester_logs > hl (cost=570.14..61224.14 rows=22914 width=61) (actual > time=0.909..11.573 rows=24151 loops=1) > Recheck Cond: > ((machine_key)::text = '887655635442600'::text) > Heap Blocks: exact=538 > -> Bitmap Index Scan on > version_union_tables_r02_harvester_logs_machine_key (cost=0.00..564.41 > rows=22914 width=0) (actual time=0.870..0.870 rows=24151 loops=1) > Index Cond: > ((machine_key)::text = '887655635442600'::text) > -> Hash (cost=2.77..2.77 rows=1 width=38) > (actual time=0.023..0.023 rows=1 loops=1) > Buckets: 1024 Batches: 1 Memory > Usage: 9kB > -> Seq Scan on machine_info mi > (cost=0.00..2.77 rows=1 width=38) (actual time=0.018..0.019 rows=1 loops=1) > Filter: ((machine_key)::text = > '887655635442600'::text) > Rows Removed by Filter: 61 > -> Hash (cost=16.12..16.12 rows=4 width=84) > (actual time=0.025..0.026 rows=3 loops=1) > Buckets: 1024 Batches: 1 Memory Usage: 9kB > -> Bitmap Heap Scan on objects o > (cost=4.31..16.12 rows=4 width=84) (actual time=0.020..0.021 rows=3 > loops=1) > Recheck Cond: ((machine_key)::text = > '887655635442600'::text) > Heap Blocks: exact=1 > -> Bitmap Index Scan on > version_union_tables_r02_objects_machine_key (cost=0.00..4.31 rows=4 > width=0) (actual time=0.015..0.015 rows=3 loops=1) > Index Cond: ((machine_key)::text > = '887655635442600'::text) > -> Hash (cost=38.19..38.19 rows=26 width=35) (actual > time=0.037..0.037 rows=26 loops=1) > Buckets: 1024 Batches: 1 Memory Usage: 10kB > -> Bitmap Heap Scan on products p > (cost=4.48..38.19 rows=26 width=35) (actual time=0.015..0.026 rows=26 > loops=1) > Recheck Cond: ((machine_key)::text = > '887655635442600'::text) > Heap Blocks: exact=1 > -> Bitmap Index Scan on > version_union_tables_r02_products_machine_key (cost=0.00..4.47 rows=26 > width=0) (actual time=0.009..0.009 rows=26 loops=1) > Index Cond: ((machine_key)::text = > '887655635442600'::text) > -> Hash (cost=10.52..10.52 rows=12 width=37) (actual > time=0.028..0.029 rows=12 loops=1) > Buckets: 1024 Batches: 1 Memory Usage: 9kB > -> Bitmap Heap Scan on species s (cost=4.37..10.52 > rows=12 width=37) (actual time=0.016..0.021 rows=12 loops=1) > Recheck Cond: ((machine_key)::text = > '887655635442600'::text) > Heap Blocks: exact=1 > -> Bitmap Index Scan on > version_union_tables_r02_species_machine_key (cost=0.00..4.36 rows=12 > width=0) (actual time=0.008..0.008 rows=12 loops=1) > Index Cond: ((machine_key)::text = > '887655635442600'::text) > Planning time: 0.434 ms > Execution time: 27.370 ms > > > > Next query, the slow one that calculates the whole dataset: > > > EXPLAIN ANALYZE > SELECT > l.machine_key, > o.object_name, > o.sub_object_name, > o.object_user_id, > o.sub_object_user_id, > o.start_date AS object_start_date, > s.species_group_name, > p.product_group_name, > l.m3_sub AS volume_m3sub, > l.number_of_logs, > mi.basemachine_manufacturer, > mi.basemachine_model > FROM ( SELECT hl.contractor_id, > hl.machine_key, > hl.operator_key, > hl.object_key, > hl.sub_object_key, > date(hl.harvest_date) AS harvest_date, > hl.species_group_key, > hl.product_key, > sum(hl.m3_sub) AS m3_sub, > count(*) AS number_of_logs > FROM version_union_tables_r02.harvester_logs hl > GROUP BY hl.machine_key, hl.contractor_id, hl.operator_key, > hl.object_key, hl.sub_object_key, (date(hl.harvest_date)), > hl.species_group_key, hl.product_key) l > LEFT JOIN version_union_tables_r02.machine_info mi ON > l.machine_key::text = mi.machine_key::text > LEFT JOIN version_union_tables_r02.objects o ON > l.machine_key::text = o.machine_key::text AND l.object_key = > o.object_key AND l.sub_object_key = o.sub_object_key > LEFT JOIN version_union_tables_r02.products p ON > l.machine_key::text = p.machine_key::text AND l.product_key = p.product_key > LEFT JOIN version_union_tables_r02.species s ON > l.machine_key::text = s.machine_key::text AND l.species_group_key = > s.species_group_key > join shiny_adm.contractor_access ci on l.machine_key=ci.machine_key > where t4e_contractor_id = 'nicklas.aven@jordogskog.no'; > > > results in this query plan: > > Hash Left Join (cost=1780026.09..2023556.15 rows=4044 width=122) > (actual time=15860.900..15888.766 rows=250 loops=1) > Hash Cond: (((hl.machine_key)::text = (o.machine_key)::text) AND > (hl.object_key = o.object_key) AND (hl.sub_object_key = o.sub_object_key)) > -> Merge Left Join (cost=1779946.65..2023340.22 rows=4044 width=69) > (actual time=15859.604..15887.287 rows=250 loops=1) > Merge Cond: ((hl.machine_key)::text = (s.machine_key)::text) > Join Filter: (hl.species_group_key = s.species_group_key) > Rows Removed by Join Filter: 2750 > -> Merge Left Join (cost=1779915.71..2023136.40 rows=4044 > width=67) (actual time=15859.072..15884.912 rows=250 loops=1) > Merge Cond: ((hl.machine_key)::text = (p.machine_key)::text) > Join Filter: (hl.product_key = p.product_key) > Rows Removed by Join Filter: 6250 > -> Merge Left Join (cost=1779788.20..2022471.20 > rows=4044 width=65) (actual time=15857.473..15879.504 rows=250 loops=1) > Merge Cond: ((hl.machine_key)::text = > (mi.machine_key)::text) > -> Merge Join (cost=1779783.74..2022437.81 > rows=4044 width=48) (actual time=15857.359..15879.102 rows=250 loops=1) > Merge Cond: ((hl.machine_key)::text = > ci.machine_key) > -> GroupAggregate > (cost=1779782.72..2012287.44 rows=808712 width=69) (actual > time=15088.353..15878.172 rows=2683 loops=1) > Group Key: hl.machine_key, > hl.contractor_id, hl.operator_key, hl.object_key, hl.sub_object_key, > (date(hl.harvest_date)), hl.species_group_key, hl.product_key > -> Sort (cost=1779782.72..1800000.52 > rows=8087121 width=61) (actual time=15088.336..15488.144 rows=942552 > loops=1) > Sort Key: hl.machine_key, > hl.contractor_id, hl.operator_key, hl.object_key, hl.sub_object_key, > (date(hl.harvest_date)), hl.species_group_key, hl.product_key > Sort Method: external merge Disk: > 543456kB > -> Seq Scan on harvester_logs > hl (cost=0.00..243781.01 rows=8087121 width=61) (actual > time=0.007..3169.984 rows=8084464 loops=1) > -> Sort (cost=1.02..1.03 rows=1 width=19) > (actual time=0.019..0.020 rows=1 loops=1) > Sort Key: ci.machine_key > Sort Method: quicksort Memory: 25kB > -> Seq Scan on contractor_access ci > (cost=0.00..1.01 rows=1 width=19) (actual time=0.008..0.009 rows=1 loops=1) > Filter: (t4e_contractor_id = > 'nicklas.aven@jordogskog.no'::text) > -> Sort (cost=4.47..4.62 rows=62 width=38) > (actual time=0.102..0.173 rows=266 loops=1) > Sort Key: mi.machine_key > Sort Method: quicksort Memory: 30kB > -> Seq Scan on machine_info mi > (cost=0.00..2.62 rows=62 width=38) (actual time=0.008..0.030 rows=62 > loops=1) > -> Sort (cost=127.50..131.23 rows=1491 width=35) > (actual time=1.205..3.071 rows=7204 loops=1) > Sort Key: p.machine_key > Sort Method: quicksort Memory: 175kB > -> Seq Scan on products p (cost=0.00..48.91 > rows=1491 width=35) (actual time=0.004..0.497 rows=1491 loops=1) > -> Sort (cost=30.94..32.09 rows=460 width=37) (actual > time=0.385..1.233 rows=3259 loops=1) > Sort Key: s.machine_key > Sort Method: quicksort Memory: 65kB > -> Seq Scan on species s (cost=0.00..10.60 rows=460 > width=37) (actual time=0.004..0.146 rows=460 loops=1) > -> Hash (cost=49.25..49.25 rows=1725 width=84) (actual > time=1.286..1.287 rows=1690 loops=1) > Buckets: 2048 Batches: 1 Memory Usage: 190kB > -> Seq Scan on objects o (cost=0.00..49.25 rows=1725 > width=84) (actual time=0.004..0.600 rows=1725 loops=1) > Planning time: 0.527 ms > Execution time: 15945.641 ms > > > > > Thanks > > Nicklas > -- Adrian Klaver adrian.klaver@aklaver.com
pgsql-general by date: