Re: Expression Pruning in postgress - Mailing list pgsql-hackers
From | HarmeekSingh Bedi |
---|---|
Subject | Re: Expression Pruning in postgress |
Date | |
Msg-id | CALLwk6vj0V959K_F2QQM=TsL7cGw3=tYh-JE7p4eyzDtu56GJg@mail.gmail.com Whole thread Raw |
In response to | Re: Expression Pruning in postgress (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Expression Pruning in postgress
|
List | pgsql-hackers |
<br />Thanks Tom. Here is a example. Just a background of things . I have made changes in postgress execution and storageengine to make it a MPP style engine - keeping all optimizer intact. Basically take pgress serial plan and constructa parallel plan. The query I am running is below.<br /><br /><u><b>Query</b></u><br /><br /># explain select ooj.local_time,ooj.is_timeout,ooj.capsulename,<br/> regexp_split_to_table(<br /> case<br /> when ooj.isubstr is null then 'none'<br /> when ooj.isubstr='' then 'none'<br /> else ooj.isubstr<br/> end ,';') as interest,<br /> sum(ooj.impression) count_impressions,<br /> sum(ooj.click)count_clicks<br />from (<br /> select (impression.server_utc_time/3600000)*3600 as local_time,<br /> impression.is_timeout_default_impression as is_timeout,<br /> impression.capsule_name as capsulename,<br/> substring(impression.website_variables from 'ybt=([0-9;]*)') as isubstr,<br /> 1as impression,<br /> case click.impression_id when null then 0 else 1 end as click<br /> from<br /> impression_ytw2_row impression<br /> left outer join clicks_row click<br /> on impression.impression_id = click.impression_id)ooj<br /> group by local_time, is_timeout, capsulename, interest;<br /><br /><u><b>Now if you kindlybear with me and ignore the Parallel nodes in the plan {which are just parallel distributors} . If you compare thetwo plans below and check the output tupledesc of the Hash join you will see that some columns are not gettign prunedout - In my case this is a big column. <br /></b></u><br /><b>Case 1 Plan {inline view gets merged} </b><br /><br /> Now when the inline view gets merged via pullup_subqueries I can see that we have columns {impression.website_variables}which should get pruned out but it does not after the JOIN { it gets pruned out after hashaggregate). <br /><br /> Parallel reciever RANDOM queue (nodenum=0 cost=132.79..133.12 rows=10 width=104)<br /> Output:(sum(1)), impression.is_timeout_default_impression, impression.capsule_name, (sum(1)), sum(1), sum(1)<br /> -> Parallel sender RANDOM queue (nodenum=1 cost=132.79..133.12 rows=10 width=104)<br /> Output: (sum(1)), impression.is_timeout_default_impression,impression.capsule_name, (sum(1)), sum(1), sum(1)<br /> -> HashAggregate (nodenum=2 cost=132.79..133.12 rows=10 width=104)<br /> Output: (((impression.server_utc_time/ 3600000) * 3600)), impression.is_timeout_default_impression, impression.capsule_name, (regexp_split_to_table(CASEWHEN ("substring"((impression.website_variables)::text, 'ybt=([0-9;]*)'::text) IS NULL<br /> THEN'none'::text WHEN ("substring"((impression.website_variables)::text, 'ybt=([0-9;]*)'::text) = ''::text) THEN 'none'::textELSE "substring"((impression.website_variables)::text, 'ybt=([0-9;]*)'::text) END, ';'::text)), sum(1), sum(1)<br/> -> Parallel reciever HASH-AGG queue (nodenum=3 cost=117.45..130.08 rows=181 width=104)<br/> Output: impression.server_utc_time, impression.is_timeout_default_impression, impression.capsule_name,<b>impression.website_variables</b>, ((impression.server_utc_time / 3600000) * 3600), regexp_split_to_table(CASEWHEN ("substring"((impres<br /> ion.website_variables)::text, 'ybt=([0-9;]*)'::text) IS NULL) THEN'none'::text WHEN ("substring"((impression.website_variables)::text, 'ybt=([0-9;]*)'::text) = ''::text) THEN 'none'::textELSE "substring"((impression.website_variables)::text, 'ybt=([0-9;<br /> *)'::text) END, ';'::text)<br /> -> Parallel sender HASH-AGG queue (nodenum=4 cost=117.45..130.08 rows=181 width=104)<br /> Output: impression.server_utc_time, impression.is_timeout_default_impression, impression.capsule_name,<b>impression.website_variables</b>, ((impression.server_utc_time / 3600000) * 3600), regexp_split_to_table(CASEWHEN ("substring"((<br /> mpression.website_variables)::text, 'ybt=([0-9;]*)'::text) IS NULL) THEN'none'::text WHEN ("substring"((impression.website_variables)::text, 'ybt=([0-9;]*)'::text) = ''::text) THEN 'none'::textELSE "substring"((impression.website_variables)::text, 'ybt=<br /> [0-9;]*)'::text) END, ';'::text)<br /> <b>-> Hash Left Join (nodenum=5 cost=117.45..130.08 rows=181 width=104)<br /> Output: impression.server_utc_time, impression.is_timeout_default_impression, impression.capsule_name,impression.website_variables, ((impression.server_utc_time / 3600000) * 3600), regexp_split_to_table(CASEWHEN ("substr<br /> ng"((impression.website_variables)::text, 'ybt=([0-9;]*)'::text) IS NULL) THEN'none'::text WHEN ("substring"((impression.website_variables)::text, 'ybt=([0-9;]*)'::text) = ''::text) THEN 'none'::textELSE "substring"((impression.website_variables)::text,</b><br /> 'ybt=([0-9;]*)'::text) END, ';'::text)<br /> -> Parallel reciever HASH-LEFT queue (nodenum=6 cost=0.00..3.10 rows=10 width=136)<br/> Output: impression.server_utc_time, impression.is_timeout_default_impression,impression.capsule_name, impression.website_variables, impression.impression_id<br/> -> Parallel sender HASH-LEFT queue (nodenum=7 cost=0.00..3.10rows=10 width=136)<br /> Output: impression.server_utc_time, impression.is_timeout_default_impression,impression.capsule_name, impression.website_variables, impression.impression_id<br/> -> Seq Scan on impression_ytw2_row impression (nodenum=8 cost=0.00..3.10 rows=10 width=136)<br /> Output:impression.server_utc_time, impression.is_timeout_default_impression, impression.capsule_name, impression.website_variables,impression.impression_id<br /> -> Hash (nodenum=9 cost=72.20..72.20rows=3620 width=32)<br /> Output: click.impression_id<br /> -> Parallel reciever HASH-RIGHT queue (nodenum=10 cost=0.00..72.20 rows=3620width=32)<br /> Output: click.impression_id<br /> -> Parallel sender HASH-RIGHT queue (nodenum=11 cost=0.00..72.20 rows=3620width=32)<br /> Output: click.impression_id<br /> -> Seq Scan on clicks_row click (nodenum=12 cost=0.00..72.20 rows=3620width=32)<br /> Output: click.impression_id<br /><br /><u><b>Case2 { PLan when I disable code to merge view - basically do not call pullup_subqueries - it does the right thing}<br /><br /></b></u> Parallel reciever RANDOM queue (nodenum=0 cost=133.70..137.32 rows=181 width=112)<br /> Output:ooj.local_time, ooj.is_timeout, ooj.capsulename, (), sum((sum((sum(ooj.impression))))), sum((sum((sum(ooj.click)))))<br/> -> Parallel sender RANDOM queue (nodenum=1 cost=133.70..137.32 rows=181 width=112)<br/> Output: ooj.local_time, ooj.is_timeout, ooj.capsulename, (), sum((sum(ooj.impression))), sum((sum(ooj.click)))<br/> -> HashAggregate (nodenum=2 cost=133.70..137.32 rows=181 width=112)<br /> Output: ooj.local_time, ooj.is_timeout, ooj.capsulename, (regexp_split_to_table(CASE WHEN (ooj.isubstr ISNULL) THEN 'none'::text WHEN (ooj.isubstr = ''::text) THEN 'none'::text ELSE ooj.isubstr END, ';'::text)), sum(ooj.impression),sum(oo<br /> .click)<br /> -> Parallel reciever HASH-AGG queue (nodenum=3 cost=117.45..130.98rows=181 width=112)<br /> Output: ooj.local_time, ooj.is_timeout, ooj.capsulename,ooj.isubstr, ooj.impression, ooj.click, regexp_split_to_table(CASE WHEN (ooj.isubstr IS NULL) THEN 'none'::textWHEN (ooj.isubstr = ''::text) THEN 'none'::text ELSE ooj.isubstr <br /> ND, ';'::text)<br /> -> Parallel sender HASH-AGG queue (nodenum=4 cost=117.45..130.98 rows=181 width=112)<br /> Output: ooj.local_time, ooj.is_timeout, ooj.capsulename, ooj.isubstr, ooj.impression, ooj.click,regexp_split_to_table(CASE WHEN (ooj.isubstr IS NULL) THEN 'none'::text WHEN (ooj.isubstr = ''::text) THEN 'none'::textELSE <a href="http://ooj.is">ooj.is</a><br /> bstr END, ';'::text)<br /> -> SubqueryScan ooj (nodenum=5 cost=117.45..130.98 rows=181 width=112)<br /> Output: ooj.local_time,ooj.is_timeout, ooj.capsulename, ooj.isubstr, ooj.impression, ooj.click, regexp_split_to_table(CASE WHEN (ooj.isubstrIS NULL) THEN 'none'::text WHEN (ooj.isubstr = ''::text) THEN 'none'::text ELSE <br /> oj.isubstr END, ';'::text)<br/> <b>-> Hash Left Join (nodenum=6 cost=117.45..128.27 rows=181 width=104)<br/> Output: ((impression.server_utc_time / 3600000) * 3600), impression.is_timeout_default_impression,impression.capsule_name, "substring"((impression.website_variables)::text, 'ybt=([0-9;]*)'::text),1, 1</b><br /> -> Parallel reciever HASH-LEFT queue (nodenum=7cost=0.00..3.10 rows=10 width=136)<br /> Output: impression.server_utc_time,impression.is_timeout_default_impression, impression.capsule_name, impression.website_variables,impression.impression_id<br /> -> Parallelsender HASH-LEFT queue (nodenum=8 cost=0.00..3.10 rows=10 width=136)<br /> Output: impression.server_utc_time, impression.is_timeout_default_impression,impression.capsule_name, impression.website_variables, impression.impression_id<br/> -> Seq Scan on impression_ytw2_row impression (nodenum=9 cost=0.00..3.10 rows=10 width=136)<br /> Output:impression.server_utc_time, impression.is_timeout_default_impression, impression.capsule_name, impression.website_variables,impression.impression_id<br /> -> Hash (nodenum=10cost=72.20..72.20 rows=3620 width=32)<br /> Output: click.impression_id<br/> -> Parallel reciever HASH-RIGHT queue (nodenum=11cost=0.00..72.20 rows=3620 width=32)<br /> Output: click.impression_id<br/> -> Parallel sender HASH-RIGHT queue (nodenum=12cost=0.00..72.20 rows=3620 width=32)<br /> Output: click.impression_id<br/> -> Seq Scan on clicks_row click (nodenum=13cost=0.00..72.20 rows=3620 width=32)<br /> Output:click.impression_id<br /><br /><u><b>More analysis<br /><br /></b></u><ol><li>Looked at code in make_join_rel andbuild_joinrel_tlist I can see in one case the bms_nonempty_difference code kicks in and finds that the column can be prunedout and other case it does not do the right thing.<li>I am still trying to work out why pullup_subquery should makea difference.</ol>Any pointers appreciated .<br /><br />Regards<br />Harmeek<br /><br />2011 at 7:24 AM, Tom Lane <<ahref="mailto:tgl@sss.pgh.pa.us">tgl@sss.pgh.pa.us</a>> wrote:<br /> > HarmeekSingh Bedi <<a href="mailto:harmeeksingh@gmail.com">harmeeksingh@gmail.com</a>>writes:<br />>> In a plan where Node 1 is parent{say join) and Node 2 is child<br />>> (say scan) . If node 1 has a expression say foo(column) then scan<br/> >> will project 'column' for sure and join will<br />>> evaluate foo(column). Now if the nodeabove join does not need<br />>> column ? how does postgress remove the column from join's projection<br /> >>list .<br />><br />> See build_joinrel_tlist() in relnode.c.<br />><br />>> I am seeing that it doesnot in many<br />>> cases specially when sort appears above.<br />><br />> Please show a concrete example.<br/> ><br />> regards, tom lane<br />><br /><br />
pgsql-hackers by date: