Potential Join Performance Issue - Mailing list pgsql-hackers
From | Lawrence, Ramon |
---|---|
Subject | Potential Join Performance Issue |
Date | |
Msg-id | 6EEA43D22289484890D119821101B1DF2C1563@exchange20.mercury.ad.ubc.ca Whole thread Raw |
Responses |
Re: Potential Join Performance Issue
Re: Potential Join Performance Issue |
List | pgsql-hackers |
<div class="Section1"><p class="MsoNormal"><font face="Arial" size="2"><span style="font-size:10.0pt; font-family:Arial">PostgreSQL development community:</span></font><p class="MsoNormal"><font face="Arial" size="2"><spanstyle="font-size:10.0pt; font-family:Arial"> </span></font><p class="MsoNormal"><font face="Arial" size="2"><span style="font-size:10.0pt; font-family:Arial">Our research group has been using the PostgreSQL code base to test new join algorithms. During testing,we noticed that the planner is not pushing down projections to the outer relation in a hash join. Although thismakes sense for in-memory (1 batch) joins, for joins larger than memory (such as for TPC-H DSS), this causes the systemto perform significantly more disk I/Os when reading/writing batches of the outer relation.</span></font><p class="MsoNormal"><fontface="Arial" size="2"><span style="font-size:10.0pt; font-family:Arial"> </span></font><p class="MsoNormal"><font face="Arial" size="2"><span style="font-size:10.0pt; font-family:Arial">A simple solution is to add a single line of code to src\backend\optimizer\plan\createplan.c after line1771:</span></font><p class="MsoNormal"><font face="Arial" size="2"><span style="font-size:10.0pt; font-family:Arial"> </span></font><p class="MsoNormal"><font face="Courier New" size="2"><span style="font-size:10.0pt; font-family:"Courier New"">disuse_physical_tlist(outer_plan, best_path->jpath.outerjoinpath);</span></font><p class="MsoNormal"><fontface="Courier New" size="2"><span style="font-size:10.0pt; font-family:"Courier New""> </span></font><p class="MsoNormal"><font face="Courier New" size="2"><span style="font-size:10.0pt; font-family:"Courier New"">This will always force the projection on the outer relation.</span></font><p class="MsoNormal"><fontface="Courier New" size="2"><span style="font-size:10.0pt; font-family:"Courier New""> </span></font><p class="MsoNormal"><font face="Courier New" size="2"><span style="font-size:10.0pt; font-family:"Courier New"">A more complicated modification alternative is to add a state variable to allow the planner toknow how many batches the hash join expects and only push down the projection if it is greater than one. However, pushingthe projection on the outer relation is almost always the best choice as it eliminates unneeded attributes for operatorsabove the hash join in the plan and will be robust in the case of poor estimates.</span></font><p class="MsoNormal"><fontface="Courier New" size="2"><span style="font-size:10.0pt; font-family:"Courier New""> </span></font><p class="MsoNormal"><font face="Courier New" size="2"><span style="font-size:10.0pt; font-family:"Courier New"">We have been testing using TPC-H scale factor 1 GB. A sample query that demonstrates the behavioris:</span></font><p class="MsoNormal"><font face="Courier New" size="2"><span style="font-size:10.0pt; font-family:"Courier New""> </span></font><p class="MsoNormal"><font face="Courier New" size="2"><span style="font-size:10.0pt; font-family:"Courier New"">SELECT c_custkey, c_name, o_orderkey, o_orderdate</span></font><p class="MsoNormal"><font face="CourierNew" size="2"><span style="font-size:10.0pt; font-family:"Courier New"">FROM Customer, Orders</span></font><p class="MsoNormal"><font face="Courier New" size="2"><spanstyle="font-size:10.0pt; font-family:"Courier New"">WHERE c_custkey = o_custkey</span></font><p class="MsoNormal"><font face="Courier New" size="2"><spanstyle="font-size:10.0pt; font-family:"Courier New""> </span></font><p class="MsoNormal"><font face="Courier New" size="2"><span style="font-size:10.0pt; font-family:"Courier New"">Note that EXPLAIN on this query will indicate that the projection is performed on the outer relationeven though it is not done. We found the difference by modifying our code to track tuples and bytes output to disk,but it also can be detected by watching the size of the temporary files produced during the join.</span></font><fontface="Arial" size="2"><span style="font-size:10.0pt;font-family:Arial"></span></font><p class="MsoNormal"><fontface="Arial" size="2"><span style="font-size:10.0pt; font-family:Arial"> </span></font><p class="MsoNormal"><font face="Arial" size="2"><span style="font-size:10.0pt; font-family:Arial">Sincerely,</span></font><p class="MsoAutoSig"><font face="Times New Roman" size="3"><span style="font-size: 12.0pt"> </span></font><p class="MsoAutoSig"><font face="Times New Roman" size="3"><span style="font-size: 12.0pt">Dr. Ramon Lawrence</span></font><p class="MsoAutoSig"><font face="Times New Roman" size="3"><span style="font-size: 12.0pt">Assistant Professor, Department of Computer Science, University of British Columbia Okanagan</span></font><p class="MsoAutoSig"><fontface="Times New Roman" size="3"><span style="font-size: 12.0pt"><a href="http://people.ok.ubc.ca/rlawrenc/">http://people.ok.ubc.ca/rlawrenc/</a></span></font><p class="MsoAutoSig"><fontface="Times New Roman" size="3"><span lang="PT-BR" style="font-size:12.0pt">E-mail: </span><a href="mailto:ramon.lawrence@ubc.ca"><spanlang="PT-BR">ramon.lawrence@ubc.ca</span></a></font><span lang="PT-BR"></span><pclass="MsoNormal"><font face="Times New Roman" size="3"><span lang="PT-BR" style="font-size:12.0pt"> </span></font></div>
pgsql-hackers by date: