Re: questions on interpreting the output of explain - Mailing list pgsql-sql
From:           Tom Lane
Subject:        Re: questions on interpreting the output of explain
Date:
Msg-id:         1430.919629383@sss.pgh.pa.us
In response to: questions on interpreting the output of explain (Michael Olivier <molivier@yahoo.com>)
Responses:      Re: [SQL] Re: questions on interpreting the output of explain
List:           pgsql-sql
Michael Olivier <molivier@yahoo.com> writes:
> Can someone explain the details here to me? I guess specific questions are:
> - Does hash join mean my acctname indexes are hashed, not btree'd?
> - What does index scan mean?
> - Nested loop?

OK, I'll take a cut at this. (I've been studying the Postgres optimizer lately, so I know a little bit about this, but I may be wrong about some of the details.)

To Postgres, the most interesting part of your query is the WHERE clause "U.acctname = S.acctname". This makes it a "join query": the system has to find all combinations of U rows and S rows such that U.acctname = S.acctname. The other clauses of your WHERE specification are "restriction clauses": they eliminate rows from consideration.

Postgres has two basic ways of scanning an individual table to find the desired rows: it can do a sequential scan of the whole table, or it can scan an index and then go to the table proper for each row that looks promising according to the index. Restriction clauses that reference only the indexed field can be checked just on the basis of the index entry, so it's not necessary to fetch the full row if it can be rejected via such a clause. An index scan is faster than a sequential scan if only a few rows need be retrieved, but slower if many rows must be fetched. (With some index types, such as btree, an index scan has another useful property: the retrieved rows are inherently ordered by the indexed fields, so an index scan might save an explicit sort step.)

OK, now what about joining multiple tables? The system knows three ways to join two tables:

* Nested loop: scan one table, and for each candidate row retrieved from it, scan the entire other table looking for matching rows. This is pretty slow for big tables, but it is the fallback if the system can't figure out how to apply any faster technique. Note that the "scans" might be implemented as either sequential or index scans, depending on the availability of indexes matching the restriction clauses; so even a nested-loop plan is not necessarily completely brute-force.

* Merge join: this is only useful when the join clause is an equality check, a.f1 = b.f2. The idea is to scan both tables in parallel, in sequence by the joined fields (f1 for a, f2 for b). Matching rows can be paired up on the fly, using the same kind of logic you'd use to merge two ordered lists. The trick is to scan the tables in the right sorted order; that can be done with an index scan over an ordered index, or by a sequential scan followed by an explicit sort step.

* Hash join: scan the smaller table once, entering each of its rows (after restriction checks) into a temporary hash table keyed by the join field. Then scan the larger table once, probing into the hash table to find all matching rows. This is pretty good if one of the tables is small enough to fit in memory conveniently, or if you expect the restriction clauses to eliminate all but a few of its rows. Again, the scans might be either sequential or indexed, depending on the restriction clauses and on whether the final output needs to be sorted.

If more than two tables are mentioned in a query, they're joined together in a sequence of pairwise join operations that produce temporary tables. Obviously, there are a lot of possible ways to execute any complicated query, and the interactions aren't obvious. (For example, doing a lower-level join in a way that produces a sorted result might make it possible to apply a merge join later without a separate sort step, yielding a cheaper overall result even though the first join takes longer.)
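To make that concrete, here's a minimal sketch of the kind of schema and query we're talking about. I'm inventing the table and column definitions here (your actual schema wasn't shown); only the acctname join matches your query:

    -- hypothetical tables; the real schema wasn't shown
    CREATE TABLE u (acctname text, balance int4);
    CREATE INDEX u_acctname ON u (acctname);

    CREATE TABLE s (acctname text, port int4);
    CREATE INDEX s_acctname ON s (acctname);

    EXPLAIN SELECT u.acctname, s.port
    FROM u, s
    WHERE u.acctname = s.acctname   -- join clause
      AND u.balance > 100;          -- restriction clause

Depending on how many rows the planner expects "u.balance > 100" to pass, any of the three join strategies above could come out cheapest for this query, with either scan type underneath the join.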
The system uses statistics it keeps about table sizes and index properties to estimate the "cost" of each possible way of doing the query, and then it picks the way with the lowest cost estimate. Of course these estimates aren't 100% accurate, especially not if the statistics are out of date. (That's why you should run VACUUM ANALYZE regularly.)

> - Does the cost roll up, meaning the top line is total cost and the
>   rest is a breakdown, or do I add up all the cost numbers?
> - Can I compare costs between two queries?

It rolls up, and comparing costs is what the whole thing is all about. EXPLAIN will only show you the plan estimated to be cheapest overall. If you'd like to see what was considered and rejected, you can turn on optimizer debug printouts at system build time (but beware: they are verbose).

> I get:
>   Nested Loop (cost=9.38 size=1 width=28)
>     -> Index Scan on u (cost=3.26 size=3 width=16)
>     -> Index Scan on s (cost=2.04 size=25 width=12)
> ...but if I shorten the query slightly
> [ by omitting restriction clauses ]
> ...I get:
>   Hash Join (cost=14.67 size=3 width=28)
>     -> Index Scan on u (cost=3.26 size=26 width=16)
>     -> Hash (cost=0.00 size=0 width=0)
>          -> Index Scan on s (cost=7.73 size=25 width=12)

I'm guessing that the extra restriction clauses made a nested loop look better. The index scans in the first plan were probably chosen to allow restriction clauses to be tested before fetching rows. (BTW, recent releases include the name of the selected index in the EXPLAIN output, which makes it a good deal easier to tell what's going on.) With the extra restrictions, the system estimated that few enough rows would be selected that a nested loop would be cheapest. Note the "size" fields: the system guesses that only 3 U rows will be left after restriction in the first case, so it will only have to repeat the inner scan 3 times. In the second case, it guesses that 26 U rows will have to be joined, so the extra overhead of preparing a hash table is worth it. These guesses could be all wet, of course, but that's how the logic works.

A generalization of that comment is that on a small test table, you're likely to get a nested-loop plan no matter what. The startup overhead of the more complex join types isn't worth it until the tables reach realistic sizes (say, a few hundred tuples) ... and the system's cost models are good enough to know that.

			regards, tom lane