Re: JOIN performance - Mailing list pgsql-sql
From | Dean Gibson (DB Administrator) |
---|---|
Subject | Re: JOIN performance |
Date | |
Msg-id | 5.1.0.14.2.20040920164348.00a99868@imaps.mailpen.net Whole thread Raw |
In response to | Re: JOIN performance (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: JOIN performance
|
List | pgsql-sql |
Tom Lane wrote on 2004-09-20 16:06: >"Dean Gibson (DB Administrator)" <postgresql3@ultimeth.com> writes: > > I have a view that when used, is slow: > >... If you want useful help you need to be more complete. I use views to "hide" tables so that I can populate new tables and then atomically switch to them with "CREATE OR REPLACE ...". Here is the same data with the raw tables: => explain select * from "20040920_070010"."_GenLicGroupA4" AS x LEFT JOIN "20040919_070713"."_LicHD" AS y ON x.sys_id = y.unique_system_identifier; QUERY PLAN ------------------------------------------------------------------------------------------------- Merge Join (cost=5235.14..35123.51rows=43680 width=365) Merge Cond: ("outer".unique_system_identifier = "inner".sys_id) -> IndexScan using "_LicHD_pkey" on "_LicHD" y (cost=0.00..27361.79 rows=886799 width=344) -> Sort (cost=5235.14..5344.34 rows=43680 width=21) Sort Key: x.sys_id -> SeqScan on "_GenLicGroupA4" x (cost=0.00..1339.80 rows=43680 width=21) Using first level views, as mentioned above, the results are the same: => explain select * from "Base"."GenLicGroupA4" AS x LEFT JOIN "Base"."LicHD" AS y ON x.sys_id = y.unique_system_identifier; QUERY PLAN ----------------------------------------------------------------------------------------------- Merge Join (cost=5235.14..35123.51rows=43680 width=365) Merge Cond: ("outer".unique_system_identifier = "inner".sys_id) -> IndexScan using "_LicHD_pkey" on "_LicHD" (cost=0.00..27361.79 rows=886799 width=344) -> Sort (cost=5235.14..5344.34 rows=43680 width=21) Sort Key: "_GenLicGroupA4".sys_id -> Seq Scan on "_GenLicGroupA4" (cost=0.00..1339.80 rows=43680 width=21) However, when I introduce a second-level view for the second table of: CREATE VIEW "Data".lic_hd AS SELECT unique_system_identifier AS sys_id, callsign AS callsign, uls_file_number AS uls_file_num, applicant_type_code AS applicant_type, radio_service_code AS radio_service, license_status AS license_status, grant_date AS grant_date, effective_date AS effective_date, cancellation_date AS cancel_date, expired_date AS expire_date, last_action_date AS last_action_date, CASE WHEN cancellation_date < expired_date THEN cancellation_date ELSE expired_date END AS end_date, cancellation_date < expired_date AS canceled FROM "Base"."LicHD"; And then change the query to use it, I get: => explain select * from "Base"."GenLicGroupA4" AS x LEFT JOIN lic_hd AS y ON x.sys_id = y.sys_id; QUERY PLAN ------------------------------------------------------------------------------------ Merge Join (cost=280258.11..289399.92rows=359154 width=98) Merge Cond: ("outer".sys_id = "inner".sys_id) -> Sort (cost=5235.14..5344.34rows=43680 width=21) Sort Key: "_GenLicGroupA4".sys_id -> Seq Scan on "_GenLicGroupA4" (cost=0.00..1339.80 rows=43680 width=21) -> Sort (cost=262032.96..264249.96 rows=886799 width=72) Sort Key: y.sys_id -> Subquery Scany (cost=0.00..24529.99 rows=886799 width=72) -> Seq Scan on "_LicHD" (cost=0.00..24529.99 rows=886799 width=72) Note that the scan on _LicHD is now sequential. If I change the above view to remove the last two columns, I get: QUERY PLAN ------------------------------------------------------------------------------------ Merge Join (cost=5235.14..35123.51rows=43680 width=93) Merge Cond: ("outer".unique_system_identifier = "inner".sys_id) -> IndexScan using "_LicHD_pkey" on "_LicHD" (cost=0.00..27361.79 rows=886799 width=72) -> Sort (cost=5235.14..5344.34 rows=43680 width=21) Sort Key: x.sys_id -> SeqScan on "_GenLicGroupA4" x (cost=0.00..1339.80 rows=43680 width=21) Which is back to my original (good) performance. Question: why do the last two column definitions in the second VIEW change the scan on _LicHD from indexed to sequential ?? -- Dean