Re: Problem with n_distinct being consistently inaccurate. - Mailing list pgsql-admin
From | Nick Fankhauser |
---|---|
Subject | Re: Problem with n_distinct being consistently inaccurate. |
Date | |
Msg-id | NEBBLAAHGLEEPCGOBHDGMEJCIJAA.nickf@ontko.com Whole thread Raw |
In response to | Re: Problem with n_distinct being consistently inaccurate. (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Problem with n_distinct being consistently inaccurate.
|
List | pgsql-admin |
> Just out of curiosity, what happens if you make it bigger than 92k? > Does a value 10x or 100x reality change the plan? Neither one makes a change- perhaps something else is at work here- my understanding of the finer points of query plans is shaky- Here is the query and the plan I'm getting: alpha-# event.event_date_time AS start_time, alpha-# event.event_duration_minutes AS duration_minutes, alpha-# event.event_ical_status AS status, alpha-# event.event_location_name AS "location", alpha-# event.event_room, event.event_type_code AS category, alpha-# event.event_hearing_type_desc AS sub_category, alpha-# event.event_summary AS summary, alpha-# event.event_pk, alpha-# event.event_revision_number, alpha-# case_data.case_public_id, alpha-# case_data.case_title, alpha-# court_config.court_name AS court, alpha-# event.event_id, alpha-# NULL::"unknown" AS related_event_id, alpha-# case_data.case_id, alpha-# court_config.court_id, alpha-# actor_case_assignment.actor_id, alpha-# actor_identifier.actor_identifier_text AS actor_identifier, alpha-# actor_identifier.actor_identifier_type alpha-# FROM alpha-# actor_identifier, alpha-# actor_case_assignment, alpha-# case_data, event, alpha-# court_config alpha-# WHERE alpha-# ( alpha(# ( alpha(# ( alpha(# (actor_identifier.actor_id = actor_case_assignment.actor_id) alpha(# AND alpha(# (actor_case_assignment.case_id = case_data.case_id) alpha(# ) alpha(# AND alpha(# (case_data.case_id = event.case_id) alpha(# ) alpha(# AND alpha(# (case_data.court_id = court_config.court_id) alpha(# ) alpha-# and actor_identifier.actor_identifier_text='07314-20' alpha-# and actor_identifier.actor_identifier_type = 'AttorneyStateBarID' alpha-# and event_date_time >= '06/01/2003' alpha-# and event_date_time <= '06/30/2003'; Hash Join (cost=2702.10..2703.83 rows=1 width=510) Hash Cond: ("outer".court_id = "inner".court_id) -> Seq Scan on court_config (cost=0.00..1.48 rows=48 width=39) -> Hash (cost=2702.10..2702.10 rows=1 width=471) -> Nested Loop (cost=0.00..2702.10 rows=1 width=471) Join Filter: ("outer".case_id = "inner".case_id) -> Nested Loop (cost=0.00..2698.10 rows=1 width=397) -> Nested Loop (cost=0.00..2602.13 rows=13 width=90) -> Index Scan using actor_identifier_actor_text on actor_identifier (cost=0.00..6.63 rows=1 width=55) Index Cond: ((actor_identifier_text = '07314-20'::character varying) AND (actor_identifier_type = 'AttorneyStateBarID'::character varying)) -> Index Scan using actor_case_assignment_actor_id on actor_case_assignment (cost=0.00..2229.73 rows=1758 width=35) Index Cond: ("outer".actor_id = actor_case_assignment.actor_id) -> Index Scan using event_case_id on event (cost=0.00..7.46 rows=1 width=307) Index Cond: (event.case_id = "outer".case_id) Filter: ((event_date_time >= '2003-06-01 00:00:00-05'::timestamp with time zone) AND (event_date_time <= '2003-06-30 00:00:00-05'::timestamp with time zone)) -> Index Scan using case_data_case_id on case_data (cost=0.00..3.98 rows=1 width=74) Index Cond: (case_data.case_id = "outer".case_id) (17 rows) What I'm trying to avoid is the Filter on event_date_time. It seems to me that an index condition could be used to advantage here, and that if this table "drove" the rest of the plan, it should work nicely. What got me headed down this path is that if I make this particular table much smaller by eliminating all events in the past, the performance on the query becomes very good (Although Filter is still used). Then I looked at the event_date_time field & thought that it looked pretty selective. Am i coorect in thinking that Filter means that the index is not being used? My guess is that the problem is that although I know the values are evenly distributed over 10 years, the planner has no way of knowing that all of the events don't occur in the month I've specified.
pgsql-admin by date: