Re: Seeking help with a query that takes too long - Mailing list pgsql-performance
From | Nick Fankhauser |
---|---|
Subject | Re: Seeking help with a query that takes too long |
Date | |
Msg-id | NEBBLAAHGLEEPCGOBHDGMEJKJGAA.nickf@ontko.com Whole thread Raw |
In response to | Re: Seeking help with a query that takes too long (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Seeking help with a query that takes too long
|
List | pgsql-performance |
> It looks like you are running with the default statistics target (10). > Try boosting it to 100 or even more for this column (see ALTER TABLE > SET STATISTICS, then re-ANALYZE) and see if the estimate gets better. Here are the results & a few more clues: prod1=# alter table actor alter column actor_full_name_uppercase set statistics 1000; ALTER TABLE prod1=# analyze actor; ANALYZE prod1=# select count(distinct actor_full_name_uppercase) from actor; count --------- 1453371 (1 row) prod1=# select count(actor_id) from actor; count --------- 3386359 (1 row) This indicates to me that 1 isn't too shabby as an estimate if the whole name is specified, but I'm not sure how this gets altered in the case of a "LIKE" prod1=# \x Expanded display is on. prod1=# SELECT * FROM pg_stats prod1-# WHERE tablename='actor' AND attname='actor_full_name_uppercase'; <Header boilerplate snipped out> schemaname | public tablename | actor attname | actor_full_name_uppercase null_frac | 0.000586667 avg_width | 21 n_distinct | -0.14701 <Long list of values and frequencies snipped out> correlation | -0.00211291 Question: What does it mean when n_distinct is negative? New results of explain analyze: QUERY PLAN ---------------------------------------------------------------------------- ---------------------------------------------------------------------------- -------------------------------------------------- Limit (cost=252683.61..252683.68 rows=28 width=116) (actual time=169377.32..169378.39 rows=1000 loops=1) -> Sort (cost=252683.61..252683.68 rows=29 width=116) (actual time=169377.31..169377.69 rows=1001 loops=1) Sort Key: max((actor.actor_full_name)::text), count(case_data.case_id) -> Aggregate (cost=252678.57..252682.91 rows=29 width=116) (actual time=169305.79..169354.50 rows=3456 loops=1) -> Group (cost=252678.57..252680.01 rows=289 width=116) (actual time=169305.76..169330.00 rows=5879 loops=1) -> Sort (cost=252678.57..252679.29 rows=289 width=116) (actual time=169305.75..169308.15 rows=5879 loops=1) Sort Key: actor.actor_id -> Nested Loop (cost=0.00..252666.74 rows=289 width=116) (actual time=89.27..169273.51 rows=5879 loops=1) -> Nested Loop (cost=0.00..251608.11 rows=289 width=77) (actual time=57.73..92753.49 rows=5882 loops=1) -> Index Scan using actor_full_name_uppercase on actor (cost=0.00..456.88 rows=113 width=42) (actual time=32.80..3197.28 rows=3501 loops=1) Index Cond: ((actor_full_name_uppercase >= 'SANDERS'::character varying) AND (actor_full_name_uppercase < 'SANDERT'::character varying)) Filter: (actor_full_name_uppercase ~~ 'SANDERS%'::text) -> Index Scan using actor_case_assignment_actor_id on actor_case_assignment (cost=0.00..2181.29 rows=2616 width=35) (actual time=22.26..25.57 rows=2 loops=3501) Index Cond: ("outer".actor_id = actor_case_assignment.actor_id) -> Index Scan using case_data_case_id on case_data (cost=0.00..3.65 rows=1 width=39) (actual time=13.00..13.00 rows=1 loops=5882) Index Cond: (case_data.case_id = "outer".case_id) Total runtime: 169381.38 msec (17 rows)
pgsql-performance by date: