Thread: Adding an "and is not null" on an indexed field slows the query down immensely.

I have this query it runs reasonably quickly.


SELECT "consolidated_urls".* FROM "consolidated_urls" INNER JOIN "topical_urls"
 ON "consolidated_urls".id = "topical_urls".consolidated_url_id
 WHERE (("topical_urls".domain_id = 157) AND ((topical_urls.hidden = 'f')))
  ORDER BY index_delta DESC LIMIT 10

The explain for this goes like this.

"Limit  (cost=29899.43..29899.46 rows=10 width=1880)"
"  ->  Sort  (cost=29899.43..29906.04 rows=2642 width=1880)"
"        Sort Key: consolidated_urls.index_delta"
"        ->  Nested Loop  (cost=101.29..29842.34 rows=2642 width=1880)"
"              ->  Bitmap Heap Scan on topical_urls
(cost=101.29..7490.32 rows=2642 width=4)"
"                    Recheck Cond: (domain_id = 157)"
"                    Filter: (NOT hidden)"
"                    ->  Bitmap Index Scan on
index_topical_urls_on_domain_id_and_consolidated_url_id
(cost=0.00..100.63 rows=2643 width=0)"
"                          Index Cond: (domain_id = 157)"
"              ->  Index Scan using consolidated_urls_pkey on
consolidated_urls  (cost=0.00..8.45 rows=1 width=1880)"
"                    Index Cond: (consolidated_urls.id =
topical_urls.consolidated_url_id)"


I add one more clause on to it to filter out index_deltas that are not
null and the query becomes unusably slow.

SELECT "consolidated_urls".* FROM "consolidated_urls" INNER JOIN "topical_urls"
 ON "consolidated_urls".id = "topical_urls".consolidated_url_id
 WHERE (("topical_urls".domain_id = 157) AND ((topical_urls.hidden = 'f')))
 AND (index_delta IS NOT NULL) ORDER BY index_delta DESC LIMIT 10

The explain for this is goes like this

"Limit  (cost=0.00..20555.33 rows=10 width=1880)"
"  ->  Nested Loop  (cost=0.00..5430717.58 rows=2642 width=1880)"
"        ->  Index Scan Backward using
index_consolidateds_url_on_index_delta on consolidated_urls
(cost=0.00..5316175.98 rows=15242 width=1880)"
"              Filter: (index_delta IS NOT NULL)"
"        ->  Index Scan using
index_topical_urls_on_domain_id_and_consolidated_url_id on
topical_urls  (cost=0.00..7.50 rows=1 width=4)"
"              Index Cond: ((topical_urls.domain_id = 157) AND
(topical_urls.consolidated_url_id = consolidated_urls.id))"
"              Filter: (NOT topical_urls.hidden)"



The index_delta field is double precision and is indexed.

Any suggestions as to how to make this query run faster?

Re: Adding an "and is not null" on an indexed field slows the query down immensely.

From
pasman pasmański
Date:
Try :
order by index_delta+1 desc

On 1/15/11, Tim Uckun <timuckun@gmail.com> wrote:
> I have this query it runs reasonably quickly.
>
>
> SELECT "consolidated_urls".* FROM "consolidated_urls" INNER JOIN
> "topical_urls"
>  ON "consolidated_urls".id = "topical_urls".consolidated_url_id
>  WHERE (("topical_urls".domain_id = 157) AND ((topical_urls.hidden = 'f')))
>   ORDER BY index_delta DESC LIMIT 10
>
> The explain for this goes like this.
>
> "Limit  (cost=29899.43..29899.46 rows=10 width=1880)"
> "  ->  Sort  (cost=29899.43..29906.04 rows=2642 width=1880)"
> "        Sort Key: consolidated_urls.index_delta"
> "        ->  Nested Loop  (cost=101.29..29842.34 rows=2642 width=1880)"
> "              ->  Bitmap Heap Scan on topical_urls
> (cost=101.29..7490.32 rows=2642 width=4)"
> "                    Recheck Cond: (domain_id = 157)"
> "                    Filter: (NOT hidden)"
> "                    ->  Bitmap Index Scan on
> index_topical_urls_on_domain_id_and_consolidated_url_id
> (cost=0.00..100.63 rows=2643 width=0)"
> "                          Index Cond: (domain_id = 157)"
> "              ->  Index Scan using consolidated_urls_pkey on
> consolidated_urls  (cost=0.00..8.45 rows=1 width=1880)"
> "                    Index Cond: (consolidated_urls.id =
> topical_urls.consolidated_url_id)"
>
>
> I add one more clause on to it to filter out index_deltas that are not
> null and the query becomes unusably slow.
>
> SELECT "consolidated_urls".* FROM "consolidated_urls" INNER JOIN
> "topical_urls"
>  ON "consolidated_urls".id = "topical_urls".consolidated_url_id
>  WHERE (("topical_urls".domain_id = 157) AND ((topical_urls.hidden = 'f')))
>  AND (index_delta IS NOT NULL) ORDER BY index_delta DESC LIMIT 10
>
> The explain for this is goes like this
>
> "Limit  (cost=0.00..20555.33 rows=10 width=1880)"
> "  ->  Nested Loop  (cost=0.00..5430717.58 rows=2642 width=1880)"
> "        ->  Index Scan Backward using
> index_consolidateds_url_on_index_delta on consolidated_urls
> (cost=0.00..5316175.98 rows=15242 width=1880)"
> "              Filter: (index_delta IS NOT NULL)"
> "        ->  Index Scan using
> index_topical_urls_on_domain_id_and_consolidated_url_id on
> topical_urls  (cost=0.00..7.50 rows=1 width=4)"
> "              Index Cond: ((topical_urls.domain_id = 157) AND
> (topical_urls.consolidated_url_id = consolidated_urls.id))"
> "              Filter: (NOT topical_urls.hidden)"
>
>
>
> The index_delta field is double precision and is indexed.
>
> Any suggestions as to how to make this query run faster?
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

--
Sent from my mobile device

------------
pasman

2011/1/15 pasman pasmański <pasman.p@gmail.com>:
> Try :
> order by index_delta+1 desc
>

I have attached the explain analyze for that below why does this
return instantly?




Limit  (cost=29910.05..29910.07 rows=10 width=1880) (actual
time=42.563..42.563 rows=0 loops=1)
   ->  Sort  (cost=29910.05..29916.65 rows=2642 width=1880) (actual
time=42.558..42.558 rows=0 loops=1)
         Sort Key: ((consolidated_urls.index_delta + 1::double precision))
         Sort Method:  quicksort  Memory: 17kB
         ->  Nested Loop  (cost=105.30..29852.95 rows=2642 width=1880)
(actual time=10.428..10.428 rows=0 loops=1)
               ->  Bitmap Heap Scan on topical_urls
(cost=105.30..7494.33 rows=2642 width=4) (actual time=10.424..10.424
rows=0 loops=1)
                     Recheck Cond: (domain_id = 157)
                     Filter: (NOT hidden)
                     ->  Bitmap Index Scan on
index_topical_urls_on_domain_id_and_consolidated_url_id
(cost=0.00..104.64 rows=2643 width=0) (actual time=10.419..10.419
rows=0 loops=1)
                           Index Cond: (domain_id = 157)
               ->  Index Scan using consolidated_urls_pkey on
consolidated_urls  (cost=0.00..8.45 rows=1 width=1880) (never
executed)
                     Index Cond: (consolidated_urls.id =
topical_urls.consolidated_url_id)
                     Filter: (consolidated_urls.index_delta IS NOT NULL)
 Total runtime: 42.932 ms
(14 rows)

Re: Adding an "and is not null" on an indexed field slows the query down immensely.

From
pasman pasmański
Date:
I think this is a planner's bug. Can you send these explains to pgsql-bugs ?

On 1/15/11, Tim Uckun <timuckun@gmail.com> wrote:
> 2011/1/15 pasman pasmański <pasman.p@gmail.com>:
>> Try :
>> order by index_delta+1 desc
>>
>
> I have attached the explain analyze for that below why does this
> return instantly?
>
>
>
>
> Limit  (cost=29910.05..29910.07 rows=10 width=1880) (actual
> time=42.563..42.563 rows=0 loops=1)
>    ->  Sort  (cost=29910.05..29916.65 rows=2642 width=1880) (actual
> time=42.558..42.558 rows=0 loops=1)
>          Sort Key: ((consolidated_urls.index_delta + 1::double precision))
>          Sort Method:  quicksort  Memory: 17kB
>          ->  Nested Loop  (cost=105.30..29852.95 rows=2642 width=1880)
> (actual time=10.428..10.428 rows=0 loops=1)
>                ->  Bitmap Heap Scan on topical_urls
> (cost=105.30..7494.33 rows=2642 width=4) (actual time=10.424..10.424
> rows=0 loops=1)
>                      Recheck Cond: (domain_id = 157)
>                      Filter: (NOT hidden)
>                      ->  Bitmap Index Scan on
> index_topical_urls_on_domain_id_and_consolidated_url_id
> (cost=0.00..104.64 rows=2643 width=0) (actual time=10.419..10.419
> rows=0 loops=1)
>                            Index Cond: (domain_id = 157)
>                ->  Index Scan using consolidated_urls_pkey on
> consolidated_urls  (cost=0.00..8.45 rows=1 width=1880) (never
> executed)
>                      Index Cond: (consolidated_urls.id =
> topical_urls.consolidated_url_id)
>                      Filter: (consolidated_urls.index_delta IS NOT NULL)
>  Total runtime: 42.932 ms
> (14 rows)
>

--
Sent from my mobile device

------------
pasman

2011/1/16 pasman pasmański <pasman.p@gmail.com>:
> I think this is a planner's bug. Can you send these explains to pgsql-bugs ?
>



Sure. BTW I thought I would change the query a little by putting a AND
index_value .>100 instead of index_delta and it didn't help at all. I
thought maybe using another index would help but it didn't.

So what I did was add NULLS LAST which was more bearable.

This really should be a pretty quick query, there are only twenty
records after all and all criteria fields are indexed.