Thread: Postgres Planner "Inconsistency"?

Postgres Planner "Inconsistency"?

From

Renzo Bertuzzi

Date:

02 February 2018, 18:37:26

Can you help me with this situation please??

I have a strange problem with a query where the planner only uses and index if I use a constant value, but if I use a subquery it will prefer a seq scan.

I have table "sample_table" with columns id serial primary key, and int_flag, with an index on int_flag.

I inserted 240387 values with int_flag=1 and 1 value with int_flag=2

so the table has 240388 total rows, the last row of the table has int_flag=2

If I execute this query, the planner chooses the index:

explain (analyze ,verbose,buffers)

SELECT id

FROM sample_table

WHERE

int_flag = any((array[2])::int[])

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Index Scan using index_sample_table_int_flag_ix on public.sample_table (cost=0.42..39.86 rows=1 width=8) (actual time=0.023..0.023 rows=1 loops=1)

Output: id

Index Cond: (sample_table.int_flag = ANY ('{2}'::integer[]))

Buffers: shared hit=28

Planning time: 0.087 ms

Execution time: 0.046 ms

but if I slightly change the query to:

explain (analyze ,verbose,buffers)

SELECT id

FROM sample_table

WHERE

int_flag = any((select array[2])::int[])

now postgres will do a seq scan.

I have run vacuum and analyze but the result is the same.

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------

Seq Scan on public.sample_table (cost=0.01..8843.74 rows=240388 width=8) (actual time=44.993..44.995 rows=1 loops=1)

Output: id

Filter: (sample_table.int_flag = ANY ($0))

Rows Removed by Filter: 240387

Buffers: shared hit=3435

InitPlan 1 (returns $0)

-> Result (cost=0.00..0.01 rows=1 width=32) (actual time=0.002..0.003 rows=1 loops=1)

Output: '{2}'::integer[]

Planning time: 0.092 ms

Execution time: 45.017 ms

I have created a SQL Fiddle to demonstrate the issue:

http://sqlfiddle.com/#!17/5be43/5

I suppose postgres prefers a seq scan because it treats the subquery as a non-deterministic value while in the first case the planner has all the values before hand???

I'm using PostgreSQL 9.6.6 on x86_64-pc-linux-gnu, compiled by gcc (Debian 4.9.2-10) 4.9.2, 64-bit

PS: This is a simplified query, the actual query will use another small table to build the array with less than 100 values and sample_table can have up to 5 millions entries. I have tried using a CTE with the array, but it still will do a seq scan.

cheers

Re: Postgres Planner "Inconsistency"?

From

Laurenz Albe

Date:

03 February 2018, 08:27:35

Renzo Bertuzzi wrote:
> I have a strange problem with a query where the planner only uses and index
> if I use a constant value, but if I use a subquery it will prefer a seq scan.
> 
> I have table "sample_table" with columns id serial primary key, and int_flag, with an index on int_flag.
> I inserted 240387 values with int_flag=1 and 1 value with int_flag=2
> 
> so the table has 240388 total rows, the last row of the table has int_flag=2
> 
> If I execute this query, the planner chooses the index:
> 
> explain (analyze ,verbose,buffers)
> SELECT id
> FROM sample_table
> WHERE              
>   int_flag = any((array[2])::int[])
> 
> QUERY PLAN
                                                                  
 
>
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

 
> Index Scan using index_sample_table_int_flag_ix on public.sample_table  (cost=0.42..39.86 rows=1 width=8) (actual
time=0.023..0.023rows=1 loops=1)  
 
>   Output: id
                                                                  
 
>   Index Cond: (sample_table.int_flag = ANY ('{2}'::integer[]))
                                                
 
>   Buffers: shared hit=28
                                                                  
 
> Planning time: 0.087 ms
                                                                  
 
> Execution time: 0.046 ms   
> 
> 
> but if I slightly change the query to:
> 
> explain (analyze ,verbose,buffers)
> SELECT id
> FROM sample_table
> WHERE              
>   int_flag = any((select array[2])::int[])
> 
> now postgres will do a seq scan.
> I have run vacuum and analyze but the result is the same.
> 
> QUERY PLAN
                  
 
>
---------------------------------------------------------------------------------------------------------------------------------------

 
> Seq Scan on public.sample_table  (cost=0.01..8843.74 rows=240388 width=8) (actual time=44.993..44.995 rows=1 loops=1)

 
>   Output: id
                  
 
>   Filter: (sample_table.int_flag = ANY ($0))
                         
 
>   Rows Removed by Filter: 240387
                  
 
>   Buffers: shared hit=3435
                  
 
>   InitPlan 1 (returns $0)
                  
 
>     ->  Result  (cost=0.00..0.01 rows=1 width=32) (actual time=0.002..0.003 rows=1 loops=1)
                  
 
>           Output: '{2}'::integer[]                                                                              
> Planning time: 0.092 ms
                  
 
> Execution time: 45.017 ms  
> 
> I suppose postgres prefers a seq scan because it treats the subquery as a non-deterministic
> value while in the first case the planner has all the values before hand???
> 
> I'm using PostgreSQL 9.6.6 on x86_64-pc-linux-gnu, compiled by gcc (Debian 4.9.2-10) 4.9.2, 64-bit
> 
> PS: This is a simplified query, the actual query will use another small table to build the array
> with less than 100 values and sample_table can have up to 5 millions entries.
> I have tried using a CTE with the array, but it still will do a seq scan.

In the second case, the optimizer does not think hard enough to figure out
that it actually could know that the InitPlan has a result of 2, and with
your real query it probably couldn't know for sure even if it tried hard.

So it has to come up with a plan without knowing what the search values will
be, and it chooses a sequential scan as the lesser evil, since it guesses
that it will have to retrieve most of the tuples anyway.

Maybe you can write your query as a join instead.

Yours,
Laurenz Albe