Thread: [GENERAL] Slow index scan - Pgsql 9.2
Hi guys,

I've got the following query:

WITH
  query_p AS (
    SELECT CAST(6667176 AS BIGINT) AS client_id
  ),
  clients AS (
    SELECT
      client.id,
      client.job_share_mode
    FROM customers AS client
    WHERE (client.clientid = (SELECT qp.client_id FROM query_p AS qp))
      AND NOT client.is_demo
      AND NOT client.deleted
  )
SELECT
  qp.client_id,
  (SELECT COUNT(0)
   FROM customers AS c
   WHERE (c.clientid = qp.client_id) AND NOT c.deleted) AS client_count
FROM query_p AS qp;

Explain Analyze:

CTE Scan on "query_p" "qp"  (cost=0.01..1060.57 rows=1 width=8) (actual time=4065.244..4065.246 rows=1 loops=1)
  CTE query_p
    ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.002..0.003 rows=1 loops=1)
  SubPlan 2
    ->  Aggregate  (cost=1060.53..1060.54 rows=1 width=0) (actual time=4065.229..4065.229 rows=1 loops=1)
          ->  Index Scan using "clientid_customers" on "customers" "c"  (cost=0.00..1059.01 rows=607 width=0) (actual time=9.105..4063.728 rows=2513 loops=1)
                Index Cond: ("clientid" = "qp"."client_id")
                Filter: (NOT "deleted")
                Rows Removed by Filter: 1068
Total runtime: 4075.753 ms

Why is a search for "client_id" so slow?

Table customers:

 Column     | Type                   | Modifiers
------------+------------------------+-----------------------------------------------------------
 id         | bigint                 | not null default "nextval"('"customers_seq"'::"regclass")
 clientid   | bigint                 | not null default 0
 name_first | character varying(80)  | default ''::character varying
 name_last  | character varying(80)  | default ''::character varying
 company    | character varying(255) | default ''::character varying

Index clientid_customers:

CREATE INDEX clientid_customers ON customers ("clientid");

Thanks!
Patrick
On 10 January 2017 at 14:06, Patrick B <patrickbakerbr@gmail.com> wrote:
> ->  Index Scan using "clientid_customers" on "customers" "c"  (cost=0.00..1059.01 rows=607 width=0) (actual time=9.105..4063.728 rows=2513 loops=1)
>       Index Cond: ("clientid" = "qp"."client_id")
>       Filter: (NOT "deleted")
>       Rows Removed by Filter: 1068
> Total runtime: 4075.753 ms
>
> Why a search for "client_id" is so slow??

EXPLAIN (ANALYZE, BUFFERS) might reveal something. Perhaps each of the 2513 found rows, plus the 1068 filtered-out rows, were spread over the table, perhaps each on its own heap page, and all those pages had to be read from disk. The BUFFERS option might help show if this is the case.

Does it execute as slowly when you run it for a 2nd time?

--
David Rowley    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
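For reference, the suggested form, reusing the table and the client_id literal from the original post, would look something like this (a sketch, not taken from the thread itself):

```sql
-- BUFFERS adds page-access counts to each plan node. A large
-- "shared read" figure on the index scan node means most pages had
-- to come from disk (or the OS cache), which would explain the
-- cold-cache slowness.
EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(0)
FROM customers AS c
WHERE (c.clientid = 6667176) AND NOT c.deleted;
```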
3,581 individual pokes into the heap to confirm tuple visibility and apply the deleted filter - that could indeed take a while.
David J.
I would think because of the NOT "deleted" clause. Which is interesting, because that's a column which you conveniently didn't include in the definition below.
> 3,581 individual pokes into the heap to confirm tuple visibility and apply the deleted filter - that could indeed take a while.
> David J.

I see. The deleted column is:

deleted boolean

Should I create an index for that? How could I improve this query?

> Does it execute as slowly when you run it for a 2nd time?

No, it doesn't. I think it's because of caching?
> I would think because of the NOT "deleted" clause. Which is interesting, because that's a column which you conveniently didn't include in the definition below.

My mistake. Would an index be sufficient to solve the problem?
On 01/10/2017 04:05 AM, Patrick B wrote:
> > 3,581 individual pokes into the heap to confirm tuple visibility
> > and apply the deleted filter - that could indeed take a while.
> > David J.
>
> I see. The deleted column is:
>
> deleted boolean
>
> Should I create an index for that? How could I improve this query?
>
> > Does it execute as slowly when you run it for a 2nd time?
>
> No, it doesn't. I think it's because of caching?
>
> > I would think because of the NOT "deleted" clause. Which is
> > interesting, because that's a column which you conveniently didn't
> > include in the definition below.
>
> My mistake.
>
> Would an index be sufficient to solve the problem?
Not a separate index - the query probably would not benefit from two separate indexes. But you can amend the existing index to allow index-only scans, i.e. create an index like this:
CREATE INDEX ON customers (clientid, is_demo, deleted, id, job, job_share_mode);
This will make the index larger, but it should allow index-only scans.
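One caveat, speaking generally rather than from the thread: on 9.2 an index-only scan still has to visit the heap for any page not marked all-visible, so the larger covering index only pays off if the visibility map is reasonably current. For example:

```sql
-- Keep the visibility map and planner statistics current; pages
-- marked all-visible can then be answered from the index alone.
VACUUM (ANALYZE) customers;
```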
The other thing you could try is partial index, i.e.
CREATE INDEX ON customers (clientid) WHERE NOT is_demo AND NOT deleted;
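A narrower variant, not from the thread but matching only the slow COUNT subquery (which filters on deleted alone, not is_demo), might also be worth trying:

```sql
-- Matches the count's predicate: WHERE clientid = ? AND NOT deleted
CREATE INDEX ON customers (clientid) WHERE NOT deleted;
```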
You can also combine those approaches, but then you'll have to include all the columns in the index, even those used in the index predicate:
CREATE INDEX ON customers (clientid, is_demo, deleted, id, job, job_share_mode)
  WHERE NOT is_demo AND NOT deleted;
I'd bet all of those will outperform the current plan.
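Either way, re-running the aggregate under EXPLAIN should show whether an index-only scan is actually being chosen; a sketch reusing the literal from the first post:

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(0) FROM customers AS c
WHERE (c.clientid = 6667176) AND NOT c.deleted;
-- Hope to see an "Index Only Scan" node with a low "Heap Fetches" count.
```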
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services