Thread: Slow query fixed by replacing equality with a nested query

Slow query fixed by replacing equality with a nested query

From

Valentin Janeiko

Date:

20 January 2022, 19:04:32

Hi everyone,

I have a SELECT query that uses a chain of CTEs (4) that is slow to run on a large
database. But if I change a where clause in one of the small CTEs from an
equality to an equivalent nested IN query, then the query becomes fast. Looking
at the query plan I can see that after the change Postgres avoids a large and
slow index scan by using a different index and aggregation. I am reluctant to
accept the accidental "fix" because it seems odd and counter intuitive. Can
anyone shed some light on what's going on? Is my fix the intended solution or
is there a better way to write this query?

We have a system which stores resource blobs and extracts search parameters
into a number of tables. The query in question tries to find all resources with
a specific tag (cte0) that are related to resource X (cte2) and are dated
before some (recent) date Y (cte1) and sort them by date (cte3 & cte4). The
query was working okay on a small database, but over time as the database grew
the query started to timeout. Which is why I am looking at it now.

I have accidentally fixed the performance by replacing `system_id = 20` with
`system_id IN (SELECT system_id FROM fhir.system WHERE value = 'REDACTED')`.
The nested query here returns a single row with a value `20`.

Here are the results of EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS, FORMAT JSON):
- Slow: https://explain.depesz.com/s/joHK
- Fast: https://explain.depesz.com/s/tgd4

Some more info about the CTEs:
- cte0: select resources with a specific tag
(most common resource types with the most common tag)
- cte1: filter resource by date no later than Y
(matches ~50% of the table, and most of resource from cte0)
- cte2: select resources that are related to a specific resource X
(matches 1-5 resources)
- cte3: adds the date as a sort value
- cte4: sorts the result

I have also created a gist:
https://gist.github.com/valeneiko/89f8cbe26db7ca2651b47524462b5d18
- Schema.sql: the SQL script to create tables and indexes
- Query.sql: the query I am trying to run
- Postgres Settings, Table sizes and Statistics are also included in the gist

PostgreSQL Version:
PostgreSQL 13.3 (Ubuntu 13.3-1.pgdg18.04+1) on x86_64-pc-linux-gnu,
compiled by gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0, 64-bit

Setup: PostgreSQL is running inside a docker container on a dedicate node in a
Kubernetes cluster (using Zalando Spilo image: https://github.com/zalando/spilo)

Thank you,
Valentinas

Re: Slow query fixed by replacing equality with a nested query

From

Michael Lewis

Date:

21 January 2022, 03:33:17

I don't see any reference to cte1. Is that expected?

I'm unclear why these sets are not just inner join'd on resource_surrogate_id. It seems like that column it is being selected as Sid1 in each CTE, and then the next one does the below. Why?

where resource_surrogate_id IN (SELECT Sid1 FROM cte_previous_number)

RE: Slow query fixed by replacing equality with a nested query

From

Date:

21 January 2022, 11:37:54

My mistake. I have updated the query in the gist: cte1 should have been referenced in cte2.

The query plans are correct. It was just the query in the gist that was incorrect (I was just verifying cte1 was the culprit – without it the query is fast too).

This SQL query is a result of translating a FHIR query into SQL. These queries are generated on the fly from user input. The chains will not always be linear. But I guess I could write an optimizer that rewrites linear parts as JOINS. If that would result in better query plans.

I have done a few simple experiments in the past comparing CTEs like this to JOINS, but the resultant query plans were the same. CTEs seemed easier to follow when troubleshooting issues, so I left them as such. Do JOINs become better than CTEs at a certain point?

I will attempt to rewrite the query with JOINs on Monday to see if it makes a difference. It might be tricky, the relationship from resource table to search parameter tables is often a 1 to many.

Re: Slow query fixed by replacing equality with a nested query

From

Valentin Janeiko

Date:

24 January 2022, 13:22:41

I have rewritten the query using JOINs. I had to make one of them a
FULL JOIN, but otherwise JOINs seem like a good idea.
I have added the new query to the (same) gist:
https://gist.github.com/valeneiko/89f8cbe26db7ca2651b47524462b5d18#file-queryoptimized-sql
The query plan is much better with just a few small index scans which
completes in under a millisecond: https://explain.depesz.com/s/vBdG

Thank you for your help. Let me know if you have any other suggestions.

Re: Slow query fixed by replacing equality with a nested query

From

Michael Lewis

Date:

24 January 2022, 17:50:40

On Fri, Jan 21, 2022 at 4:37 AM <val.janeiko@gmail.com> wrote:

I have done a few simple experiments in the past comparing CTEs like this to JOINS, but the resultant query plans were the same. CTEs seemed easier to follow when troubleshooting issues, so I left them as such. Do JOINs become better than CTEs at a certain point?

Read up on from_collapse_limit. If the query can re-write subqueries to collapse the join problem, then it will at first but then once it reaches that threshold, then it won't try anymore to avoid excessive planning time. That's when things can go awry.

Re: Slow query fixed by replacing equality with a nested query

From

Michael Lewis

Date:

24 January 2022, 17:52:40

On Mon, Jan 24, 2022 at 6:22 AM Valentin Janeiko <val.janeiko@gmail.com> wrote:

I have rewritten the query using JOINs. I had to make one of them a
FULL JOIN, but otherwise JOINs seem like a good idea.
I have added the new query to the (same) gist:
https://gist.github.com/valeneiko/89f8cbe26db7ca2651b47524462b5d18#file-queryoptimized-sql
The query plan is much better with just a few small index scans which
completes in under a millisecond: https://explain.depesz.com/s/vBdG

Glad to hear it, but as best as I can figure, that right join is actually an inner join because of the where clause meaning that cte2Source must not be null and therefore cte2.resource_surrogate_id must not be null.

RIGHT JOIN fhir.reference_search_param AS cte2 ON
cte2.is_history = false
AND cte2.search_param_id = 561
AND cte2.resource_type_id IN (42)
AND cte2.reference_resource_type_id = r.resource_type_id
AND cte2.reference_resource_id_hash = r.resource_id_hash

INNER JOIN fhir.resource AS cte2Source ON
cte2Source.is_history = false
AND cte2Source.resource_type_id IN (42)
AND cte2Source.resource_surrogate_id = cte2.resource_surrogate_id

WHERE cte1.start_date_time <= '2022-01-12 12:13:21.969000Z'
AND r.resource_type_id IN (10, 52, 95, 119, 60)
AND cte2Source.resource_id_hash IN ('df26ca5a-d2e2-1576-2507-815d8e73f15e'::uuid)