Thread: How to improve sql query to achieve the better plan

How to improve sql query to achieve the better plan

From

Arup Rakshit

Date:

30 September 2018, 19:22:54

I have the below query which is taking 1873 ms. How can I improve this?

explain analyze select
    sum(coalesce(price_cents, 0)::bigint * coalesce(quantity, 0) * (1 - coalesce(workitems.discount, 0)/ 100)) as total_budget_cents,
    sum(coalesce(price_cents, 0)::bigint * coalesce(quantity, 0) * (1 - coalesce(workitems.discount, 0)/ 100) + coalesce(additional_cost_cents, 0) - coalesce(cost_reduction_cents, 0)) as final_budget_cents,
    projects.id as project_id
from
    projects
left join workitems on
    workitems.project_id = projects.id
where
    workitems.deleted_at is null
group by
    projects.id
order by
    project_id asc

And the explain output is:

Sort (cost=62851.33..62856.07 rows=1897 width=35) (actual time=1872.867..1873.003 rows=1229 loops=1)
  Sort Key: projects.id
  Sort Method: quicksort Memory: 145kB
  -> HashAggregate (cost=62719.59..62748.04 rows=1897 width=35) (actual time=1871.281..1872.104 rows=1229 loops=1)
       Group Key: projects.id
       -> Hash Right Join (cost=159.68..45386.32 rows=364911 width=35) (actual time=2.226..637.936 rows=365784 loops=1)
            Hash Cond: (workitems.project_id = projects.id)
            Filter: (workitems.deleted_at IS NULL)
            Rows Removed by Filter: 257457
            -> Seq Scan on workitems (cost=0.00..36655.53 rows=623353 width=43) (actual time=0.020..220.215 rows=623175 loops=1)
            -> Hash (cost=135.97..135.97 rows=1897 width=16) (actual time=2.177..2.177 rows=1897 loops=1)
                 Buckets: 2048 Batches: 1 Memory Usage: 105kB
                 -> Seq Scan on projects (cost=0.00..135.97 rows=1897 width=16) (actual time=0.013..1.451 rows=1897 loops=1)
Planning time: 2.775 ms
Execution time: 1873.308 ms

The projects table has these indexes:

Indexes:
    "projects_pkey" PRIMARY KEY, btree (id)
    "index_projects_on_company_id" btree (company_id)
    "index_projects_on_deleted_at" btree (deleted_at)
    "index_projects_on_inspector_id" btree (inspector_id)
    "index_projects_on_managed_offline_by_user_id" btree (managed_offline_by_user_id)
    "index_projects_on_project_status_id" btree (project_status_id)
    "index_projects_on_shipyard_id" btree (shipyard_id)
    "index_projects_on_vessel_id" btree (vessel_id)

The workitems table has these indexes:

Indexes:
    "workitems_pkey" PRIMARY KEY, btree (id)
    "index_workitems_on_company_id" btree (company_id)
    "index_workitems_on_deleted_at" btree (deleted_at)
    "index_workitems_on_parent_workitem_id" btree (parent_workitem_id)
    "index_workitems_on_project_id" btree (project_id)
    "index_workitems_on_standard_workitem_id" btree (standard_workitem_id)
    "index_workitems_on_workitem_category_id" btree (workitem_category_id)

Thanks,

Arup Rakshit

ar@zeit.io

Re: How to improve sql query to achieve the better plan

From

Pavel Stehule

Date:

30 September 2018, 19:45:22

On Sun, 30 Sep 2018 at 18:23, Arup Rakshit <ar@zeit.io> wrote:

> I have the below query which is taking 1873 ms. How can I improve this?

Maybe a partial (conditional) index can help:

CREATE INDEX ON workitems(project_id) WHERE deleted_at is null;
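As a minimal sketch of how one might try this suggestion out: the index name below is a hypothetical choice, and the test query is a simplified stand-in for the original one, so treat this as an illustration rather than a tested recipe for this schema.

```sql
-- Partial index that covers only rows that are not soft-deleted.
-- The index name is hypothetical; any unused name works.
CREATE INDEX index_workitems_on_project_id_not_deleted
    ON workitems (project_id)
    WHERE deleted_at IS NULL;

-- Refresh table statistics so the planner's row estimates are current.
ANALYZE workitems;

-- Simplified stand-in for the original query: check whether the
-- planner now chooses the partial index or still scans the table.
EXPLAIN ANALYZE
SELECT project_id, count(*)
FROM workitems
WHERE deleted_at IS NULL
GROUP BY project_id;
```

Whether the planner uses the index depends on how selective the predicate is. If most rows satisfy deleted_at IS NULL, a sequential scan can still be estimated as cheaper.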

Regards

Pavel


Re: How to improve sql query to achieve the better plan

From

Arup Rakshit

Date:

30 September 2018, 19:49:11

I just added it as you suggested, but I am getting the same plan.

Sort (cost=62842.16..62846.91 rows=1897 width=35) (actual time=1845.831..1845.950 rows=1229 loops=1)
  Sort Key: projects.id
  Sort Method: quicksort Memory: 145kB
  -> HashAggregate (cost=62710.42..62738.88 rows=1897 width=35) (actual time=1844.178..1845.060 rows=1229 loops=1)
       Group Key: projects.id
       -> Hash Right Join (cost=159.68..45382.09 rows=364807 width=35) (actual time=1.534..618.717 rows=365784 loops=1)
            Hash Cond: (workitems.project_id = projects.id)
            Filter: (workitems.deleted_at IS NULL)
            Rows Removed by Filter: 257457
            -> Seq Scan on workitems (cost=0.00..36653.75 rows=623175 width=43) (actual time=0.047..213.842 rows=623175 loops=1)
            -> Hash (cost=135.97..135.97 rows=1897 width=16) (actual time=1.478..1.478 rows=1897 loops=1)
                 Buckets: 2048 Batches: 1 Memory Usage: 105kB
                 -> Seq Scan on projects (cost=0.00..135.97 rows=1897 width=16) (actual time=0.006..0.914 rows=1897 loops=1)
Planning time: 0.498 ms
Execution time: 1846.100 ms

——————

Indexes:
    "workitems_pkey" PRIMARY KEY, btree (id)
    "index_workitems_on_company_id" btree (company_id)
    "index_workitems_on_deleted_at" btree (deleted_at)
    "index_workitems_on_parent_workitem_id" btree (parent_workitem_id)
    "index_workitems_on_project_id" btree (project_id)
    "index_workitems_on_standard_workitem_id" btree (standard_workitem_id)
    "index_workitems_on_workitem_category_id" btree (workitem_category_id)
    "patrial_index_workitems_200_1" btree (project_id) WHERE deleted_at IS NULL

Thanks,

Arup Rakshit

ar@zeit.io


Re: How to improve sql query to achieve the better plan

From

Pavel Stehule

Date:

30 September 2018, 19:55:20

On Sun, 30 Sep 2018 at 18:49, Arup Rakshit <ar@zeit.io> wrote:

> I just added it as you said, but I am getting same plan.

Then there is not much more that can be done. You could try PostgreSQL 11 with parallel hash join. The query processes about 0.6M rows (623,175 scanned from workitems), so a time of about 2 seconds is good.
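For anyone wanting to try the PostgreSQL 11 suggestion, here is a hedged sketch of the session settings involved. The worker count is an illustrative value only, and the query is shortened to the first aggregate of the original. Whether a parallel plan is actually chosen also depends on table size, cost settings, and server-wide limits such as max_parallel_workers.

```sql
-- Requires PostgreSQL 11 or later for Parallel Hash.
-- The worker count is an illustrative value, not a recommendation.
SET max_parallel_workers_per_gather = 4;

-- Parallel Hash is on by default in PostgreSQL 11; set here only to be explicit.
SET enable_parallel_hash = on;

-- Shortened form of the original query (first aggregate only).
EXPLAIN ANALYZE
SELECT
    sum(coalesce(price_cents, 0)::bigint * coalesce(quantity, 0)
        * (1 - coalesce(workitems.discount, 0) / 100)) AS total_budget_cents,
    projects.id AS project_id
FROM projects
LEFT JOIN workitems ON workitems.project_id = projects.id
WHERE workitems.deleted_at IS NULL
GROUP BY projects.id
ORDER BY project_id ASC;
```

If the resulting plan shows Gather and Parallel Hash nodes, the parallel path was chosen. If it does not, the planner estimated the serial plan as cheaper under the current settings.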
