Re: Erratically behaving query needs optimization - Mailing list pgsql-performance
From: Luís Roberto Weck
Subject: Re: Erratically behaving query needs optimization
Date:
Msg-id: de19053e-5d69-6bd9-0d48-62475283e8a4@siscobra.com.br
In response to: Erratically behaving query needs optimization (Barbu Paul - Gheorghe <barbu.paul.gheorghe@gmail.com>)
Responses: Re: Erratically behaving query needs optimization
List: pgsql-performance
On 22/08/2019 08:51, Barbu Paul - Gheorghe wrote:
> That query, if I add the ORDER BY and LIMIT, returns the same results.
>
> The problem is the fact that it behaves the same way regarding its
> speed as the original query with the index you suggested.
> Sometimes it takes 800ms, sometimes it takes 6s to run, how the hell
> can I get it to behave the same every time?
> After I added the index you suggested, it was fine for a while, next
> morning the run time exploded back to several seconds per query... and
> it oscillates.
>
> On Wed, Aug 21, 2019 at 2:25 PM Luís Roberto Weck
> <luisroberto@siscobra.com.br> wrote:
>> On 21/08/2019 04:30, Barbu Paul - Gheorghe wrote:
>>> I wonder how I missed that... probably because of the "WHERE" clause
>>> in what I already had.
>>>
>>> I indexed by scheduler_task_executions.device_id and the new plan is
>>> as follows: https://explain.depesz.com/s/cQRq
>>>
>>> Can it be further improved?
>>>
>>> Limit  (cost=138511.45..138519.36 rows=2 width=54) (actual time=598.703..618.524 rows=2 loops=1)
>>>   Buffers: shared hit=242389 read=44098
>>>   ->  Unique  (cost=138511.45..138527.26 rows=4 width=54) (actual time=598.701..598.878 rows=2 loops=1)
>>>         Buffers: shared hit=242389 read=44098
>>>         ->  Sort  (cost=138511.45..138519.36 rows=3162 width=54) (actual time=598.699..598.767 rows=2052 loops=1)
>>>               Sort Key: results.attribute_id, results."timestamp" DESC
>>>               Sort Method: quicksort  Memory: 641kB
>>>               Buffers: shared hit=242389 read=44098
>>>               ->  Gather  (cost=44082.11..138327.64 rows=3162 width=54) (actual time=117.548..616.456 rows=4102 loops=1)
>>>                     Workers Planned: 2
>>>                     Workers Launched: 2
>>>                     Buffers: shared hit=242389 read=44098
>>>                     ->  Nested Loop  (cost=43082.11..137011.44 rows=1318 width=54) (actual time=47.436..525.664 rows=1367 loops=3)
>>>                           Buffers: shared hit=242389 read=44098
>>>                           ->  Parallel Hash Join  (cost=43081.68..124545.34 rows=24085 width=4) (actual time=33.099..469.958 rows=19756 loops=3)
>>>                                 Hash Cond: (scheduler_operation_executions.task_execution_id = scheduler_task_executions.id)
>>>                                 Buffers: shared hit=32518 read=44098
>>>                                 ->  Parallel Seq Scan on scheduler_operation_executions  (cost=0.00..74945.82 rows=2482982 width=8) (actual time=8.493..245.190 rows=1986887 loops=3)
>>>                                       Buffers: shared hit=6018 read=44098
>>>                                 ->  Parallel Hash  (cost=42881.33..42881.33 rows=16028 width=4) (actual time=23.272..23.272 rows=13558 loops=3)
>>>                                       Buckets: 65536  Batches: 1  Memory Usage: 2112kB
>>>                                       Buffers: shared hit=26416
>>>                                       ->  Parallel Bitmap Heap Scan on scheduler_task_executions  (cost=722.55..42881.33 rows=16028 width=4) (actual time=27.290..61.563 rows=40675 loops=1)
>>>                                             Recheck Cond: (device_id = 97)
>>>                                             Heap Blocks: exact=26302
>>>                                             Buffers: shared hit=26416
>>>                                             ->  Bitmap Index Scan on scheduler_task_executions_device_id_idx  (cost=0.00..712.93 rows=38467 width=0) (actual time=17.087..17.087 rows=40675 loops=1)
>>>                                                   Index Cond: (device_id = 97)
>>>                                                   Buffers: shared hit=114
>>>                           ->  Index Scan using index_operation_execution_id_asc on results  (cost=0.43..0.51 rows=1 width=58) (actual time=0.003..0.003 rows=0 loops=59269)
>>>                                 Index Cond: (operation_execution_id = scheduler_operation_executions.id)
>>>                                 Filter: ((data IS NOT NULL) AND (data <> '<NullData/>'::text) AND (attribute_id = ANY ('{4,5}'::integer[])) AND (object_id = 1955) AND (data_access_result = 'SUCCESS'::text))
>>>                                 Rows Removed by Filter: 0
>>>                                 Buffers: shared hit=209871
>>> Planning Time: 2.327 ms
>>> Execution Time: 618.935 ms
>>>
>>> On Tue, Aug 20, 2019 at 5:54 PM Luís Roberto Weck
>>> <luisroberto@siscobra.com.br> wrote:
>>>> On 20/08/2019 10:54, Barbu Paul - Gheorghe wrote:
>>>>> Hello,
>>>>> I'm running "PostgreSQL 11.2, compiled by Visual C++ build 1914,
>>>>> 64-bit" and I have a query that runs several times per user action
>>>>> (9-10 times).
>>>>> The query takes a long time to execute, especially at first, due to
>>>>> cold caches I think, but the performance varies greatly during a run
>>>>> of the application (while applying the said action by the user several
>>>>> times).
>>>>>
>>>>> My tables are only getting bigger with time, not many DELETEs and even
>>>>> fewer UPDATEs as far as I can tell.
>>>>>
>>>>> Problematic query:
>>>>>
>>>>> EXPLAIN (ANALYZE, BUFFERS)
>>>>> SELECT DISTINCT ON (results.attribute_id) results.timestamp, results.data
>>>>> FROM results
>>>>> JOIN scheduler_operation_executions
>>>>>   ON scheduler_operation_executions.id = results.operation_execution_id
>>>>> JOIN scheduler_task_executions
>>>>>   ON scheduler_task_executions.id = scheduler_operation_executions.task_execution_id
>>>>> WHERE scheduler_task_executions.device_id = 97
>>>>>   AND results.data <> '<NullData/>'
>>>>>   AND results.data IS NOT NULL
>>>>>   AND results.object_id = 1955
>>>>>   AND results.attribute_id IN (4, 5) -- possibly a longer list here
>>>>>   AND results.data_access_result = 'SUCCESS'
>>>>> ORDER BY results.attribute_id, results.timestamp DESC
>>>>> LIMIT 2 -- limit by the length of the attributes list
>>>>>
>>>>> In words: I want the latest (ORDER BY results.timestamp DESC) results
>>>>> of a device (scheduler_task_executions.device_id = 97 - hence the
>>>>> joins results -> scheduler_operation_executions -> scheduler_task_executions)
>>>>> for a given object and attributes with some additional constraints on
>>>>> the data column. But I only want the latest attributes for which we
>>>>> have results, hence the DISTINCT ON (results.attribute_id) and LIMIT.
>>>>>
>>>>> First run: https://explain.depesz.com/s/qh4C
>>>>> Limit  (cost=157282.39..157290.29 rows=2 width=54) (actual time=44068.166..44086.970 rows=2 loops=1)
>>>>>   Buffers: shared hit=215928 read=85139
>>>>>   ->  Unique  (cost=157282.39..157298.20 rows=4 width=54) (actual time=44068.164..44069.301 rows=2 loops=1)
>>>>>         Buffers: shared hit=215928 read=85139
>>>>>         ->  Sort  (cost=157282.39..157290.29 rows=3162 width=54) (actual time=44068.161..44068.464 rows=2052 loops=1)
>>>>>               Sort Key: results.attribute_id, results."timestamp" DESC
>>>>>               Sort Method: quicksort  Memory: 641kB
>>>>>               Buffers: shared hit=215928 read=85139
>>>>>               ->  Gather  (cost=62853.04..157098.57 rows=3162 width=54) (actual time=23518.745..44076.385 rows=4102 loops=1)
>>>>>                     Workers Planned: 2
>>>>>                     Workers Launched: 2
>>>>>                     Buffers: shared hit=215928 read=85139
>>>>>                     ->  Nested Loop  (cost=61853.04..155782.37 rows=1318 width=54) (actual time=23290.514..43832.223 rows=1367 loops=3)
>>>>>                           Buffers: shared hit=215928 read=85139
>>>>>                           ->  Parallel Hash Join  (cost=61852.61..143316.27 rows=24085 width=4) (actual time=23271.275..40018.451 rows=19756 loops=3)
>>>>>                                 Hash Cond: (scheduler_operation_executions.task_execution_id = scheduler_task_executions.id)
>>>>>                                 Buffers: shared hit=6057 read=85139
>>>>>                                 ->  Parallel Seq Scan on scheduler_operation_executions  (cost=0.00..74945.82 rows=2482982 width=8) (actual time=7.575..15694.435 rows=1986887 loops=3)
>>>>>                                       Buffers: shared hit=2996 read=47120
>>>>>                                 ->  Parallel Hash  (cost=61652.25..61652.25 rows=16029 width=4) (actual time=23253.337..23253.337 rows=13558 loops=3)
>>>>>                                       Buckets: 65536  Batches: 1  Memory Usage: 2144kB
>>>>>                                       Buffers: shared hit=2977 read=38019
>>>>>                                       ->  Parallel Seq Scan on scheduler_task_executions  (cost=0.00..61652.25 rows=16029 width=4) (actual time=25.939..23222.174 rows=13558 loops=3)
>>>>>                                             Filter: (device_id = 97)
>>>>>                                             Rows Removed by Filter: 1308337
>>>>>                                             Buffers: shared hit=2977 read=38019
>>>>>                           ->  Index Scan using index_operation_execution_id_asc on results  (cost=0.43..0.51 rows=1 width=58) (actual time=0.191..0.191 rows=0 loops=59269)
>>>>>                                 Index Cond: (operation_execution_id = scheduler_operation_executions.id)
>>>>>                                 Filter: ((data IS NOT NULL) AND (data <> '<NullData/>'::text) AND (attribute_id = ANY ('{4,5}'::integer[])) AND (object_id = 1955) AND (data_access_result = 'SUCCESS'::text))
>>>>>                                 Rows Removed by Filter: 0
>>>>>                                 Buffers: shared hit=209871
>>>>> Planning Time: 29.295 ms
>>>>> Execution Time: 44087.365 ms
>>>>>
>>>>>
>>>>> Second run: https://explain.depesz.com/s/uy9f
>>>>> Limit  (cost=157282.39..157290.29 rows=2 width=54) (actual time=789.363..810.440 rows=2 loops=1)
>>>>>   Buffers: shared hit=216312 read=84755
>>>>>   ->  Unique  (cost=157282.39..157298.20 rows=4 width=54) (actual time=789.361..789.535 rows=2 loops=1)
>>>>>         Buffers: shared hit=216312 read=84755
>>>>>         ->  Sort  (cost=157282.39..157290.29 rows=3162 width=54) (actual time=789.361..789.418 rows=2052 loops=1)
>>>>>               Sort Key: results.attribute_id, results."timestamp" DESC
>>>>>               Sort Method: quicksort  Memory: 641kB
>>>>>               Buffers: shared hit=216312 read=84755
>>>>>               ->  Gather  (cost=62853.04..157098.57 rows=3162 width=54) (actual time=290.356..808.454 rows=4102 loops=1)
>>>>>                     Workers Planned: 2
>>>>>                     Workers Launched: 2
>>>>>                     Buffers: shared hit=216312 read=84755
>>>>>                     ->  Nested Loop  (cost=61853.04..155782.37 rows=1318 width=54) (actual time=238.313..735.472 rows=1367 loops=3)
>>>>>                           Buffers: shared hit=216312 read=84755
>>>>>                           ->  Parallel Hash Join  (cost=61852.61..143316.27 rows=24085 width=4) (actual time=237.966..677.975 rows=19756 loops=3)
>>>>>                                 Hash Cond: (scheduler_operation_executions.task_execution_id = scheduler_task_executions.id)
>>>>>                                 Buffers: shared hit=6441 read=84755
>>>>>                                 ->  Parallel Seq Scan on scheduler_operation_executions  (cost=0.00..74945.82 rows=2482982 width=8) (actual time=0.117..234.279 rows=1986887 loops=3)
>>>>>                                       Buffers: shared hit=3188 read=46928
>>>>>                                 ->  Parallel Hash  (cost=61652.25..61652.25 rows=16029 width=4) (actual time=236.631..236.631 rows=13558 loops=3)
>>>>>                                       Buckets: 65536  Batches: 1  Memory Usage: 2144kB
>>>>>                                       Buffers: shared hit=3169 read=37827
>>>>>                                       ->  Parallel Seq Scan on scheduler_task_executions  (cost=0.00..61652.25 rows=16029 width=4) (actual time=0.132..232.758 rows=13558 loops=3)
>>>>>                                             Filter: (device_id = 97)
>>>>>                                             Rows Removed by Filter: 1308337
>>>>>                                             Buffers: shared hit=3169 read=37827
>>>>>                           ->  Index Scan using index_operation_execution_id_asc on results  (cost=0.43..0.51 rows=1 width=58) (actual time=0.003..0.003 rows=0 loops=59269)
>>>>>                                 Index Cond: (operation_execution_id = scheduler_operation_executions.id)
>>>>>                                 Filter: ((data IS NOT NULL) AND (data <> '<NullData/>'::text) AND (attribute_id = ANY ('{4,5}'::integer[])) AND (object_id = 1955) AND (data_access_result = 'SUCCESS'::text))
>>>>>                                 Rows Removed by Filter: 0
>>>>>                                 Buffers: shared hit=209871
>>>>> Planning Time: 1.787 ms
>>>>> Execution Time: 810.634 ms
>>>>>
>>>>> You can see that the second run takes less than one second to run...
>>>>> which is 43 seconds better than the first try, just by re-running the
>>>>> query.
>>>>> Other runs take maybe 1s, 3s, still a long time.
>>>>>
>>>>> How can I improve it to be consistently fast (is it possible to get to
>>>>> several milliseconds?)?
>>>>> What I don't really understand is why the nested loop has 3 loops
>>>>> (three joined tables)?
>>>>> And why does the first index scan indicate ~60k loops? And does it
>>>>> really work? It doesn't seem to filter out any rows.
>>>>>
>>>>> Should I add an index only on (attribute_id, object_id)? And maybe
>>>>> data_access_result?
>>>>> Does it make sense to add it on a text column (results.data)?
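As a partial answer to the indexing questions above: rather than indexing the text column results.data itself, a multicolumn index matching both the filter columns and the sort, with the data conditions pushed into a partial-index predicate, could let the planner walk the newest rows directly. This is only a sketch; the index name and column order are illustrative and were not tested against the poster's schema:

```sql
-- Hypothetical index: (object_id, attribute_id) narrow the scan,
-- "timestamp" DESC matches the ORDER BY, and the WHERE clause keeps
-- the index small without storing the wide "data" column.
CREATE INDEX results_object_attr_ts_idx
    ON results (object_id, attribute_id, "timestamp" DESC)
 WHERE data IS NOT NULL
   AND data <> '<NullData/>'
   AND data_access_result = 'SUCCESS';
```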
>>>>>
>>>>> My tables:
>>>>> https://gist.githubusercontent.com/paulbarbu/0d36271d710349d8fb6102d9a466bb54/raw/7a6946ba7c2adec5b87ed90f343f1aff37432d21/gistfile1.txt
>>>>>
>>>>> As you can see from the gist the foreign keys are indexed. Other
>>>>> indices were added to speed up other queries.
>>>>> Other relevant information (my tables have 3+ million rows, not
>>>>> very big I think?), additional info with regards to size also included
>>>>> below.
>>>>> This query has poor performance on two PCs (both running off of HDDs)
>>>>> so I think it has more to do with my indices and query than Postgres
>>>>> config & hardware, will post those if necessary.
>>>>>
>>>>>
>>>>> Size info:
>>>>> SELECT relname, relpages, reltuples, relallvisible, relkind, relnatts,
>>>>> relhassubclass, reloptions, pg_table_size(oid) FROM pg_class WHERE
>>>>> relname IN ('results', 'scheduler_operation_executions',
>>>>> 'scheduler_task_executions');
>>>>> -[ RECORD 1 ]--+-------------------------------
>>>>> relname        | results
>>>>> relpages       | 65922
>>>>> reltuples      | 3.17104e+06
>>>>> relallvisible  | 65922
>>>>> relkind        | r
>>>>> relnatts       | 9
>>>>> relhassubclass | f
>>>>> reloptions     |
>>>>> pg_table_size  | 588791808
>>>>> -[ RECORD 2 ]--+-------------------------------
>>>>> relname        | scheduler_operation_executions
>>>>> relpages       | 50116
>>>>> reltuples      | 5.95916e+06
>>>>> relallvisible  | 50116
>>>>> relkind        | r
>>>>> relnatts       | 8
>>>>> relhassubclass | f
>>>>> reloptions     |
>>>>> pg_table_size  | 410697728
>>>>> -[ RECORD 3 ]--+-------------------------------
>>>>> relname        | scheduler_task_executions
>>>>> relpages       | 40996
>>>>> reltuples      | 3.966e+06
>>>>> relallvisible  | 40996
>>>>> relkind        | r
>>>>> relnatts       | 12
>>>>> relhassubclass | f
>>>>> reloptions     |
>>>>> pg_table_size  | 335970304
>>>>>
>>>>> Thanks for your time!
>>>>>
>>>>> --
>>>>> Barbu Paul - Gheorghe
>>>>>
>>>> Can you create an index on scheduler_task_executions.device_id and run
>>>> it again?
>> Can you try this query, please? Although I'm not really sure it'll give
>> you the same results.
>>
>> SELECT DISTINCT ON (results.attribute_id)
>>        results.timestamp,
>>        results.data
>>   FROM results
>>  WHERE results.data <> '<NullData/>'
>>    AND results.data IS NOT NULL
>>    AND results.object_id = 1955
>>    AND results.attribute_id IN (4, 5) -- possibly a longer list here
>>    AND results.data_access_result = 'SUCCESS'
>>    AND EXISTS (SELECT 1
>>                  FROM scheduler_operation_executions
>>                  JOIN scheduler_task_executions
>>                    ON scheduler_task_executions.id =
>>                       scheduler_operation_executions.task_execution_id
>>                 WHERE scheduler_operation_executions.id =
>>                       results.operation_execution_id
>>                   AND scheduler_task_executions.device_id = 97)
>>  ORDER BY results.attribute_id, results.timestamp DESC
>>  LIMIT 2 -- limit by the length of the attributes list

Can you provide the EXPLAIN ANALYZE plan for the query I sent you?
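If the EXISTS variant still oscillates, another rewrite worth trying replaces DISTINCT ON with one LATERAL probe per attribute, so each attribute can be answered by a single descending index walk instead of sorting thousands of candidate rows. This is a sketch under the same schema assumptions as the thread (table aliases are illustrative; untested against the poster's data):

```sql
-- One LATERAL subquery per attribute: each ORDER BY ... LIMIT 1 can be
-- satisfied by a descending scan of an index led by (object_id, attribute_id).
SELECT r."timestamp", r.data
  FROM unnest(ARRAY[4, 5]) AS a(attribute_id)  -- the attribute list
 CROSS JOIN LATERAL (
        SELECT results."timestamp", results.data
          FROM results
         WHERE results.attribute_id = a.attribute_id
           AND results.object_id = 1955
           AND results.data IS NOT NULL
           AND results.data <> '<NullData/>'
           AND results.data_access_result = 'SUCCESS'
           AND EXISTS (SELECT 1
                         FROM scheduler_operation_executions soe
                         JOIN scheduler_task_executions ste
                           ON ste.id = soe.task_execution_id
                        WHERE soe.id = results.operation_execution_id
                          AND ste.device_id = 97)
         ORDER BY results."timestamp" DESC
         LIMIT 1
       ) AS r;
```

Note the EXISTS probe runs per candidate row, so this only wins when the newest qualifying rows for each attribute usually belong to the requested device.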