Re: Erratically behaving query needs optimization - Mailing list pgsql-performance
From: Luís Roberto Weck
Subject: Re: Erratically behaving query needs optimization
Date:
Msg-id: de19053e-5d69-6bd9-0d48-62475283e8a4@siscobra.com.br
In response to: Erratically behaving query needs optimization (Barbu Paul - Gheorghe <barbu.paul.gheorghe@gmail.com>)
Responses: Re: Erratically behaving query needs optimization
List: pgsql-performance
On 22/08/2019 08:51, Barbu Paul - Gheorghe wrote:
> That query, if I add the ORDER BY and LIMIT, returns the same results.
>
> The problem is the fact that it behaves the same way regarding its
> speed as the original query with the index you suggested.
> Sometimes it takes 800ms, sometimes it takes 6s to run, how the hell
> can I get it to behave the same every time?
> After I added the index you suggested, it was fine for a while, next
> morning the run time exploded back to several seconds per query... and
> it oscillates.
>
> On Wed, Aug 21, 2019 at 2:25 PM Luís Roberto Weck
> <luisroberto@siscobra.com.br> wrote:
>> On 21/08/2019 04:30, Barbu Paul - Gheorghe wrote:
>>> I wonder how I missed that... probably because of the "WHERE" clause
>>> in what I already had.
>>>
>>> I indexed by scheduler_task_executions.device_id and the new plan is
>>> as follows: https://explain.depesz.com/s/cQRq
>>>
>>> Can it be further improved?
>>>
>>> Limit  (cost=138511.45..138519.36 rows=2 width=54) (actual time=598.703..618.524 rows=2 loops=1)
>>>   Buffers: shared hit=242389 read=44098
>>>   ->  Unique  (cost=138511.45..138527.26 rows=4 width=54) (actual time=598.701..598.878 rows=2 loops=1)
>>>         Buffers: shared hit=242389 read=44098
>>>         ->  Sort  (cost=138511.45..138519.36 rows=3162 width=54) (actual time=598.699..598.767 rows=2052 loops=1)
>>>               Sort Key: results.attribute_id, results."timestamp" DESC
>>>               Sort Method: quicksort  Memory: 641kB
>>>               Buffers: shared hit=242389 read=44098
>>>               ->  Gather  (cost=44082.11..138327.64 rows=3162 width=54) (actual time=117.548..616.456 rows=4102 loops=1)
>>>                     Workers Planned: 2
>>>                     Workers Launched: 2
>>>                     Buffers: shared hit=242389 read=44098
>>>                     ->  Nested Loop  (cost=43082.11..137011.44 rows=1318 width=54) (actual time=47.436..525.664 rows=1367 loops=3)
>>>                           Buffers: shared hit=242389 read=44098
>>>                           ->  Parallel Hash Join  (cost=43081.68..124545.34 rows=24085 width=4) (actual time=33.099..469.958 rows=19756 loops=3)
>>>                                 Hash Cond: (scheduler_operation_executions.task_execution_id = scheduler_task_executions.id)
>>>                                 Buffers: shared hit=32518 read=44098
>>>                                 ->  Parallel Seq Scan on scheduler_operation_executions  (cost=0.00..74945.82 rows=2482982 width=8) (actual time=8.493..245.190 rows=1986887 loops=3)
>>>                                       Buffers: shared hit=6018 read=44098
>>>                                 ->  Parallel Hash  (cost=42881.33..42881.33 rows=16028 width=4) (actual time=23.272..23.272 rows=13558 loops=3)
>>>                                       Buckets: 65536  Batches: 1  Memory Usage: 2112kB
>>>                                       Buffers: shared hit=26416
>>>                                       ->  Parallel Bitmap Heap Scan on scheduler_task_executions  (cost=722.55..42881.33 rows=16028 width=4) (actual time=27.290..61.563 rows=40675 loops=1)
>>>                                             Recheck Cond: (device_id = 97)
>>>                                             Heap Blocks: exact=26302
>>>                                             Buffers: shared hit=26416
>>>                                             ->  Bitmap Index Scan on scheduler_task_executions_device_id_idx  (cost=0.00..712.93 rows=38467 width=0) (actual time=17.087..17.087 rows=40675 loops=1)
>>>                                                   Index Cond: (device_id = 97)
>>>                                                   Buffers: shared hit=114
>>>                           ->  Index Scan using index_operation_execution_id_asc on results  (cost=0.43..0.51 rows=1 width=58) (actual time=0.003..0.003 rows=0 loops=59269)
>>>                                 Index Cond: (operation_execution_id = scheduler_operation_executions.id)
>>>                                 Filter: ((data IS NOT NULL) AND (data <> '<NullData/>'::text) AND (attribute_id = ANY ('{4,5}'::integer[])) AND (object_id = 1955) AND (data_access_result = 'SUCCESS'::text))
>>>                                 Rows Removed by Filter: 0
>>>                                 Buffers: shared hit=209871
>>> Planning Time: 2.327 ms
>>> Execution Time: 618.935 ms
>>>
>>> On Tue, Aug 20, 2019 at 5:54 PM Luís Roberto Weck
>>> <luisroberto@siscobra.com.br> wrote:
>>>> On 20/08/2019 10:54, Barbu Paul - Gheorghe wrote:
>>>>> Hello,
>>>>> I'm running "PostgreSQL 11.2, compiled by Visual C++ build 1914,
>>>>> 64-bit" and I have a query that runs several times per user action
>>>>> (9-10 times).
>>>>> The query takes a long time to execute, especially at first, due to
>>>>> cold caches I think, but the performance varies greatly during a run
>>>>> of the application (while applying the said action by the user several
>>>>> times).
>>>>>
>>>>> My tables are only getting bigger with time, not many DELETEs and even
>>>>> fewer UPDATEs as far as I can tell.
>>>>>
>>>>> Problematic query:
>>>>>
>>>>> EXPLAIN (ANALYZE, BUFFERS)
>>>>> SELECT DISTINCT ON (results.attribute_id) results.timestamp, results.data
>>>>> FROM results
>>>>> JOIN scheduler_operation_executions
>>>>>   ON scheduler_operation_executions.id = results.operation_execution_id
>>>>> JOIN scheduler_task_executions
>>>>>   ON scheduler_task_executions.id = scheduler_operation_executions.task_execution_id
>>>>> WHERE scheduler_task_executions.device_id = 97
>>>>>   AND results.data <> '<NullData/>'
>>>>>   AND results.data IS NOT NULL
>>>>>   AND results.object_id = 1955
>>>>>   AND results.attribute_id IN (4, 5) -- possibly a longer list here
>>>>>   AND results.data_access_result = 'SUCCESS'
>>>>> ORDER BY results.attribute_id, results.timestamp DESC
>>>>> LIMIT 2 -- limit by the length of the attributes list
>>>>>
>>>>> In words: I want the latest (ORDER BY results.timestamp DESC) results
>>>>> of a device (scheduler_task_executions.device_id = 97 - hence the
>>>>> joins results -> scheduler_operation_executions -> scheduler_task_executions)
>>>>> for a given object and attributes with some additional constraints on
>>>>> the data column. But I only want the latest attributes for which we
>>>>> have results, hence the DISTINCT ON (results.attribute_id) and LIMIT.
>>>>>
>>>>> First run: https://explain.depesz.com/s/qh4C
>>>>> Limit  (cost=157282.39..157290.29 rows=2 width=54) (actual time=44068.166..44086.970 rows=2 loops=1)
>>>>>   Buffers: shared hit=215928 read=85139
>>>>>   ->  Unique  (cost=157282.39..157298.20 rows=4 width=54) (actual time=44068.164..44069.301 rows=2 loops=1)
>>>>>         Buffers: shared hit=215928 read=85139
>>>>>         ->  Sort  (cost=157282.39..157290.29 rows=3162 width=54) (actual time=44068.161..44068.464 rows=2052 loops=1)
>>>>>               Sort Key: results.attribute_id, results."timestamp" DESC
>>>>>               Sort Method: quicksort  Memory: 641kB
>>>>>               Buffers: shared hit=215928 read=85139
>>>>>               ->  Gather  (cost=62853.04..157098.57 rows=3162 width=54) (actual time=23518.745..44076.385 rows=4102 loops=1)
>>>>>                     Workers Planned: 2
>>>>>                     Workers Launched: 2
>>>>>                     Buffers: shared hit=215928 read=85139
>>>>>                     ->  Nested Loop  (cost=61853.04..155782.37 rows=1318 width=54) (actual time=23290.514..43832.223 rows=1367 loops=3)
>>>>>                           Buffers: shared hit=215928 read=85139
>>>>>                           ->  Parallel Hash Join  (cost=61852.61..143316.27 rows=24085 width=4) (actual time=23271.275..40018.451 rows=19756 loops=3)
>>>>>                                 Hash Cond: (scheduler_operation_executions.task_execution_id = scheduler_task_executions.id)
>>>>>                                 Buffers: shared hit=6057 read=85139
>>>>>                                 ->  Parallel Seq Scan on scheduler_operation_executions  (cost=0.00..74945.82 rows=2482982 width=8) (actual time=7.575..15694.435 rows=1986887 loops=3)
>>>>>                                       Buffers: shared hit=2996 read=47120
>>>>>                                 ->  Parallel Hash  (cost=61652.25..61652.25 rows=16029 width=4) (actual time=23253.337..23253.337 rows=13558 loops=3)
>>>>>                                       Buckets: 65536  Batches: 1  Memory Usage: 2144kB
>>>>>                                       Buffers: shared hit=2977 read=38019
>>>>>                                       ->  Parallel Seq Scan on scheduler_task_executions  (cost=0.00..61652.25 rows=16029 width=4) (actual time=25.939..23222.174 rows=13558 loops=3)
>>>>>                                             Filter: (device_id = 97)
>>>>>                                             Rows Removed by Filter: 1308337
>>>>>                                             Buffers: shared hit=2977 read=38019
>>>>>                           ->  Index Scan using index_operation_execution_id_asc on results  (cost=0.43..0.51 rows=1 width=58) (actual time=0.191..0.191 rows=0 loops=59269)
>>>>>                                 Index Cond: (operation_execution_id = scheduler_operation_executions.id)
>>>>>                                 Filter: ((data IS NOT NULL) AND (data <> '<NullData/>'::text) AND (attribute_id = ANY ('{4,5}'::integer[])) AND (object_id = 1955) AND (data_access_result = 'SUCCESS'::text))
>>>>>                                 Rows Removed by Filter: 0
>>>>>                                 Buffers: shared hit=209871
>>>>> Planning Time: 29.295 ms
>>>>> Execution Time: 44087.365 ms
>>>>>
>>>>>
>>>>> Second run: https://explain.depesz.com/s/uy9f
>>>>> Limit  (cost=157282.39..157290.29 rows=2 width=54) (actual time=789.363..810.440 rows=2 loops=1)
>>>>>   Buffers: shared hit=216312 read=84755
>>>>>   ->  Unique  (cost=157282.39..157298.20 rows=4 width=54) (actual time=789.361..789.535 rows=2 loops=1)
>>>>>         Buffers: shared hit=216312 read=84755
>>>>>         ->  Sort  (cost=157282.39..157290.29 rows=3162 width=54) (actual time=789.361..789.418 rows=2052 loops=1)
>>>>>               Sort Key: results.attribute_id, results."timestamp" DESC
>>>>>               Sort Method: quicksort  Memory: 641kB
>>>>>               Buffers: shared hit=216312 read=84755
>>>>>               ->  Gather  (cost=62853.04..157098.57 rows=3162 width=54) (actual time=290.356..808.454 rows=4102 loops=1)
>>>>>                     Workers Planned: 2
>>>>>                     Workers Launched: 2
>>>>>                     Buffers: shared hit=216312 read=84755
>>>>>                     ->  Nested Loop  (cost=61853.04..155782.37 rows=1318 width=54) (actual time=238.313..735.472 rows=1367 loops=3)
>>>>>                           Buffers: shared hit=216312 read=84755
>>>>>                           ->  Parallel Hash Join  (cost=61852.61..143316.27 rows=24085 width=4) (actual time=237.966..677.975 rows=19756 loops=3)
>>>>>                                 Hash Cond: (scheduler_operation_executions.task_execution_id = scheduler_task_executions.id)
>>>>>                                 Buffers: shared hit=6441 read=84755
>>>>>                                 ->  Parallel Seq Scan on scheduler_operation_executions  (cost=0.00..74945.82 rows=2482982 width=8) (actual time=0.117..234.279 rows=1986887 loops=3)
>>>>>                                       Buffers: shared hit=3188 read=46928
>>>>>                                 ->  Parallel Hash  (cost=61652.25..61652.25 rows=16029 width=4) (actual time=236.631..236.631 rows=13558 loops=3)
>>>>>                                       Buckets: 65536  Batches: 1  Memory Usage: 2144kB
>>>>>                                       Buffers: shared hit=3169 read=37827
>>>>>                                       ->  Parallel Seq Scan on scheduler_task_executions  (cost=0.00..61652.25 rows=16029 width=4) (actual time=0.132..232.758 rows=13558 loops=3)
>>>>>                                             Filter: (device_id = 97)
>>>>>                                             Rows Removed by Filter: 1308337
>>>>>                                             Buffers: shared hit=3169 read=37827
>>>>>                           ->  Index Scan using index_operation_execution_id_asc on results  (cost=0.43..0.51 rows=1 width=58) (actual time=0.003..0.003 rows=0 loops=59269)
>>>>>                                 Index Cond: (operation_execution_id = scheduler_operation_executions.id)
>>>>>                                 Filter: ((data IS NOT NULL) AND (data <> '<NullData/>'::text) AND (attribute_id = ANY ('{4,5}'::integer[])) AND (object_id = 1955) AND (data_access_result = 'SUCCESS'::text))
>>>>>                                 Rows Removed by Filter: 0
>>>>>                                 Buffers: shared hit=209871
>>>>> Planning Time: 1.787 ms
>>>>> Execution Time: 810.634 ms
>>>>>
>>>>> You can see that the second run takes less than one second to run...
>>>>> which is 43 seconds better than the first try, just by re-running the
>>>>> query.
>>>>> Other runs take maybe 1s, 3s, still a long time.
>>>>>
>>>>> How can I improve it to be consistently fast (is it possible to get to
>>>>> several milliseconds?)?
>>>>> What I don't really understand is why the nested loop has 3 loops
>>>>> (three joined tables)?
>>>>> And why does the first index scan indicate ~60k loops? And does it
>>>>> really work? It doesn't seem to filter out any rows.
>>>>>
>>>>> Should I add an index only on (attribute_id, object_id)? And maybe
>>>>> data_access_result?
>>>>> Does it make sense to add it on a text column (results.data)?
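As a partial answer to the indexing questions above: rather than indexing the text column results.data itself, a multicolumn index matching both the filter columns and the sort, with the data conditions pushed into a partial-index predicate, could let the planner walk the newest rows directly. This is only a sketch; the index name and column order are illustrative and were not tested against the poster's schema:

```sql
-- Hypothetical index: (object_id, attribute_id) narrow the scan,
-- "timestamp" DESC matches the ORDER BY, and the WHERE clause keeps
-- the index small without storing the wide "data" column.
CREATE INDEX results_object_attr_ts_idx
    ON results (object_id, attribute_id, "timestamp" DESC)
 WHERE data IS NOT NULL
   AND data <> '<NullData/>'
   AND data_access_result = 'SUCCESS';
```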
>>>>>
>>>>> My tables:
>>>>> https://gist.githubusercontent.com/paulbarbu/0d36271d710349d8fb6102d9a466bb54/raw/7a6946ba7c2adec5b87ed90f343f1aff37432d21/gistfile1.txt
>>>>>
>>>>> As you can see from the gist the foreign keys are indexed. Other
>>>>> indices were added to speed up other queries.
>>>>> Other relevant information (my tables have 3+ million rows, not
>>>>> very big I think?), additional info with regards to size also included
>>>>> below.
>>>>> This query has poor performance on two PCs (both running off of HDDs)
>>>>> so I think it has more to do with my indices and query than Postgres
>>>>> config & hardware, will post those if necessary.
>>>>>
>>>>>
>>>>> Size info:
>>>>> SELECT relname, relpages, reltuples, relallvisible, relkind, relnatts,
>>>>> relhassubclass, reloptions, pg_table_size(oid) FROM pg_class WHERE
>>>>> relname IN ('results', 'scheduler_operation_executions',
>>>>> 'scheduler_task_executions');
>>>>> -[ RECORD 1 ]--+-------------------------------
>>>>> relname        | results
>>>>> relpages       | 65922
>>>>> reltuples      | 3.17104e+06
>>>>> relallvisible  | 65922
>>>>> relkind        | r
>>>>> relnatts       | 9
>>>>> relhassubclass | f
>>>>> reloptions     |
>>>>> pg_table_size  | 588791808
>>>>> -[ RECORD 2 ]--+-------------------------------
>>>>> relname        | scheduler_operation_executions
>>>>> relpages       | 50116
>>>>> reltuples      | 5.95916e+06
>>>>> relallvisible  | 50116
>>>>> relkind        | r
>>>>> relnatts       | 8
>>>>> relhassubclass | f
>>>>> reloptions     |
>>>>> pg_table_size  | 410697728
>>>>> -[ RECORD 3 ]--+-------------------------------
>>>>> relname        | scheduler_task_executions
>>>>> relpages       | 40996
>>>>> reltuples      | 3.966e+06
>>>>> relallvisible  | 40996
>>>>> relkind        | r
>>>>> relnatts       | 12
>>>>> relhassubclass | f
>>>>> reloptions     |
>>>>> pg_table_size  | 335970304
>>>>>
>>>>> Thanks for your time!
>>>>>
>>>>> --
>>>>> Barbu Paul - Gheorghe
>>>>>
>>>> Can you create an index on scheduler_task_executions.device_id and run
>>>> it again?
>> Can you try this query, please? Although I'm not really sure it'll give
>> you the same results.
>>
>> SELECT DISTINCT ON (results.attribute_id)
>>        results.timestamp,
>>        results.data
>>   FROM results
>>  WHERE results.data <> '<NullData/>'
>>    AND results.data IS NOT NULL
>>    AND results.object_id = 1955
>>    AND results.attribute_id IN (4, 5) -- possibly a longer list here
>>    AND results.data_access_result = 'SUCCESS'
>>    AND EXISTS (SELECT 1
>>                  FROM scheduler_operation_executions
>>                  JOIN scheduler_task_executions
>>                    ON scheduler_task_executions.id =
>>                       scheduler_operation_executions.task_execution_id
>>                 WHERE scheduler_operation_executions.id =
>>                       results.operation_execution_id
>>                   AND scheduler_task_executions.device_id = 97)
>>  ORDER BY results.attribute_id, results.timestamp DESC
>>  LIMIT 2 -- limit by the length of the attributes list

Can you provide the EXPLAIN ANALYZE plan for the query I sent you?
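If the EXISTS variant still oscillates, another rewrite worth trying replaces DISTINCT ON with one LATERAL probe per attribute, so each attribute can be answered by a single descending index walk instead of sorting thousands of candidate rows. This is a sketch under the same schema assumptions as the thread (table aliases are illustrative; untested against the poster's data):

```sql
-- One LATERAL subquery per attribute: each ORDER BY ... LIMIT 1 can be
-- satisfied by a descending scan of an index led by (object_id, attribute_id).
SELECT r."timestamp", r.data
  FROM unnest(ARRAY[4, 5]) AS a(attribute_id)  -- the attribute list
 CROSS JOIN LATERAL (
        SELECT results."timestamp", results.data
          FROM results
         WHERE results.attribute_id = a.attribute_id
           AND results.object_id = 1955
           AND results.data IS NOT NULL
           AND results.data <> '<NullData/>'
           AND results.data_access_result = 'SUCCESS'
           AND EXISTS (SELECT 1
                         FROM scheduler_operation_executions soe
                         JOIN scheduler_task_executions ste
                           ON ste.id = soe.task_execution_id
                        WHERE soe.id = results.operation_execution_id
                          AND ste.device_id = 97)
         ORDER BY results."timestamp" DESC
         LIMIT 1
       ) AS r;
```

Note the EXISTS probe runs per candidate row, so this only wins when the newest qualifying rows for each attribute usually belong to the requested device.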