Thread: Unaccounted regression from postgresql 11 in later versions
Hello
We have an application (https://dhis2.org) which has been using postgresql as a backend for the past 15 years or so. Gradually moving through pg versions 8,9,10 etc as the years went by. At the moment a large number of our implementations are using versions 13, 14 and 15. Unfortunately we have recently discovered that, despite most operations performing considerably better on later versions, there is a particular type of query that is very much slower (up to 100x) than it was on postgresql 11. We have seen this regression in 13, 14 and 15. Unfortunately I dont have stats on version 12 yet.
The query is not beautifully crafted. It is automatically generated from a custom expression language. We know that it can probably be improved, but at the moment we would really like to know if there is anything we can configure with the SQL as-is to get performance like we had back on pg11.
The example below is a typical such query. I've attached below that, links to the results of EXPLAIN (ANALYZE, BUFFERS). for pg11 and pg15 on the same physical environment loaded with the same database. I would appreciate some help trying to understand what we are seeing with the EXPLAIN output and whether there is anything to be done.
EXPLAIN ANALYZE
select
count(pi) as value,
'2022W21' as Weekly
from
analytics_enrollment_gr3uwzvzpqt as ax
where
cast(
(
select
"IEMtgZapP2s"
from
analytics_event_gr3uWZVzPQT
where
analytics_event_gr3uWZVzPQT.pi = ax.pi
and "IEMtgZapP2s" is not null
and ps = 'XO45JBGJcXJ'
order by
executiondate desc
limit
1
) as date
) < cast('2022-05-30' as date)
and cast(
(
select
"IEMtgZapP2s"
from
analytics_event_gr3uWZVzPQT
where
analytics_event_gr3uWZVzPQT.pi = ax.pi
and "IEMtgZapP2s" is not null
and ps = 'XO45JBGJcXJ'
order by
executiondate desc
limit
1
) as date
) >= cast('2022-05-23' as date)
and (uidlevel1 = 'Plmg8ikyfrK')
and (
coalesce(
(
select
"QEbYS2QOXLf"
from
analytics_event_gr3uWZVzPQT
where
analytics_event_gr3uWZVzPQT.pi = ax.pi
and "QEbYS2QOXLf" is not null
and ps = 'XO45JBGJcXJ'
order by
executiondate desc
limit
1
):: text,
''
) = 'FIN_CASE_CLASS_CONFIRM_LAB'
)
limit 1;
-- limit 200001;
select
count(pi) as value,
'2022W21' as Weekly
from
analytics_enrollment_gr3uwzvzpqt as ax
where
cast(
(
select
"IEMtgZapP2s"
from
analytics_event_gr3uWZVzPQT
where
analytics_event_gr3uWZVzPQT.pi = ax.pi
and "IEMtgZapP2s" is not null
and ps = 'XO45JBGJcXJ'
order by
executiondate desc
limit
1
) as date
) < cast('2022-05-30' as date)
and cast(
(
select
"IEMtgZapP2s"
from
analytics_event_gr3uWZVzPQT
where
analytics_event_gr3uWZVzPQT.pi = ax.pi
and "IEMtgZapP2s" is not null
and ps = 'XO45JBGJcXJ'
order by
executiondate desc
limit
1
) as date
) >= cast('2022-05-23' as date)
and (uidlevel1 = 'Plmg8ikyfrK')
and (
coalesce(
(
select
"QEbYS2QOXLf"
from
analytics_event_gr3uWZVzPQT
where
analytics_event_gr3uWZVzPQT.pi = ax.pi
and "QEbYS2QOXLf" is not null
and ps = 'XO45JBGJcXJ'
order by
executiondate desc
limit
1
):: text,
''
) = 'FIN_CASE_CLASS_CONFIRM_LAB'
)
limit 1;
-- limit 200001;
The EXPLAIN result for postgresql 11 is here: https://explain.depesz.com/s/3QfC
The same query on postgresql 15 is here: https://explain.depesz.com/s/BzpA#html
Whereas the first example takes 23s, the pg15 one takes 243s (this time difference is even more stark when you remove BUFFERS from the explain). During execution the pg15 query consumes 100% of a CPU core throughout indicating it is probably cpu bound rather than IO.
The plan selected in both cases seems to be exactly the same. But pg15 seems to make a lot of work of the final aggregation step. Anecdotally I understand that the same difference is there with pg13 and 14. The only significant factor I could think of relating to new behaviour in pg13 is the new hash_mem_multiplier configuration and it its relation to work_mem availbale for hash tables. I have attempted to turn up both hash_mem_multilier and work_mem to ridiculous values and I see no change whatsoever on pg15.
I also removed the LIMIT and tested again with no significant difference: https://explain.depesz.com/s/K9Lq
Does anyone have a theory of why pg15 should behave so differently to pg11 here? Better still, any suggestions for configuration that might make pg15 behave more like pg10. I am really dreading the prospect of stepping our many live implementations back to pg11 :-(.
Regards
Bob
Does anyone have a theory of why pg15 should behave so differently to pg11 here? Better still, any suggestions for configuration that might make pg15 behave more like pg10. I am really dreading the prospect of stepping our many live implementations back to pg11 :-(.
One major factor here appears to be JIT compilation, which is off by default in pg11, but on by default in pg12+.
You can see at the bottom of your slowest query plan that about 233s of the 240s are JIT related.
There is good info in the docs about tuning, or turning off, JIT: https://www.postgresql.org/docs/current/jit-decision.html
You can see at the bottom of your slowest query plan that about 233s of the 240s are JIT related.
There is good info in the docs about tuning, or turning off, JIT: https://www.postgresql.org/docs/current/jit-decision.html
Wow Michael you are absolutely right. Turning jit off results in a query execution about twice as fast as pg11. That is a huge relief. I will read the jit related docs and see if there is anything smarter I should be doing other than disabling jit entirely, but it works a treat for this query.
Regards
Bob
On Wed, 31 May 2023 at 11:11, Michael Christofides <michael@pgmustard.com> wrote:
Does anyone have a theory of why pg15 should behave so differently to pg11 here? Better still, any suggestions for configuration that might make pg15 behave more like pg10. I am really dreading the prospect of stepping our many live implementations back to pg11 :-(.One major factor here appears to be JIT compilation, which is off by default in pg11, but on by default in pg12+.
You can see at the bottom of your slowest query plan that about 233s of the 240s are JIT related.
There is good info in the docs about tuning, or turning off, JIT: https://www.postgresql.org/docs/current/jit-decision.html