Re: [HACKERS] Cached plans and statement generalization - Mailing list pgsql-hackers
From | Konstantin Knizhnik |
---|---|
Subject | Re: [HACKERS] Cached plans and statement generalization |
Date | |
Msg-id | f0bfcdd6-dba3-e9a8-2108-146e1f880839@postgrespro.ru Whole thread Raw |
In response to | Re: [HACKERS] Cached plans and statement generalization (Andres Freund <andres@anarazel.de>) |
Responses |
Re: [HACKERS] Cached plans and statement generalization
Re: [HACKERS] Cached plans and statement generalization Re: [HACKERS] Cached plans and statement generalization |
List | pgsql-hackers |
On 24.04.2017 21:43, Andres Freund wrote:
Well, first of all I want to share results I already get: pgbench with default parameters, scale 10 and one connection:
So autoprepare is as efficient as explicit prepare and can increase performance almost two times.
My current implementation is replacing with parameters only string literals in the query, i.e. select * from T where x='123'; -> select * from T where x=$1;
It greatly simplifies matching of parameters - it is just necessary to locate '\'' character and then correctly handle pairs of quotes.
Handling of integer and real literals is really challenged task.
One source of problems is negation: it is not so easy to correctly understand whether minus should be treated as part of literal or as operator:
(-1), (1-1), (1-1)-1
Another problem is caused by using integer literals in context where parameters can not be used, for example "order by 1".
Fully correct substitution can be done by first performing parsing the query, then transform parse tree, replacing literal nodes with parameter nodes and finally deparse tree into generalized query. postgres_fdw already contains such deparse code. It can be moved to postgres core and reused for autoprepare (and may be somewhere else).
But in this case overhead will be much higher.
I still think that query parsing time is significantly smaller than time needed for building and optimizing query execution plan.
But it should be measured if community will be interested in such approach.
There is obvious question: how I managed to get this pgbench results if currently only substitution of string literals is supported and queries constructed by pgbench don't contain string literals? I just made small patch in pgbench replaceVariable method wrapping value's representation in quotes. It has almost no impact on performance (3482 TPS vs. 3492 TPS),
but allows autoprepare to deal with pgbench queries.
I attached my patch to this mail. It is just first version of the patch (based on REL9_6_STABLE branch) just to illustrate proposed approach.
I will be glad to receive any comments and if such optimization is considered to be useful, I will continue work on this patch.
--
Hi, On 2017-04-24 11:46:02 +0300, Konstantin Knizhnik wrote:So what I am thinking now is implicit query caching. If the same query with different literal values is repeated many times, then we can try to generalize this query and replace it with prepared query with parameters.That's not actuall all that easy: - You pretty much do parse analysis to be able to do an accurate match. How much overhead is parse analysis vs. planning in your cases? - The invalidation infrastructure for this, if not tied to to fully parse-analyzed statements, is going to be hell. - Migrating to parameters can actually cause significant slowdowns, not nice if that happens implicitly.
Well, first of all I want to share results I already get: pgbench with default parameters, scale 10 and one connection:
protocol | TPS |
simple | 3492 |
extended | 2927 |
prepared | 6865 |
simple + autoprepare | 6844 |
So autoprepare is as efficient as explicit prepare and can increase performance almost two times.
My current implementation is replacing with parameters only string literals in the query, i.e. select * from T where x='123'; -> select * from T where x=$1;
It greatly simplifies matching of parameters - it is just necessary to locate '\'' character and then correctly handle pairs of quotes.
Handling of integer and real literals is really challenged task.
One source of problems is negation: it is not so easy to correctly understand whether minus should be treated as part of literal or as operator:
(-1), (1-1), (1-1)-1
Another problem is caused by using integer literals in context where parameters can not be used, for example "order by 1".
Fully correct substitution can be done by first performing parsing the query, then transform parse tree, replacing literal nodes with parameter nodes and finally deparse tree into generalized query. postgres_fdw already contains such deparse code. It can be moved to postgres core and reused for autoprepare (and may be somewhere else).
But in this case overhead will be much higher.
I still think that query parsing time is significantly smaller than time needed for building and optimizing query execution plan.
But it should be measured if community will be interested in such approach.
There is obvious question: how I managed to get this pgbench results if currently only substitution of string literals is supported and queries constructed by pgbench don't contain string literals? I just made small patch in pgbench replaceVariable method wrapping value's representation in quotes. It has almost no impact on performance (3482 TPS vs. 3492 TPS),
but allows autoprepare to deal with pgbench queries.
I attached my patch to this mail. It is just first version of the patch (based on REL9_6_STABLE branch) just to illustrate proposed approach.
I will be glad to receive any comments and if such optimization is considered to be useful, I will continue work on this patch.
--
Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
pgsql-hackers by date: