Re: Slowness of extended protocol - Mailing list pgsql-hackers
From | Vladimir Sitnikov
---|---
Subject | Re: Slowness of extended protocol
Date |
Msg-id | CAB=Je-GOTaqp8gyv2jdP=AB_EqmkfNMwkbWyOr2_hOHXJYGu3w@mail.gmail.com
In response to | Re: Slowness of extended protocol (Shay Rojansky <roji@roji.org>)
Responses | Re: Slowness of extended protocol
List | pgsql-hackers
Shay> your analogy breaks down. Of course L2 was invented to improve performance,
Shay> but that doesn't mean that all caches are the same. More precisely, what I
Shay> find objectionable about your approach is not any caching - it's the
Shay> implicit or automatic preparation of statements. This practice isn't
Shay> invisible in that a) it may cause errors that wouldn't have been there
Shay> otherwise (e.g. because of DDL),

Long-lived named server-prepared statements cause problems even when they are created manually by developers. Could you please stop saying "automatic preparation causes DDL issues"? Those errors are not inherent to the automatic preparation of statements; they are database defects that need to be cured. Automatic savepointing is just a workaround for a current DB limitation, and it gives users a simplified migration path.

Please don't try to tell me that the "IDbCommand.Prepare()" documentation says "a prepared statement might fail for no reason just because it is prepared".

Shay> As I said above, I think this is a critical point of misunderstanding between
Shay> us. The developer tells the driver which statements should be
Shay> server-prepared by calling .prepareStatement(). I'm guessing you have a
Shay> totally different understanding here.

Please quote the document where you read that "developers tell the driver which statements should be server-prepared by calling ...". It never works like that, neither in Java nor in C#. I admit I have no C# experience, but I did find the documentation on IDbCommand.Prepare() and examined it.

The proper way to put it is "by calling .Prepare() the developer expresses the intention that the same query might be used multiple times". That is it. It never means "the driver must absolutely server-prepare in response to the .Prepare() call".

The same goes for Java's PreparedStatement: it never means "the driver must use server-prepared features". As Microsoft notes in the .Prepare() documentation, modern versions of MSSQL simply ignore .Prepare() and cache statements automatically. It is not the developer's business which statements are in the database cache, and neither should the developer care which statements reside in the driver cache.

Shay> What exactly does "server-prepare on each execution" mean? Sending Parse
Shay> on each execution? How can that be considered prepared at all?

Remember, the java.sql.PreparedStatement interface is NOT bound to PostgreSQL in any manner. It is a generic database API, so the word "prepared" does not mean anything specific there. It gives no promise as to whether the statement will use PostgreSQL's "parse once, execute many" feature or not. A compliant implementation (that is, a driver) could just assemble the full SQL by concatenating the parameters on each execution and send it via a 'Q' simple-query message.

Shay> Does pgjdbc consider
Shay> something "prepared" without it being the 2nd option above? Note that I'm
Shay> genuinely interested in case I'm missing something.

Currently the first 5 executions of a PreparedStatement use unnamed statements (Parse/Bind/Execute). Then pgjdbc assigns a name and uses just Bind/Execute. So if a particular SQL is rare, it never gets its own server-prepared name even though it is a "java.sql.PreparedStatement". In other words, pgjdbc picks the most frequently used queries and enables them to be cached at the database level.
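To make that concrete, here is a minimal sketch of the intended usage (not from this thread; the class name, table, URL and credentials are made up, and the switch-over point follows the default behaviour described above, exposed by pgjdbc as the prepareThreshold connection parameter):

    // A minimal sketch: the application only declares "I will reuse this
    // query"; pgjdbc decides when to promote it to a named server-prepared
    // statement (after prepareThreshold executions, 5 by default).
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class PrepareThresholdDemo {
        public static void main(String[] args) throws Exception {
            String url = "jdbc:postgresql://localhost/test?prepareThreshold=5";
            try (Connection con = DriverManager.getConnection(url, "test", "test");
                 PreparedStatement ps =
                         con.prepareStatement("SELECT name FROM users WHERE id = ?")) {
                for (int i = 1; i <= 10; i++) {
                    ps.setInt(1, i);
                    try (ResultSet rs = ps.executeQuery()) {
                        while (rs.next()) {
                            System.out.println(rs.getString(1));
                        }
                    }
                    // first 5 executions: unnamed Parse/Bind/Execute each time;
                    // afterwards pgjdbc assigns a statement name, so only
                    // Bind/Execute travel over the wire.
                }
            }
        }
    }

Note the loop body never changes; only the driver's choice between unnamed and named statements does.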
Vladimir>> app/component and assign variables to CPU registers.
Vladimir>> This is exactly "programmer knowledge" which the compiler doesn't have.
Vladimir>> Does it sound good to you?

Shay> Of course not. But I don't think it's a very valid analogy.

The analogy was not supposed to play out the way you twisted it, with ORM=Java, driver=C, etc. Here's a more detailed explanation:
1) You claim that programmers should manually examine all the SQL statements and add a ".prepare()" call if and only if that specific SQL should be server-prepared.
2) My analogy: programmers should manually examine all the variables (think of C# variables, or Java variables, or C variables, it does not matter) and decide which variables should use CPU registers and which ones should go into memory.

Named server-prepared statements == CPU registers
SQL statements in the code == variables in the code (e.g. C# variables)

That is a very valid analogy. What you say is "the programmer has full visibility over the meaning of the code, thus they know better which statements should be server-prepared and which should not". Well, register allocation is a somewhat harder problem than "statement name allocation", but the essence is the same: there is a limited number of registers/named statements, so someone (or something) has to decide which statements deserve a name. Just in case: you do know what CPU registers are and what the "register allocation" problem is, don't you?

You seem to accept that "application developer != compiler engineer", yet you then slipped into "nevertheless the application developer should decide whether each SQL should be server-prepared on a case-by-case basis". In an ideal world that might work. However, it does not work in practice:

0) Microsoft's documentation on ".Prepare()" says "the prepare call does not have any effect since SQL Server optimizes the statements automatically". So applications ported from MSSQL might miss ".Prepare()" altogether; they will under-perform and blame PostgreSQL for no reason. Does that sound fair? I do not think so. Do you see lots of complaints about "MSSQL doing automatic statement caching"? I'm not into MSSQL, but I know Oracle does automatic statement caching as well, and no one ever raises questions like "my precious statement was evicted".

1) What if the database RAM is increased, so the application can ".prepare()" not 20 but 2000 statements? Should the developer add more ".prepare()" calls and recompile the application just to accommodate the RAM increase? That sounds crazy. I think the way to go is simply to increase the "statement cache size" (see the configuration sketch after this list).

2) Do you really expect each project to maintain a list of all the SQL it uses, along with "should server-prepare" flags?

3) What if the application is composed of 5 components? How should one split the budget of server-prepared statements? (Remember, we cannot prepare all of them, otherwise the DB would crash.)

4) The database driver does see all the SQL executed within a connection. Of course the driver's automatic cache cannot beat an "ideal pick", but that automatic cache can easily be "just fine", and automatic caching saves an enormous amount of development time. Remember, the C# compiler is often "just fine" at CPU register allocation, and you don't need to control how the code is actually executed. That spare time is better spent on SQL tuning (that is, "explain analyze ...") rather than on deciding which statements should be server-prepared and which should not.

5) When porting from another database, the application might already be full of "con.prepareStatement(String sql)" calls. As I said, there is NO NonPreparedStatement in Java. Does that mean an application ported from, say, Oracle should crash when running against PostgreSQL? That does not sound right.
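As for point 1 above, here is a hedged configuration sketch (the two cache properties are pgjdbc's documented driver-side statement-cache settings; the values, URL and credentials are made up for illustration):

    // The cache budget is tuned in driver configuration, not in application
    // code, so no extra ".prepare()" calls are required.
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.util.Properties;

    public class CacheSizeDemo {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.setProperty("user", "test");
            props.setProperty("password", "test");
            // keep up to 2000 distinct queries in the driver cache
            props.setProperty("preparedStatementCacheQueries", "2000");
            // and let that cache grow to 16 MiB of SQL text
            props.setProperty("preparedStatementCacheSizeMiB", "16");
            try (Connection con = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/test", props)) {
                // application code stays exactly the same
            }
        }
    }

The budget is adjusted in one place; the application keeps calling prepareStatement() exactly as before.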
Shay> Whereas it's totally
Shay> unreasonable to expect programmers to do the work of the compiler
Shay> optimizer, it seems really reasonable to ask them to think about server
Shay> preparation, and even about multivalue inserts (at least if performance is
Shay> important in the application).

a) Developers often don't even know that multi-value inserts in PostgreSQL give such a boost.
b) By the time all the ORMs implement that feature, PostgreSQL might fix the underlying defect, so the optimization would become obsolete.

Once again: I do not suggest rewriting SELECTs on the fly yet. What I suggest is to decide "which are the most important SQLs and which of them deserve their own server-prepared name" at the driver level. Remember: it is very close to what a compiler does. It assigns registers automatically based on some heuristics, and for typical applications that is just fine. No-one is trying to allocate registers by hand there.

Shay> and b) it imposes a resource drain on the
Shay> server. The second point is very important: the L2 cache doesn't impose a
Shay> resource drain on anyone - it's just there, speeding up your application. I
Shay> hope that point makes it across - it's why I don't accept your analogy.

Oh. You might not be into chip design either; that is why the analogy plays out so badly. Just in case: if the L2 cache were "just there", and if it didn't impose a resource drain, why don't we have 1 GiB L2 caches? Of course L2 drains lots of resources:
1) It drains money, as L2 occupies valuable on-die space.
2) It drains power, which naturally drains your battery.
Of course there are good reasons to have L2 even though it drains resources, but the claim that "it is just there for free" is plain wrong.

Shay> I'm still
Shay> not sure if it should be opt-in or default, although I admit I'm leaning
Shay> towards default. But that feature has very little to do with *implicit*
Shay> preparation.

Shay> The point is that AFAIK the same bugs that can result from reordering can
Shay> also result from other basic conditions as well. If you're writing
Shay> multithreaded code then you must handle synchronization - this is not a
Shay> reordering-specific problem. Therefore if your program is multithreaded but
Shay> doesn't do proper synchronization you have a bug - regardless of whether
Shay> its manifestation is triggered by CPU reordering or not. I admit I'm not an
Shay> expert on this and may be wrong (it would be interesting to know).

Regarding "regardless ... by CPU reordering or not", you are off. I already posted a link to a concurrency bug in the Linux kernel. It manifested itself only on some recent Intel CPUs. In other words, the kernel was fine otherwise, but when running on a modern CPU it could hang. Of course it was a bug in the kernel, but the trigger condition was a "smarter CPU" that was able to perform more reorderings than were possible on older ones.

There are good articles by Alexey Shipilev where he describes the typical concurrency problems and why compiler engineers allow them to appear. The articles are Java-related, but I think they are a good read anyway:
https://shipilev.net/blog/2014/jmm-pragmatics
https://shipilev.net/blog/2016/close-encounters-of-jmm-kind
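To make the reordering point concrete, here is a minimal self-contained Java sketch of such a latent bug (my own illustration, not the kernel code behind that link):

    // Two threads, no synchronization. Nothing forbids the compiler or CPU
    // from reordering the two writes, and nothing guarantees the reader
    // ever sees either of them.
    public class ReorderingDemo {
        static int value = 0;
        static boolean ready = false; // intentionally NOT volatile

        public static void main(String[] args) {
            Thread reader = new Thread(() -> {
                while (!ready) {
                    // may spin forever: the write to "ready" need not become visible
                }
                // may print 0 instead of 42 if the writes were reordered
                System.out.println(value);
            });
            reader.start();

            value = 42;   // write 1
            ready = true; // write 2 -- may be observed before write 1
        }
    }

On one CPU/JIT combination this may reliably print 42; on a "smarter" one the reader may print 0 or spin forever: exactly the "it only breaks on new hardware" pattern.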
Shay> It does only if you do it in a roundtrip of its own. When you close a
Shay> pooled connection in Npgsql, the reset query is written to an internal
Shay> buffer but not sent. The first query that actually gets sent by the user
Shay> after opening will therefore have the reset query prepended to it
Shay> (basically the reset query is batched).

That is clever. Is it something specific to the Npgsql pool? Does it work with all poolers that operate on top of Npgsql connections?

Vladimir