Re: Improvement of pg_stat_statement usage about buffer hit ratio - Mailing list pgsql-hackers
From: Peter Geoghegan
Subject: Re: Improvement of pg_stat_statement usage about buffer hit ratio
Date:
Msg-id: CAM3SWZQpgFaCdz2+1kaRfOUP_jDwAMYX4JWFJjaE6B05xp9Kgw@mail.gmail.com
In response to: Re: Improvement of pg_stat_statement usage about buffer hit ratio (KONDO Mitsumasa <kondo.mitsumasa@lab.ntt.co.jp>)
Responses: Re: Improvement of pg_stat_statement usage about buffer hit ratio
List: pgsql-hackers
On Mon, Nov 18, 2013 at 6:12 PM, KONDO Mitsumasa
<kondo.mitsumasa@lab.ntt.co.jp> wrote:
> I confirmed that Itagaki-san and Mr Cerdic disscution. He said that raw
> values be just simple. However, were his changes just simple? I cannot
> understand his aesthetics sense and also you, too:-(
> It's too complicated, and do you know how to tuning PG from information of
> local_* and temp_*?
> At least, I think that most user cannot tuning from these information, and
> it might not be useful information only part of them.

All of those costs are cumulative aggregates. If we didn't aggregate
them, then the user couldn't possibly determine them on their own, to
any approximation. That's the difference. If you think the local_* and
temp_* counters aren't very useful, I'm inclined to agree, but it's too
late to do anything about that now.

> No. It's not for geek tools and people having pre-packaged solution in big
> company, but also for common DBA tools.

I don't think that the tool needs to be expensive. If selecting from
the pg_stat_statements view every 1-3 seconds is too expensive for such
a tool, we can have a discussion about being smarter, because there
certainly are ways to optimize it.

Regarding your min/max patch: I'm opposed to adding even more to the
spinlock-protected counters struct, just so we can get an exact answer
to a question where an approximate answer would very likely be good
enough. And as Itagaki-san said four years ago, who is to say that what
you've done here for buffers (or equally, what you've done in your
min/max patch) is more interesting than the same thing for some other
cost? The point of the text you've removed from the pg_stat_statements
docs about calculating averages is that it is an example that can be
generalized from. I certainly think there should be better tooling to
make displaying costs over time easier, or to characterize their
distribution, but unfortunately this isn't it.

Something like pg_stat_statements is allowed to be approximate.
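To make the "generalizable example" point concrete, here is a rough sketch of the kind of client-side arithmetic the removed docs text described: difference two cumulative snapshots of a pg_stat_statements row, then derive per-interval averages and a buffer hit ratio. The column names mirror the view; the snapshot dicts and numbers are made up for illustration, not taken from the thread.

```python
# Two hypothetical snapshots of one pg_stat_statements row, taken a few
# seconds apart. Keys mirror columns of the view; values are cumulative.
prev = {"calls": 1000, "total_time": 5000.0,
        "shared_blks_hit": 90000, "shared_blks_read": 10000}
curr = {"calls": 1600, "total_time": 8600.0,
        "shared_blks_hit": 141000, "shared_blks_read": 14000}

# Difference the cumulative counters to get per-interval deltas.
delta = {k: curr[k] - prev[k] for k in prev}

# Average execution time per call over the interval.
avg_time = delta["total_time"] / delta["calls"]

# Buffer hit ratio over the interval: hits / (hits + reads).
hits = delta["shared_blks_hit"]
reads = delta["shared_blks_read"]
hit_ratio = hits / (hits + reads)

print(avg_time)   # 6.0 (ms per call over this interval)
print(hit_ratio)  # ~0.927
```

The same differencing generalizes to any of the cumulative counters, which is exactly why the docs gave one worked example rather than enumerating every cost.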
That's considered an appropriate trade-off. Most obviously, today there
can be hash table collisions, and some of the entries can therefore be
plain wrong. Previously, I put the probability of at least one
collision in the hash table at about 1% when pg_stat_statements.max is
set to 10,000. So if your min/max patch were "implemented in
userspace", and an outlier were lost in the noise with just one second
of activity, I'm not terribly concerned about that. It's a trade-off,
and if you don't think it's the correct one, well then I'm afraid
that's just where you and I differ. As I've said many times, if you
want to have a discussion about making aggressive snapshotting of the
pg_stat_statements view more practical, I think that would be very
useful.

> By the way, MySQL and Oracle database which are very popular have these
> statistics. I think that your argument might disturb people who wants to
> migration from these database and will accelerate popularity of these
> database more.

I think that there should be better tooling built on top of
pg_stat_statements. I don't know what Oracle does, but I'm pretty sure
that MySQL has nothing like pg_stat_statements. Please correct me if
I'm mistaken.

>> As I said on the min/max thread, if we're not
>> doing enough to help people who would like to build such a tool, we
>> should discuss how we can do better.
>
> Could you tell me how to get min/max statistics with low cost?

See my previous comments on the other thread about making
pg_stat_statements only return changed entries, and only sending each
query text once.

> I'm not sure about detail of your patch in CF, but it seems very high cost.

I think you should actually go and read the code and my explanation of
it, and refrain from making uninformed remarks like this.
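The roughly 1% collision figure mentioned above can be sanity-checked with the birthday bound, assuming entries are keyed by a uniformly distributed 32-bit hash (an assumption of this back-of-the-envelope check, not something stated in this mail):

```python
import math

n = 10_000       # pg_stat_statements.max: number of tracked entries
space = 2 ** 32  # assumed size of the 32-bit query hash space

# Birthday bound: P(at least one collision) = 1 - exp(-n(n-1) / (2 * space))
p_collision = 1 - math.exp(-n * (n - 1) / (2 * space))

print(round(p_collision * 100, 2))  # ~1.16, i.e. about 1%
```

That a ~1% chance of an outright wrong entry is tolerated is the sense in which the view is "allowed to be approximate".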
Whatever overhead my patch may add, the important point is that it
obviously and self-evidently adds exactly zero overhead to maintaining
statistics for existing entries - we only care about the query text
when first creating an entry, or when the view is selected from. With
the reduced shared memory consumption, more entries can be created,
making the cost of creating new entries (and thus whatever might have
been added to that cost) matter less. Having said that, the additional
cost is thought to be very low anyway. If you'd read my mail to the
list about this, you'd know that I specifically worried about the
implications of what I'd proposed for tools like yours. That's part of
the reason I keep urging you to think about making pg_stat_statements
cheaper for consumer tools. There is no reason to send query texts to
such tools more than once per entry, and no reason to send unchanged
entries.

> Repeatedly, I think that if we want to get drastic detail statistics, we
> have to create another tools of statistics. Your patch will be these
> statistics tools. However, pg_stat_statement sholud be just light weight.

This is incomprehensible. As with the cumulative aggregate costs, how
is a consumer of pg_stat_statements possibly going to get the full
query text from anywhere else? It *has* to come from pg_stat_statements
directly.

--
Peter Geoghegan
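The two consumer-side economies urged above - fetch each query text only once, and skip entries whose counters haven't advanced - amount to something like the following sketch. The function and structures are hypothetical illustrations of a polling tool, not part of any patch:

```python
# Hypothetical consumer-side cache illustrating the two optimizations:
# remember each entry's query text the first time it is seen, and skip
# entries whose counters have not advanced since the previous poll.
query_texts = {}   # queryid -> query text, stored once per entry
last_calls = {}    # queryid -> calls counter seen at the previous poll

def process_poll(rows):
    """rows: iterable of (queryid, query_text, calls) from one poll.

    Returns the queryids whose statistics changed since the last poll.
    """
    changed = []
    for queryid, text, calls in rows:
        if queryid not in query_texts:
            query_texts[queryid] = text      # text kept only once
        if last_calls.get(queryid) == calls:
            continue                         # unchanged entry: skip it
        last_calls[queryid] = calls
        changed.append(queryid)
    return changed

# First poll: both entries are new, so both are reported.
print(process_poll([(1, "SELECT 1", 10), (2, "SELECT 2", 5)]))  # [1, 2]
# Second poll: only entry 1 advanced, so only it is reported.
print(process_poll([(1, "SELECT 1", 12), (2, "SELECT 2", 5)]))  # [1]
```

With server-side support for returning only changed entries, the skip step would happen before any data crosses the wire at all, which is the cheaper design being argued for.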