Re: PostgreSQL Query Speed Issues - Mailing list pgsql-novice
From: Joseph Pravato
Subject: Re: PostgreSQL Query Speed Issues
Msg-id: 5127ED08.3020408@nomagic.com
In response to: Re: PostgreSQL Query Speed Issues (Kevin Grittner <kgrittn@ymail.com>)
Responses: Re: PostgreSQL Query Speed Issues
List: pgsql-novice
On 2/22/2013 8:58 AM, Joseph Pravato wrote:
> ----- Forwarded Message -----
> From: "Kevin Grittner" <kgrittn@ymail.com>
> To: "Tom Lisjac" <netdxr@gmail.com>, pgsql-novice@postgresql.org
> Sent: Friday, February 22, 2013 7:29:58 AM
> Subject: Re: [NOVICE] PostgreSQL Query Speed Issues
>
> Tom Lisjac <netdxr@gmail.com> wrote:
>> version 9.2.1 running on Centos6/64
>
> There are performance problems in the 9.2 branch which are fixed in 9.2.3. If you care about performance, or for that matter running with known bugs fixed, upgrade.
> http://www.postgresql.org/support/versioning/
Our server manager is going to update to 9.2.3 this weekend to see if we get any improvements.
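Once the update is in, a one-line sanity check that the new minor release is actually what the server is running:

    SELECT version();   -- should report 9.2.3 after the update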
>> The database is running in a Xen VM with 12GB of ram, 4 virtual CPU's and fast, dedicated physical disks rather than shared network storage. The problem queries take one core to saturation and keep it there with very little disk I/O.
>> max_connections = 200
>
> You may want to consider connection pooling: http://wiki.postgresql.org/wiki/Number_Of_Database_Connections
On our server, we are using connection pooling with a maximum of 50. For the moment, we've moved it down to max_connections = 100. Originally, the 200 was because we were using a multi-threaded migration routine to load data into Postgres. However, we've set up a second database dedicated for migration and are using pg_dump & pg_restore.
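As a quick sanity check of what the server itself is enforcing versus what the pool is using (a sketch; the pool's limit of 50 lives in the pooler's own configuration, not in PostgreSQL):

    -- Current server-side limit and how many backends are connected right now:
    SHOW max_connections;                           -- expect 100 after the change
    SELECT count(*) AS connections_in_use FROM pg_stat_activity;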
>> shared_buffers = 2GB
>
> The usual advice is to start at 25% of machine RAM, up to 8GB, and check performance with incremental updates from there. That would suggest a 4GB starting point.
>
>> effective_cache_size = 1024MB
>
> This should normally be around 75% of machine RAM, so 9GB. This does not actually allocate any RAM, but is used to estimate how much of the indexes may be in RAM on repeated access. A larger number allows more index usage.
We actually only have 11.37GB on our server, so we've used 3072MB for shared_buffers & 8400MB for effective_cache_size.
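After editing postgresql.conf and restarting (shared_buffers only takes effect on a restart), a quick way to confirm the new values took effect; just a sketch:

    -- Verify the memory settings the running server is actually using:
    SHOW shared_buffers;          -- expect 3072MB (~25% of the 11.37GB of RAM)
    SHOW effective_cache_size;    -- expect 8400MB (~75% of RAM; planner estimate
                                  -- only, no memory is actually allocated for it)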
> In addition, with 12GB and apparently not more than 2 million rows per table, you seem very likely to have the active portion of your database fully cached. So these settings are likely to help:
>
> seq_page_cost = 0.1
> random_page_cost = 0.1
> cpu_tuple_cost = 0.03
Wow, the random_page_cost setting made a huge difference. The query we mentioned yesterday that took 30 minutes is down to about 30 seconds: a nice improvement and a reasonable speed, though it could still be better. However, I'm a bit worried about changing the setting so drastically. Can this change cause any issues in the future? We have 4 or 5 tables that will continue to grow fairly rapidly (roughly 1 million rows every 2-3 years).
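One way to limit the risk of such a drastic change is to try the new costs in a single session first and compare plans before putting them in postgresql.conf; a sketch:

    -- Session-only experiment: the settings revert when the session ends.
    SET seq_page_cost = 0.1;
    SET random_page_cost = 0.1;
    SET cpu_tuple_cost = 0.03;
    -- Re-run the slow query under EXPLAIN ANALYZE here and compare the plan
    -- and timing against a session that still uses the default costs.

If those tables eventually grow to the point where the working set no longer fits in RAM, the low page costs would be worth revisiting, since they encode the assumption that reads rarely have to touch disk.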
> I normally don't set work_mem above RAM * 0.25 / max_connections, so I'd say 10MB to 15MB would probably be good with your max_connections setting. If you can lower that with a connection pool you might want to go to 20MB or 40MB.
For the moment, it is set to 30MB based on your formula and the new 100-connection limit.
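If an occasional report needs a bigger sort or hash area, work_mem can also be raised just for that session rather than globally; a sketch with an illustrative value:

    -- Per-session override for one heavy query; 64MB is just an example value.
    SET work_mem = '64MB';
    -- run the memory-hungry query here, then put the default back:
    RESET work_mem;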
> You probably want to boost maintenance_work_mem to something like 512MB.
>
> If you haven't already done so, run VACUUM ANALYZE at the database level. If most of your data was loaded at about the same time, run VACUUM FREEZE ANALYZE instead, to prevent a read-and-rewrite pass of your entire database at peak OLTP load.
>
> Vacuum and analyze are routine maintenance that are necessary for the database to perform well. Autovacuum can often handle everything for you, but if you have an "off-hours" period when load is lower it is often a good idea to run a database VACUUM ANALYZE on a daily or weekly basis to shift maintenance load to time when it has the least impact. Never leave autovacuum disabled beyond a short maintenance or load window.
The vacuuming didn't seem to change performance, and it didn't take very long to run. Autovacuum may have already taken care of most potential issues here.
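For reference, what Kevin describes boils down to something like this, run while connected to the database in question (a sketch):

    -- Routine maintenance pass; FREEZE is the variant that avoids a later
    -- read-and-rewrite of data that was all loaded at about the same time.
    SET maintenance_work_mem = '512MB';
    VACUUM FREEZE ANALYZE;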
> Regarding your specific problem... You claim this is how contact_address_map is defined:
>
> create table contact_address_map (
>     contact_id int8 not null,
>     address_id int8 not null,
>     unique (address_id)
> );
>
> ... but your query is looking for rows in that table where address_id is null, and finding them. Those can't both be true.
We are migrating a really old database to Postgres and cleaning up all of the data. In the old system, only one address was allowed per user, so in the migrated data there will only be one entry in the map for each user. We were looking for which users didn't have an address at all, and the query was an optimization based on this assumption. We are aware that the query is not ideal, but it works for the just-after-migration scenario of validating parts of the migration routine. Do note that we were left joining the two tables so that all contacts are guaranteed to be returned, with a check on address_id to see which ones didn't have any addresses.
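For clarity, the query in question is shaped roughly like this (a sketch; the contact table and its id column are assumed, as they aren't shown in the thread):

    -- Contacts with no row at all in the map: the left join fills the map side
    -- with NULLs for those contacts, which is what the address_id IS NULL test
    -- catches, so it does not contradict the NOT NULL constraint on the column.
    SELECT c.id
    FROM contact c
    LEFT JOIN contact_address_map m ON m.contact_id = c.id
    WHERE m.address_id IS NULL;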
> Your plan shows an index scan with no index conditions to pass every row in a table, which is much slower than a seqscan. Did you turn off enable_seqscan and fail to mention that? Doing so will definitely result in sub-optimal performance on queries like the one shown, where every row in one of the tables must be read.
No, enable_seqscan is still turned on.
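A quick way to double-check that, and to capture the plan being asked about (a sketch):

    SHOW enable_seqscan;    -- returns "on"
    -- Then run the problem query under EXPLAIN (ANALYZE, BUFFERS) and post the
    -- output; it shows whether the full-table pass uses a seqscan or an
    -- unconditioned index scan.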
> When you don't provide accurate information, any advice you get may be off-target.