Re: [SPAM] Re: [SPAM] Re: Best practices to manage custom statistics - Mailing list pgsql-general
| From | Adrian Klaver |
|---|---|
| Subject | Re: [SPAM] Re: [SPAM] Re: Best practices to manage custom statistics |
| Date | |
| Msg-id | 5a767cde-1cce-ac8f-2276-571f5e1ba46e@aklaver.com |
| In response to | Re: [SPAM] Re: [SPAM] Re: Best practices to manage custom statistics (Moreno Andreo <moreno.andreo@evolu-s.it>) |
| List | pgsql-general |
On 11/24/2016 09:59 AM, Moreno Andreo wrote:
> Hi Adrian,
> First of all: now I've seen that not all fields touched by WHERE
>>
>> Aah, I get it now. You are refreshing forms every 60 seconds for 350
>> users, each with their own database. Actually worse than that, as
>> there is user overlap across databases (up to 10 per user), so the
>> same database can be hit multiple times at a given refresh. Seems
>> physics is at work here, as you have already pointed out: fetching
>> all that data at regular intervals taxes the bandwidth as well as
>> the CPU/storage.
> Exactly. At the moment the bottleneck is I/O (running on a VM over
> RAID 5 with 15k RPM SAS disks); in the coming weeks I plan to prepare
> a test server with SSDs ("attached SSD" on Google Cloud Platform).
> That's pretty expensive, but the data sheets point to a very good
> IOPS rate (rising with size, and with price). CPU is not a problem
> (almost never over 70-80%; the average is about 40% on a 4-core
> server).
>>
>> High-level solutions that come to mind to spread the load out:
>>
>> 1) Replication to spread data across multiple machines.
>> Or just split the databases over multiple non-replicated Postgres
>> instances on separate machines.
> Already in place, but only for the owner's database. Other databases
> can't be "downloaded", essentially for privacy reasons.

Not sure I follow: a user can see their own database and up to 9 other
users' databases. I don't see how replication would be any less
'private' than that, especially if the databases are replicated to
machines the company owns.

>>
>> 2) Caching results from each individual database so subsequent calls
>> for the information do not touch the database. You already touched
>> on this with your counter table.
> Yes, and I'm all ears for suggestions on what can be done and whether
> there are hidden caveats...

The usual, I would guess: storage for the cached data, and maintaining
the cache.

"There are only two hard things in Computer Science: cache invalidation
and naming things." -- Phil Karlton

> I heard that stored procedures (and triggers, I suppose) are faster
> to execute than the same "operation" coming from outside, because the
> backend does not have to parse it each time, and in this case they
> can be executed only when needed.

Well, that, and that less data has to flow back and forth across the
network. In your case you said bandwidth is less of an issue than disk
I/O on the server, and in-database procedures are still going to
require I/O on the server.
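For concreteness, a minimal sketch of the trigger-maintained counter
idea; `records` and `user_id` are invented stand-ins for the real base
table and key, so adjust to the actual schema:

```sql
-- Summary table: reads become a single-row lookup instead of a
-- count(*) against the base table at every 60-second refresh.
CREATE TABLE record_counts (
    user_id    integer PRIMARY KEY,
    n_records  bigint      NOT NULL DEFAULT 0,
    updated_at timestamptz NOT NULL DEFAULT now()
);

-- Keep the counter in step with the base table.
CREATE OR REPLACE FUNCTION bump_record_count() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        INSERT INTO record_counts AS rc (user_id, n_records)
        VALUES (NEW.user_id, 1)
        ON CONFLICT (user_id) DO UPDATE
            SET n_records = rc.n_records + 1, updated_at = now();
    ELSIF TG_OP = 'DELETE' THEN
        UPDATE record_counts
           SET n_records = n_records - 1, updated_at = now()
         WHERE user_id = OLD.user_id;
    END IF;
    RETURN NULL;  -- AFTER trigger: the return value is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER records_count_trg
    AFTER INSERT OR DELETE ON records
    FOR EACH ROW EXECUTE PROCEDURE bump_record_count();
```

The hidden caveat you asked about: every write to the base table now
also updates one hot row per user, which can become a contention point
under heavy write concurrency (and ON CONFLICT needs PostgreSQL 9.5 or
later).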
>>
>> 3) Spreading out the refresh interval. Not sure if the 60-second
>> interval is synced across users. Still, maybe set different refresh
>> intervals and/or change them on the fly when load increases. A
>> back-off equation, so to speak.
> I'm afraid that if we hit a moment when the load is extremely high (I
> once saw top report a 5-minute load average of 45 on a 4-core
> machine; it was even hard to type in the PuTTY console!), we won't
> see any improvement.

Is there any predictability (even a rough one) about when the load
spikes occur? Something you could use to spread the load out.

> Yes, under these conditions we're still not seeing improvements, but
> my goal is to find a way to reduce the overall query load and get a
> good (virtual) hardware design, so I can have better performance at
> a lower cost.
> "Pushing" data would be the best way... but I need to have a look at
> LISTEN/NOTIFY and at what it means for Npgsql and JDBC (one thing I
> didn't specify is that between the user and the database we have a
> web server, on the same LAN as the PostgreSQL server, hosting web
> services for the user to interact

?

http://www.memcached.org/
http://redis.io/

> with the database). Maybe the web server can "proxy" the connections
> (keep them open) so the listen/notify method can be achieved (yes,
> it's a bit of hard work to rewrite all the connection policies, but
> if it's necessary...)
>
> I think that a combination of the last two would be best.
> Tell me what you think.....
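On the LISTEN/NOTIFY side, the half that lives in the database is plain
SQL. A sketch, again with invented table/column/channel names: a
trigger posts on a channel, and the web server holds one long-lived
connection that has issued LISTEN and pushes refreshes to clients only
when something actually changed:

```sql
-- Notify listeners whenever a user's data changes, so clients can
-- refresh on demand instead of polling every 60 seconds.
CREATE OR REPLACE FUNCTION notify_change() RETURNS trigger AS $$
DECLARE
    uid text;
BEGIN
    -- OLD is the only row visible on DELETE; NEW on INSERT/UPDATE.
    IF TG_OP = 'DELETE' THEN
        uid := OLD.user_id::text;
    ELSE
        uid := NEW.user_id::text;
    END IF;
    -- One shared channel; the payload says whose data changed.
    PERFORM pg_notify('data_changed', uid);
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER records_notify_trg
    AFTER INSERT OR UPDATE OR DELETE ON records
    FOR EACH ROW EXECUTE PROCEDURE notify_change();

-- On the long-lived connection the web server keeps open:
LISTEN data_changed;
```

As far as I know, both drivers you mention can surface these
notifications on an open connection (Npgsql through its Notification
event, pgjdbc through PGConnection.getNotifications()), which fits the
"web server proxies the connections" idea.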
I think there are a lot of moving parts to this, and more pop up with
each post :) I would not even know how to start composing a useful
answer to what is a complex problem that also looks to be in the middle
of fairly major hardware changes. All I can suggest is that you create
a test setup and start making incremental changes, using some of the
suggestions already provided, with tests to measure whether the changes
actually are a benefit.

>>>> Except when they are wrong :) Still, been there.
>>> Don't tell me.... :-)

>>>>>>> Thanks in advance, and sorry for the long message (but I had to
>>>>>>> explain such a complex thing)
>>>>>>> Moreno.-

--
Adrian Klaver
adrian.klaver@aklaver.com