Process wakeups when idle and power consumption - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Process wakeups when idle and power consumption |
Date | |
Msg-id | BANLkTimhHgF=xQ3yMjxCffD9DQLGyWeegg@mail.gmail.com |
Responses |
Re: Process wakeups when idle and power consumption
Re: Process wakeups when idle and power consumption |
List | pgsql-hackers |
There is a general need to have Postgres consume fewer CPU cycles and less power when idle. Until something is done about this, shared hosting providers, particularly those who want to deploy many VM instances with databases, will continue to choose MySQL out of hand.

I have quantified the difference in the number of wake-ups when idle between Postgres and MySQL using Intel's powertop utility on my laptop, which runs Fedora 14. These figures are for a freshly initdb'd database from git master, and mysql-server 5.1.56 from my system's package manager.

*snip*
    2.7% ( 11.5)   [      ] postgres
    1.1% (  4.6)   [  1663] Xorg
    0.9% (  3.7)   [  1463] wpa_supplicant
    0.6% (  2.7)   [      ] [ahci] <interrupt>
    0.5% (  2.2)   [      ] mysqld
*snip*

Postgres consistently has 11.5 wakeups per second, while MySQL consistently has 2.2 wakeups per second (averaged over the 5 second period that each cycle of instrumentation lasts). If I turn on archiving, the figure for Postgres naturally increases:

*snip*
    1.7% ( 12.5)   [      ] postgres
    1.6% ( 12.0)   [   808] phy0
    0.7% (  5.4)   [  1463] wpa_supplicant
    0.6% (  4.3)   [      ] [ahci] <interrupt>
    0.3% (  2.2)   [      ] mysqld
*snip*

It increases by exactly the amount that you'd expect after looking at pgarch.c: one wakeup per second. This is because there is a loop within the main event loop for the process that is a prime example of what unix_latch.c describes as "the common pattern of using pg_usleep() or select() to wait until a signal arrives, where the signal handler sets a global variable". The loop naps for one second per iteration.

Attached is the first in what I hope will become a series of patches for reducing power consumption when idle. It makes the archiver process wake far less frequently, using a latch primitive, specifically a non-shared latch. I'm not sure if I should have used a shared latch instead, with SetLatch() calls replacing the SendPostmasterSignal(PMSIGNAL_WAKEN_ARCHIVER) calls. Would that have broken some implied notion of encapsulation?
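For readers unfamiliar with the distinction, the difference between napping in a loop and blocking until woken can be illustrated with a small standalone sketch. This is not the attached patch and it does not use PostgreSQL's latch API: the DemoLatch type and the demo_latch_* functions are hypothetical names, and the self-pipe technique shown is only an assumption about one reasonable way to build such a primitive on POSIX. The point is that the waiter sleeps in poll() until either the timeout expires or a signal handler calls the set function, rather than waking every second to check a flag.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical illustration only; not the PostgreSQL Latch struct. */
typedef struct DemoLatch
{
    volatile sig_atomic_t is_set;   /* set from a signal handler */
    int         pipefd[2];          /* self-pipe: [0] read end, [1] write end */
} DemoLatch;

/* Create the self-pipe and make both ends non-blocking. Returns 0 on success. */
int
demo_latch_init(DemoLatch *latch)
{
    assert(latch != NULL);
    if (pipe(latch->pipefd) != 0)
        return -1;
    for (int i = 0; i < 2; i++)
    {
        int flags = fcntl(latch->pipefd[i], F_GETFL);

        if (flags < 0 ||
            fcntl(latch->pipefd[i], F_SETFL, flags | O_NONBLOCK) < 0)
            return -1;
    }
    latch->is_set = 0;
    return 0;
}

/* Safe to call from a signal handler: it only touches the flag and write(). */
void
demo_latch_set(DemoLatch *latch)
{
    int save_errno = errno;

    if (!latch->is_set)
    {
        ssize_t rc;

        latch->is_set = 1;
        rc = write(latch->pipefd[1], "x", 1);   /* wakes a blocked poll() */
        (void) rc;
    }
    errno = save_errno;
}

/* Clear the flag first, then drain the pipe, so a concurrent set is not lost. */
void
demo_latch_reset(DemoLatch *latch)
{
    char buf[16];

    latch->is_set = 0;
    while (read(latch->pipefd[0], buf, sizeof(buf)) > 0)
        ;
}

/*
 * Block until the latch is set or timeout_ms elapses.
 * Returns 1 if set, 0 on timeout, -1 on error. For simplicity the
 * timeout restarts if poll() is interrupted by an unrelated signal.
 */
int
demo_latch_wait(DemoLatch *latch, int timeout_ms)
{
    struct pollfd pfd;

    pfd.fd = latch->pipefd[0];
    pfd.events = POLLIN;
    pfd.revents = 0;

    while (!latch->is_set)
    {
        int rc = poll(&pfd, 1, timeout_ms);

        if (rc > 0)
            return 1;           /* pipe readable: someone set the latch */
        if (rc == 0)
            return 0;           /* timed out with nothing to do */
        if (errno != EINTR)
            return -1;
        /* EINTR: loop and re-check the flag */
    }
    return 1;
}
```

A worker built on this sketch would install a signal handler that calls demo_latch_set() on a file-scope latch, then loop: call demo_latch_wait() with a long timeout (for example 60000 ms, mirroring the 60 second PGARCH_AUTOWAKE_INTERVAL), call demo_latch_reset(), and only then check for work. An idle process then wakes roughly once per timeout instead of once per second.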
In any case, if I apply the patch and rebuild, the difference is quite apparent:

*snip*
    3.9% ( 21.8)   [  1663] Xorg
    3.2% ( 17.9)   [      ] [ath9k] <interrupt>
    2.1% ( 11.9)   [   808] phy0
    2.1% ( 11.5)   [      ] postgres
    1.0% (  5.4)   [  1463] wpa_supplicant
    0.4% (  2.2)   [      ] mysqld
*snip*

The difference from not running the archiver at all appears to have been completely eliminated (in fact, we still wake up every PGARCH_AUTOWAKE_INTERVAL seconds, which is 60 seconds, but that usually isn't apparent to powertop, which measures wakeups over 5 second periods).

If we could gain similar decreases in idle power consumption across all Postgres ancillary processes, perhaps we'd see Postgres available as an option for shared hosting plans more frequently. When these differences are multiplied by thousands of VM instances, they really matter. Unfortunately, there doesn't seem to be a way to get powertop to display its instrumentation per-process to quickly get a detailed overview of where those wake-ups occur across all pg processes.

I hope to work on reducing wakeups for PG ancillary processes in this order (order of perceived difficulty), using shared latches to eliminate "the waiting pattern" in each case:

* WALWriter
* BgWriter
* WALReceiver
* Startup process

I'll need to take a look at the statistics, autovacuum and Logger processes too, to see if they present more subtle opportunities for reduced idle power consumption.

Do constants like PGARCH_AUTOWAKE_INTERVAL need to always be set at their current, conservative levels? Perhaps these sorts of values could be collectively controlled with a single GUC that represents a trade-off between CPU cycles used when idle and safety/reliability. On the other hand, there are GUCs that control that per process in some cases already, such as wal_writer_delay, and that suggestion could well be a bit woolly. It might be an enum value that represented various levels of concern that would default to something like 'conservative' (i.e. the current values).

Thoughts?

-- 
Peter Geoghegan http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services