Timing overhead and Linux clock sources - Mailing list pgsql-hackers
From | Greg Smith |
---|---|
Subject | Timing overhead and Linux clock sources |
Date | |
Msg-id | 4EDED63E.2040002@2ndQuadrant.com Whole thread Raw |
Responses |
Re: Timing overhead and Linux clock sources
|
List | pgsql-hackers |
Over in the "add timing of buffer I/O requests" thread I mentioned having a system where EXPLAIN ANALYZE of a modest COUNT(*) takes 10X as long as just executing the statement. Attached is a bit of SQL and a script that runs it multiple times that demonstrate the problem on systems that have it. It assumes you've installed the Dell Store 2 database. I'd expect it to work with simpler example too (just create something with 20,000 rows), but I haven't actually confirmed that yet. From what I've been able to measure so far, here's the state of things: -If you have a system with a working TSC clock source (timing data is pulled right from the CPU), timing overhead is reasonable enough that you might turn it on even for things that happen frequently, such as the buffer I/O timing patch enables. -Some systems have tsc, but it doesn't work right in multi-core setups. Newer kernel versions know this and disable it, older ones let you pick tsc anyway and bad things occur. -Using any of the other timing mechanism--hpt, acpi_pm, and jiffies--has extremely high timing overhead. This doesn't conflict with the comment we got from Martijn van Oosterhout in the other thread: "Learned the hard way while tracking clock-skew on a multicore system. The hpet may not be the fastest (that would be the cpu timer), but it's the fastest (IME) that gives guaranteed monotonic time." There's more background about TSC, HPET, and an interesting caveat about VMs I came across: http://en.wikipedia.org/wiki/Time_Stamp_Counter http://en.wikipedia.org/wiki/High_Precision_Event_Timer http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1007020 I'm starting to envision what a new documentation section discussing timer overhead would look like now, for Linux at least. And I'm guessing other people can simulate this problem even on hardware that doesn't have it in the default config, simply by manually switching to one of the slower timing methods at boot. The test case is just doing this: \timing select count(*) from customers; explain analyze select count(*) from customers; Lots of times, with some patterns to try and avoid caching effects. The system with the worst clock timing issues I've found, the one that was so obvious it kicked off this investigation, is my Thinkpad T60 laptop. The overhead is worst on the old 32-bit Ubuntu 9.04 installation I have there. Here's some info about the system, starting with the 4 clock sources it supports: $ cat /sys/devices/system/clocksource/clocksource0/available_clocksource hpet acpi_pm jiffies tsc $ cat /sys/devices/system/clocksource/clocksource0/current_clocksource hpet $ uname -a Linux meddle 2.6.28-19-generic #61-Ubuntu SMP Wed May 26 23:35:15 UTC 2010 i686 GNU/Linux $ cat /etc/lsb-release DISTRIB_ID=Ubuntu DISTRIB_RELEASE=9.04 DISTRIB_CODENAME=jaunty DISTRIB_DESCRIPTION="Ubuntu 9.04" $ psql -c "select version()" PostgreSQL 8.3.11 on i486-pc-linux-gnu, compiled by GCC gcc-4.3.real (Ubuntu 4.3.3-5ubuntu4) 4.3.3 You change the time source by adding a stanza like this to the kernel boot options: clocksource=acpi_pm Martijn said you can just cat a new clock source into the setting instead, I haven't evaluated that yet. Wanted to get my baselines done in a situation where the kernel boots using the one I wanted, which turns out to be useful for a number of reasons. Boot time is one of the things that varied a lot when the timer was switched. The hpet time source is the default on my T60 system, but there are 3 others available. Of those, it appears that only acpi_pm is a viable alternative. I gave up on the system booting after 15 minutes when using jiffies. I could boot with tsc, but everything was erratic. I could barely use the mouse for example. tsc did have a much faster clock as measured by EXPLAIN overhead though. Here are the results, two runs of the test program that I've de-interleaved so it's a block of fast (count only) runs then a block of slow (with EXPLAIN ANALYZE): Time Source:hpet Time: 14.783 ms Time: 9.021 ms Time: 9.170 ms Time: 8.634 ms Time: 5.632 ms Time: 5.520 ms Time: 15.276 ms Time: 9.477 ms Time: 9.485 ms Time: 70.720 ms Time: 67.577 ms Time: 57.959 ms Time: 74.661 ms Time: 73.557 ms Time: 74.963 ms Time: 87.898 ms Time: 82.125 ms Time: 81.443 ms Time Source:acpi_pm Time: 8.124 ms Time: 5.020 ms Time: 4.887 ms Time: 8.125 ms Time: 5.029 ms Time: 4.881 ms Time: 82.986 ms Time: 82.366 ms Time: 82.609 ms Time: 83.089 ms Time: 82.438 ms Time: 82.539 ms Time Source:tsc Time: 8.371 ms Time: 4.673 ms Time: 4.901 ms Time: 8.409 ms Time: 4.943 ms Time: 4.722 ms Time: 16.436 ms Time: 16.349 ms Time: 16.139 ms Time: 19.871 ms Time: 17.175 ms Time: 16.540 ms There's the 10:1 possible slowdown I was talking about when using the default hpet clock, while the faster (but fundamentally unstable) tsc clock does better, only around a 4:1 slowdown there. First question I considered: is it the operating system or kernel causing this, rather than the hardware? I have Debian Squeeze running a 64-bit 2.6.32 kernel installed on here too. It only has two time sources available, with hpet being the default: $ cat /sys/devices/system/clocksource/clocksource0/available_clocksource hpet acpi_pm $ uname -a Linux meddle 2.6.32-5-amd64 #1 SMP Thu Nov 3 03:41:26 UTC 2011 x86_64 GNU/Linux $ psql -c "select version()" version ------------------------------------------------------------------------------------------------------ PostgreSQL 8.4.9 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.4.real (Debian 4.4.5-8) 4.4.5, 64-bit It's not coincidence that these are the only two clock sources that actually worked in the older kernel. jiffies are old stuff that's been on the deprecation path for a while, and it's eliminating TSC with this message during boot: [ 0.168139] checking TSC synchronization [CPU#0 -> CPU#1]: [ 0.172001] Measured 617820 cycles TSC warp between CPUs, turning off TSC clock. [ 0.172001] Marking TSC unstable due to check_tsc_sync_source failed So that there can be problems with TSC on some hardware is well known enough that the newer kernel tests, then rules it out as unusable when it fails that test. Here's the same timing results on this kernel: Time Source:hpet Time: 5.701 ms Time: 3.581 ms Time: 3.560 ms Time: 5.696 ms Time: 3.587 ms Time: 3.342 ms Time: 43.492 ms Time: 43.227 ms Time: 43.319 ms Time: 43.503 ms Time: 43.332 ms Time: 43.244 ms Time Source:acpi_pm Time: 6.350 ms Time: 3.792 ms Time: 3.490 ms Time: 5.806 ms Time: 3.672 ms Time: 3.538 ms Time: 80.093 ms Time: 79.875 ms Time: 79.789 ms Time: 80.033 ms Time: 80.330 ms Time: 80.062 ms That's much better in every way on hpet. But the time ratio hasn't really improved; it's still at least 10:1. Unless there's some blacker magic in an even newer kernel, it seems nothing I can do to this laptop will make it run gettimeofday() quickly. Let's move onto something that's actually a usable server. My main desktop has an Intel i7-860. It has the same 64-bit Debian Squeeze setup as the second set of T60 results I just showed. The only thing different about the two systems is the hardware, software is as close to identical as I could make it. (That's for my own general sanity, not something special I did for this test--all my systems have that same Debian package set) But there are different clock options available: $ cat /sys/devices/system/clocksource/clocksource0/available_clocksource tsc hpet acpi_pm $ uname -a Linux grace 2.6.32-5-amd64 #1 SMP Thu Nov 3 03:41:26 UTC 2011 x86_64 GNU/Linux $ psql -c "select version()" PostgreSQL 8.4.9 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.4.real (Debian 4.4.5-8) 4.4.5, 64-bit With tsc being the default clock source here. A passing test of it looks like this: Here's some related dmesg output: [ 0.000000] Fast TSC calibration using PIT ... 0.244640] checking TSC synchronization [CPU#0 -> CPU#1]: passed. ... [ 0.891992] checking TSC synchronization [CPU#0 -> CPU#7]: passed. [ 0.911993] Brought up 8 CPUs And here's how it performs: Time Source:tsc Time: 2.915 ms Time: 1.531 ms Time: 1.500 ms Time: 2.926 ms Time: 1.518 ms Time: 1.478 ms Time: 2.945 ms Time: 2.915 ms Time: 2.882 ms Time: 2.920 ms Time: 2.862 ms Time: 2.856 ms That's more like it. Next up I plan to survey a few AMD systems and a newer laptop, but I'm out of time for this tonight. -- Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
Attachment
pgsql-hackers by date: