Re: [HACKERS] lseek/read/write overhead becomes visible at scale .. - Mailing list pgsql-hackers
From | Tobias Oberstein |
---|---|
Subject | Re: [HACKERS] lseek/read/write overhead becomes visible at scale .. |
Date | |
Msg-id | 422b4e6c-b7f0-90e0-6f70-389b2d50a848@gmail.com |
In response to | Re: [HACKERS] lseek/read/write overhead becomes visible at scale .. (Andres Freund <andres@anarazel.de>) |
Responses | Re: [HACKERS] lseek/read/write overhead becomes visible at scale .. |
List | pgsql-hackers |
Hi,

On 24.01.2017 at 18:41, Andres Freund wrote:
> Hi,
>
> On 2017-01-24 18:37:14 +0100, Tobias Oberstein wrote:
>>> assume that it'd get more than swamped with doing actual work, and with
>>> buffering the frequently accessed stuff in memory.
>>>
>>>> What I am trying to say is: the syscall overhead of doing lseek/read/write
>>>> instead of pread/pwrite does become visible and hurt at a certain point.
>>>
>>> Sure - but the question is whether it's measurable when you do actual
>>> work.
>>
>> The syscall overhead is visible in production too .. I watched PG using perf
>> live, and lseeks regularly appear at the top of the list.
>
> Could you show such perf profiles? That'll help us.

oberstet@bvr-sql18:~$ psql -U postgres -d adr
psql (9.5.4)
Type "help" for help.

adr=# select * from svc_sqlbalancer.f_perf_syscalls();
NOTICE:  starting Linux perf syscalls sampling - be patient, this can take some time ..
NOTICE:  sudo /usr/bin/perf stat -e "syscalls:sys_enter_*" -x ";" -a sleep 30 2>&1

 pid |              syscall              |   cnt   | cnt_per_sec
-----+-----------------------------------+---------+-------------
     | syscalls:sys_enter_lseek          | 4091584 |      136386
     | syscalls:sys_enter_newfstat       | 2054988 |       68500
     | syscalls:sys_enter_read           |  767990 |       25600
     | syscalls:sys_enter_close          |  503803 |       16793
     | syscalls:sys_enter_newstat        |  434080 |       14469
     | syscalls:sys_enter_open           |  380382 |       12679
     | syscalls:sys_enter_mmap           |  301491 |       10050
     | syscalls:sys_enter_munmap         |  182313 |        6077
     | syscalls:sys_enter_getdents       |  162443 |        5415
     | syscalls:sys_enter_rt_sigaction   |  158947 |        5298
     | syscalls:sys_enter_openat         |   85325 |        2844
     | syscalls:sys_enter_readlink       |   77439 |        2581
     | syscalls:sys_enter_rt_sigprocmask |   60929 |        2031
     | syscalls:sys_enter_mprotect       |   58372 |        1946
     | syscalls:sys_enter_futex          |   49726 |        1658
     | syscalls:sys_enter_access         |   40845 |        1362
     | syscalls:sys_enter_write          |   39513 |        1317
     | syscalls:sys_enter_brk            |   33656 |        1122
     | syscalls:sys_enter_epoll_wait     |   23776 |         793
     | syscalls:sys_enter_ioctl          |   19764 |         659
     | syscalls:sys_enter_wait4          |   17371 |         579
     | syscalls:sys_enter_newlstat       |   13008 |         434
     | syscalls:sys_enter_exit_group     |   10135 |         338
     | syscalls:sys_enter_recvfrom       |    8595 |         286
     | syscalls:sys_enter_sendto         |    8448 |         282
     | syscalls:sys_enter_poll           |    7200 |         240
     | syscalls:sys_enter_lgetxattr      |    6477 |         216
     | syscalls:sys_enter_dup2           |    5790 |         193
<snip>

Note: there isn't a lot of load currently (this is from production).

>>> I'm much less against this change than Tom, but doing artificial syscall
>>> microbenchmarks seems unlikely to make a big case for using it in
>>
>> This isn't a syscall benchmark, but FIO.
>
> There's not really a difference between those, when you use fio to
> benchmark seek vs pseek.

Sorry, I don't understand what you are talking about.

>>> postgres, where it's part of vastly more expensive operations (like
>>> actually reading data afterwards, exclusive locks, ...).
>>
>> PG is very CPU hungry, yes.
>
> Indeed - working on it ;)
>
>> But there are quite some system-related effects
>> too .. e.g. we've managed to get down the system load with huge pages (big
>> improvement).
>
> Glad to hear it.

With 3 TB of RAM, huge pages are absolutely essential (otherwise, the system bogs down in TLB and page-table overhead).

>>> I'd welcome seeing profiles of that - I'm working quite heavily on
>>> speeding up analytics workloads for pg.
>>
>> Here:
>>
>> https://github.com/oberstet/scratchbox/raw/master/cruncher/adr_stats/ADR-PostgreSQL-READ-Statistics.pdf
>>
>> https://github.com/oberstet/scratchbox/tree/master/cruncher/adr_stats
>
> Thanks, unfortunately those appear to mostly have io / cache hit ratio
> related stats?

Yep, this was just to prove that we are really running a DWH workload at scale ;)

Cheers,
/Tobias

> Greetings,
>
> Andres Freund
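To make the lseek/read versus pread point above concrete, here is a minimal C sketch (not PostgreSQL source; the file name is made up for illustration) contrasting the two patterns. The seek-then-read sequence costs two syscalls per random block read, which is where the lseek counts at the top of the perf output come from; the positioned read does the same work in one.

```c
/* Minimal sketch of the two read patterns under discussion.
 * "relation_file" is a hypothetical data file used only for the demo. */
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define BLCKSZ 8192                     /* PostgreSQL's default block size */

int main(void)
{
    char  buf[BLCKSZ];
    off_t offset = 42 * (off_t) BLCKSZ; /* arbitrary block for the demo */
    int   fd = open("relation_file", O_RDONLY);

    if (fd < 0) {
        perror("open");
        return EXIT_FAILURE;
    }

    /* Pattern 1: two syscalls per random read */
    if (lseek(fd, offset, SEEK_SET) < 0 ||
        read(fd, buf, sizeof buf) < 0)
        perror("lseek/read");

    /* Pattern 2: one syscall; the file offset is not modified */
    if (pread(fd, buf, sizeof buf, offset) < 0)
        perror("pread");

    close(fd);
    return EXIT_SUCCESS;
}
```

Besides saving a syscall, pread leaves the descriptor's file offset untouched, so concurrent users of one descriptor need not serialize around lseek.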
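On the huge pages remark: with terabytes of RAM, page tables for a large shared memory segment become enormous at 4 kB granularity. Below is a minimal sketch (assumptions: the 64 MB mapping size is illustrative, and huge pages must already be reserved, e.g. via the vm.nr_hugepages sysctl) of an explicit Linux huge-page mapping using the MAP_HUGETLB flag, which is also what PostgreSQL's huge_pages setting requests for its shared memory on Linux.

```c
/* Minimal sketch of an explicit huge-page mapping on Linux.
 * Fails unless huge pages are reserved (e.g. sysctl vm.nr_hugepages=...). */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = 64 * 1024 * 1024;  /* must be a multiple of the huge page size */

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap(MAP_HUGETLB)");
        return EXIT_FAILURE;
    }
    printf("mapped %zu bytes on huge pages at %p\n", len, p);
    munmap(p, len);
    return EXIT_SUCCESS;
}
```

With 2 MB (or 1 GB) pages instead of 4 kB ones, far fewer page-table entries cover the same segment, which is the TLB relief described above.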