Re: [HACKERS] lseek/read/write overhead becomes visible at scale .. - Mailing list pgsql-hackers
| From | Tobias Oberstein |
|---|---|
| Subject | Re: [HACKERS] lseek/read/write overhead becomes visible at scale .. |
| Date | |
| Msg-id | 422b4e6c-b7f0-90e0-6f70-389b2d50a848@gmail.com |
| In response to | Re: [HACKERS] lseek/read/write overhead becomes visible at scale .. (Andres Freund <andres@anarazel.de>) |
| Responses | Re: [HACKERS] lseek/read/write overhead becomes visible at scale .. |
| List | pgsql-hackers |
Hi,
Am 24.01.2017 um 18:41 schrieb Andres Freund:
> Hi,
>
> On 2017-01-24 18:37:14 +0100, Tobias Oberstein wrote:
>>> assume that it'd get more than swamped with doing actual work, and with
>>> buffering the frequently accessed stuff in memory.
>>>
>>>
>>>> What I am trying to say is: the syscall overhead of doing lseek/read/write
>>>> instead of pread/pwrite does become visible and hurt at a certain point.
>>>
>>> Sure - but the question is whether it's measurable when you do actual
>>> work.
>>
>> The syscall overhead is visible in production too .. I watched PG using perf
>> live, and lseeks regularly appear at the top of the list.
>
> Could you show such perf profiles? That'll help us.
oberstet@bvr-sql18:~$ psql -U postgres -d adr
psql (9.5.4)
Type "help" for help.
adr=# select * from svc_sqlbalancer.f_perf_syscalls();
NOTICE: starting Linux perf syscalls sampling - be patient, this can take some time ..
NOTICE: sudo /usr/bin/perf stat -e "syscalls:sys_enter_*" -x ";" -a sleep 30 2>&1
 pid | syscall                               | cnt     | cnt_per_sec
-----+---------------------------------------+---------+-------------
     | syscalls:sys_enter_lseek              | 4091584 |      136386
     | syscalls:sys_enter_newfstat           | 2054988 |       68500
     | syscalls:sys_enter_read               |  767990 |       25600
     | syscalls:sys_enter_close              |  503803 |       16793
     | syscalls:sys_enter_newstat            |  434080 |       14469
     | syscalls:sys_enter_open               |  380382 |       12679
     | syscalls:sys_enter_mmap               |  301491 |       10050
     | syscalls:sys_enter_munmap             |  182313 |        6077
     | syscalls:sys_enter_getdents           |  162443 |        5415
     | syscalls:sys_enter_rt_sigaction       |  158947 |        5298
     | syscalls:sys_enter_openat             |   85325 |        2844
     | syscalls:sys_enter_readlink           |   77439 |        2581
     | syscalls:sys_enter_rt_sigprocmask     |   60929 |        2031
     | syscalls:sys_enter_mprotect           |   58372 |        1946
     | syscalls:sys_enter_futex              |   49726 |        1658
     | syscalls:sys_enter_access             |   40845 |        1362
     | syscalls:sys_enter_write              |   39513 |        1317
     | syscalls:sys_enter_brk                |   33656 |        1122
     | syscalls:sys_enter_epoll_wait         |   23776 |         793
     | syscalls:sys_enter_ioctl              |   19764 |         659
     | syscalls:sys_enter_wait4              |   17371 |         579
     | syscalls:sys_enter_newlstat           |   13008 |         434
     | syscalls:sys_enter_exit_group         |   10135 |         338
     | syscalls:sys_enter_recvfrom           |    8595 |         286
     | syscalls:sys_enter_sendto             |    8448 |         282
     | syscalls:sys_enter_poll               |    7200 |         240
     | syscalls:sys_enter_lgetxattr          |    6477 |         216
     | syscalls:sys_enter_dup2               |    5790 |         193
<snip>
Note: there isn't a lot of load currently (this is from production).
>>> I'm much less against this change than Tom, but doing artificial syscall
>>> microbenchmarks seems unlikely to make a big case for using it in
>>
>> This isn't a syscall benchmark, but FIO.
>
> There's not really a difference between those, when you use fio to
> benchmark seek vs pseek.
Sorry, I don't understand what you are talking about.
>>> postgres, where it's part of vastly more expensive operations (like
>>> actually reading data afterwards, exclusive locks, ...).
>>
>> PG is very CPU hungry, yes.
>
> Indeed - working on it ;)
>
>
>> But there are quite some system related effects
>> too .. eg we've managed to get down the system load with huge pages (big
>> improvement).
>
> Glad to hear it.
With 3TB RAM, huge pages are absolutely essential (otherwise, the system
bogs down in TLB etc. overhead).
>>> I'd welcome seeing profiles of that - I'm working quite heavily on
>>> speeding up analytics workloads for pg.
>>
>> Here:
>>
>> https://github.com/oberstet/scratchbox/raw/master/cruncher/adr_stats/ADR-PostgreSQL-READ-Statistics.pdf
>>
>> https://github.com/oberstet/scratchbox/tree/master/cruncher/adr_stats
>
> Thanks, unfortunately those appear to mostly have io / cache hit ratio
> related stats?
Yep, this was just to prove that we are really running a DWH workload at
scale ;)
Cheers,
/Tobias
>
> Greetings,
>
> Andres Freund
>