Re: [HACKERS] lseek/read/write overhead becomes visible at scale .. - Mailing list pgsql-hackers
| From | Tobias Oberstein |
|---|---|
| Subject | Re: [HACKERS] lseek/read/write overhead becomes visible at scale .. |
| Date | |
| Msg-id | 422b4e6c-b7f0-90e0-6f70-389b2d50a848@gmail.com |
| In response to | Re: [HACKERS] lseek/read/write overhead becomes visible at scale .. (Andres Freund <andres@anarazel.de>) |
| Responses | Re: [HACKERS] lseek/read/write overhead becomes visible at scale .. |
| List | pgsql-hackers |
Hi,
Am 24.01.2017 um 18:41 schrieb Andres Freund:
> Hi,
>
> On 2017-01-24 18:37:14 +0100, Tobias Oberstein wrote:
>>> assume that it'd get more than swamped with doing actual work, and with
>>> buffering the frequently accessed stuff in memory.
>>>
>>>
>>>> What I am trying to say is: the syscall overhead of doing lseek/read/write
>>>> instead of pread/pwrite does become visible and hurt at a certain point.
>>>
>>> Sure - but the question is whether it's measurable when you do actual
>>> work.
>>
>> The syscall overhead is visible in production too .. I watched PG using perf
>> live, and lseeks regularly appear at the top of the list.
>
> Could you show such perf profiles? That'll help us.
oberstet@bvr-sql18:~$ psql -U postgres -d adr
psql (9.5.4)
Type "help" for help.
adr=# select * from svc_sqlbalancer.f_perf_syscalls();
NOTICE: starting Linux perf syscalls sampling - be patient, this can take some time ..
NOTICE: sudo /usr/bin/perf stat -e "syscalls:sys_enter_*" -x ";" -a sleep 30 2>&1
 pid | syscall                               | cnt     | cnt_per_sec
-----+---------------------------------------+---------+-------------
     | syscalls:sys_enter_lseek              | 4091584 |      136386
     | syscalls:sys_enter_newfstat           | 2054988 |       68500
     | syscalls:sys_enter_read               |  767990 |       25600
     | syscalls:sys_enter_close              |  503803 |       16793
     | syscalls:sys_enter_newstat            |  434080 |       14469
     | syscalls:sys_enter_open               |  380382 |       12679
     | syscalls:sys_enter_mmap               |  301491 |       10050
     | syscalls:sys_enter_munmap             |  182313 |        6077
     | syscalls:sys_enter_getdents           |  162443 |        5415
     | syscalls:sys_enter_rt_sigaction       |  158947 |        5298
     | syscalls:sys_enter_openat             |   85325 |        2844
     | syscalls:sys_enter_readlink           |   77439 |        2581
     | syscalls:sys_enter_rt_sigprocmask     |   60929 |        2031
     | syscalls:sys_enter_mprotect           |   58372 |        1946
     | syscalls:sys_enter_futex              |   49726 |        1658
     | syscalls:sys_enter_access             |   40845 |        1362
     | syscalls:sys_enter_write              |   39513 |        1317
     | syscalls:sys_enter_brk                |   33656 |        1122
     | syscalls:sys_enter_epoll_wait         |   23776 |         793
     | syscalls:sys_enter_ioctl              |   19764 |         659
     | syscalls:sys_enter_wait4              |   17371 |         579
     | syscalls:sys_enter_newlstat           |   13008 |         434
     | syscalls:sys_enter_exit_group         |   10135 |         338
     | syscalls:sys_enter_recvfrom           |    8595 |         286
     | syscalls:sys_enter_sendto             |    8448 |         282
     | syscalls:sys_enter_poll               |    7200 |         240
     | syscalls:sys_enter_lgetxattr          |    6477 |         216
     | syscalls:sys_enter_dup2               |    5790 |         193
<snip>
Note: there isn't a lot of load currently (this is from production).
>>> I'm much less against this change than Tom, but doing artificial syscall
>>> microbenchmarks seems unlikely to make a big case for using it in
>>
>> This isn't a syscall benchmark, but FIO.
>
> There's not really a difference between those, when you use fio to
> benchmark seek vs pseek.
Sorry, I don't understand what you are talking about.
>>> postgres, where it's part of vastly more expensive operations (like
>>> actually reading data afterwards, exclusive locks, ...).
>>
>> PG is very CPU hungry, yes.
>
> Indeed - working on it ;)
>
>
>> But there are quite some system related effects
>> too .. eg we've managed to get down the system load with huge pages (big
>> improvement).
>
> Glad to hear it.
With 3TB RAM, huge pages are absolutely essential (otherwise, the system
bogs down in TLB etc. overhead).
>>> I'd welcome seeing profiles of that - I'm working quite heavily on
>>> speeding up analytics workloads for pg.
>>
>> Here:
>>
>> https://github.com/oberstet/scratchbox/raw/master/cruncher/adr_stats/ADR-PostgreSQL-READ-Statistics.pdf
>>
>> https://github.com/oberstet/scratchbox/tree/master/cruncher/adr_stats
>
> Thanks, unfortunately those appear to mostly have io / cache hit ratio
> related stats?
Yep, this was just to prove that we are really running a DWH workload at
scale ;)
Cheers,
/Tobias
>
> Greetings,
>
> Andres Freund
>