Re: Performances issues with SSD volume ? - Mailing list pgsql-admin

From:           Thomas SIMON
Subject:        Re: Performances issues with SSD volume ?
Date:
Msg-id:         555CAB39.1090403@neteven.com
In response to: Re: Performances issues with SSD volume ? (Glyn Astill <glynastill@yahoo.co.uk>)
Responses:      Re: Performances issues with SSD volume ?
List:           pgsql-admin
Hi Glyn, I'll try to answer these points.

I've run some benchmarks, and indeed kernel 3.2 is not helping. Not helping at all. I moved to 3.14 and the gap is quite big! With the pgbench RW test: 3.2 -> 4200 TPS; 3.14 -> 6900 TPS under the same conditions. With the pgbench RO test: 3.2 -> 37000 TPS; 3.14 -> 95000 TPS, same conditions too.

So things should be better now, but when the server was in production, even with the bad kernel, performance was quite good at first before it quickly degraded. So I too think I have another configuration problem.

> You say you're IO bound, so some output from sar / iostat / dstat and pg_stat_activity etc before and during the issue would be of use.

My server is not in production right now, so it is difficult to replay the production load and collect useful metrics. The best way I've found is to replay traffic from the logs with pgreplay. I hoped the server would break down again while replaying this traffic, but it never happened... Another thing I can't understand... Below is my dstat output from replaying this traffic (i.e. while the server runs normally). Unfortunately I no longer have any output from when the server's performance degraded.
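Next time the server degrades, a small collector loop could record the before/during picture asked for. A minimal sketch, assuming PostgreSQL 9.3 (which still has the `waiting` column in pg_stat_activity); the interval, output path and connection details are illustrative placeholders, not values from this setup:

```shell
#!/bin/sh
# Snapshot disk and session activity once a minute (60 iterations here,
# i.e. roughly one hour) so usable metrics exist the next time
# performance drops. OUT path, interval and -d postgres are assumptions.
OUT=/var/tmp/perf-capture
mkdir -p "$OUT"
i=0
while [ "$i" -lt 60 ]; do
    ts=$(date +%Y%m%d-%H%M%S)
    # 5-second extended device statistics (2 samples; the first is
    # the since-boot average, the second the live one)
    iostat -dxm 5 2 > "$OUT/iostat-$ts" 2>&1
    # Session states; 'waiting' is the 9.3-era lock-wait flag
    psql -X -d postgres -c \
      "SELECT now(), state, waiting, count(*)
         FROM pg_stat_activity GROUP BY 1, 2, 3;" \
      > "$OUT/activity-$ts" 2>&1
    sleep 55
    i=$((i + 1))
done
```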
[17:31:12]root@db10:~$ dstat -alrmy
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system-- ---load-avg--- --io/total- ------memory-usage----- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw |  1m   5m  15m | read  writ| used  buff  cach  free| int   csw
  1   1  98   0   0   0|4870k   16M|   0     0 |   0     0 |4666    35k|4.68 1.48 0.56| 419   774 |6204M 47.1M 70.9G  175G|4666    35k
 10   4  81   5   0   0| 177M  368k| 510B 9674B|   0     0 |  32k   81k|5.11 1.62 0.61|14.7k 54.0 |6232M 47.1M 71.1G  175G|  32k   81k
  3   2  92   3   0   0|  81M 7568k| 224B  738B|   0     0 |  12k   28k|5.11 1.62 0.61|8687   822 |6233M 47.1M 71.1G  175G|  12k   28k
  4   2  92   2   0   0|  87M 7844k|1046B   10k|   0     0 |  12k   25k|5.11 1.62 0.61|8031   837 |6235M 47.1M 71.2G  175G|  12k   25k
 14   5  78   3   0   0| 155M 2972k| 556B 1084B|   0     0 |  28k   69k|5.11 1.62 0.61|12.7k  176 |6645M 47.1M 71.4G  174G|  28k   70k
 19   4  73   4   0   0| 172M 3012k|1082B 9182B|   0     0 |  33k   73k|5.11 1.62 0.61|12.5k  248 |6727M 47.1M 71.5G  174G|  33k   72k
 23   5  66   6   0   0| 247M   20M| 252B  800B|   0     0 |  42k   88k|5.42 1.75 0.65|20.1k  426 |6827M 47.1M 71.8G  173G|  42k   88k
 16   3  78   4   0   0| 119M 1384k| 744B 7680B|   0     0 |  28k   47k|5.42 1.75 0.65|11.3k  113 |6883M 47.1M 71.9G  173G|  28k   47k
  5   1  92   2   0   0|  55M 5952k|6960B 8210B|   0     0 |  16k   25k|5.42 1.75 0.65|6304   676 |6890M 47.1M 72.0G  173G|  16k   25k
  5   2  91   2   0   1|  56M  568k|9593B   13k|   0     0 |  16k   26k|5.42 1.75 0.65|6383  10.0 |6892M 47.1M 72.0G  173G|  16k   26k
  5   1  92   2   0   0|  51M    0 | 546B  888B|   0     0 |  14k   24k|5.42 1.75 0.65|6091     0 |6891M 47.1M 72.1G  173G|  14k   24k
 21   5  71   4   0   0| 110M 1416k| 384B 6874B|   0     0 |  31k   58k|6.18 1.97 0.73|11.1k  203 |7023M 47.1M 72.2G  173G|  31k   58k
 19   4  74   2   0   0| 110M 2364k| 312B  846B|   0     0 |  23k   44k|6.18 1.97 0.73|9399   221 |7055M 47.2M 72.3G  173G|  22k   44k
 26   6  65   3   0   0| 110M 6040k|1770B 7068B|   0     0 |  31k   59k|6.18 1.97 0.73|10.5k  781 |7255M 47.2M 72.4G  172G|  31k   59k
 23   5  69   3   0   0| 128M 2912k|1146B 1350B|   0     0 |  27k   52k|6.18 1.97 0.73|11.2k  240 |7314M 47.2M 72.5G  172G|  27k   52k

Other things you asked:

System memory size: 256 GB
SSD model numbers and how many: 4 SSD disks; RAID 10; model INTEL SSDSC2BB480G4
Raid controller: MegaRAID SAS 2208
Partition alignments and stripe sizes: see fdisk below
Kernel options: the config file is here: ftp://ftp.ovh.net/made-in-ovh/bzImage/3.14.43/config-3.14.43-xxxx-std-ipv6-64
Filesystem used and mount options: ext4, see mtab below
IO scheduler: noop [deadline] cfq for my SSD RAID volume
PostgreSQL version and configuration: 9.3.5

max_connections = 1800
shared_buffers = 8GB
temp_buffers = 32MB
work_mem = 100MB
maintenance_work_mem = 12GB
bgwriter_lru_maxpages = 200
effective_io_concurrency = 4
wal_level = hot_standby
wal_sync_method = fdatasync
wal_writer_delay = 2000ms
commit_delay = 1000
checkpoint_segments = 80
checkpoint_timeout = 15min
checkpoint_completion_target = 0.7
archive_command = 'rsync ....'
max_wal_senders = 10
wal_keep_segments = 38600
vacuum_defer_cleanup_age = 100
hot_standby = on
max_standby_archive_delay = 5min
max_standby_streaming_delay = 5min
hot_standby_feedback = on
random_page_cost = 1.0
effective_cache_size = 240GB
log_min_error_statement = warning
log_min_duration_statement = 0
log_checkpoints = on
log_connections = on
log_disconnections = on
log_line_prefix = '%m|%u|%d|%c|'
log_lock_waits = on
log_statement = 'all'
log_timezone = 'localtime'
track_activities = on
track_functions = pl
track_activity_query_size = 8192
autovacuum_max_workers = 5
autovacuum_naptime = 30s
autovacuum_vacuum_threshold = 40
autovacuum_analyze_threshold = 20
autovacuum_vacuum_scale_factor = 0.10
autovacuum_analyze_scale_factor = 0.10
autovacuum_vacuum_cost_delay = 5ms
default_transaction_isolation = 'read committed'
max_locks_per_transaction = 128

Connection pool sizing (pgpool2):
num_init_children = 1790
max_pool = 1

I also add the megacli parameters:

Virtual Drive: 2 (Target Id: 2)
Name                : datassd
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Size                : 893.25 GB
Sector Size         : 512
Is VD emulated      : Yes
Mirror Data         : 893.25 GB
State               : Optimal
Strip Size          : 256 KB
Number Of Drives per span : 2
Span Depth          : 2
Default Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Enabled
Encryption Type     : None
Bad Blocks Exist    : No
PI type             : No PI
Is VD Cached        : No

Other outputs:

fdisk -l

Disk /dev/sdc: 959.1 GB, 959119884288 bytes
255 heads, 63 sectors/track, 116606 cylinders, total 1873281024 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x00000000

Disk /dev/mapper/vg_datassd-lv_datassd: 751.6 GB, 751619276800 bytes
255 heads, 63 sectors/track, 91379 cylinders, total 1468006400 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x00000000

cat /etc/mtab

/dev/mapper/vg_datassd-lv_datassd /datassd ext4 rw,relatime,discard,nobarrier,data=ordered 0 0
(I added the nobarrier option)

cat /sys/block/sdc/queue/scheduler

noop [deadline] cfq

sysctl kernel | grep sched

kernel.sched_child_runs_first = 0
kernel.sched_rr_timeslice_ms = 25
kernel.sched_rt_period_us = 1000000
kernel.sched_rt_runtime_us = 950000

I've read some advice about setting kernel.sched_autogroup_enabled=0 and kernel.sched_migration_cost_ns=5000000, but these parameters are not recognized by my kernel. So I don't know what to do with that...
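On that last point: whether those two knobs exist depends on the kernel build, not the kernel version. To my knowledge, kernel.sched_autogroup_enabled only appears when the kernel was compiled with CONFIG_SCHED_AUTOGROUP, and kernel.sched_migration_cost_ns only with CONFIG_SCHED_DEBUG (older kernels spelled it kernel.sched_migration_cost, without the _ns suffix). A quick, non-destructive way to see what a given kernel actually exposes:

```shell
# List every sched_* sysctl this kernel build exposes. A missing entry
# means the matching CONFIG_* option was not compiled in, not a typo.
ls /proc/sys/kernel | grep '^sched' || echo "no sched_* sysctls exposed"
```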
sysctl vm

vm.admin_reserve_kbytes = 8192
vm.block_dump = 0
vm.dirty_background_bytes = 8388608
vm.dirty_background_ratio = 0
vm.dirty_bytes = 67108864
vm.dirty_expire_centisecs = 3000
vm.dirty_ratio = 0
vm.dirty_writeback_centisecs = 500
vm.drop_caches = 3
vm.laptop_mode = 0
vm.legacy_va_layout = 0
vm.lowmem_reserve_ratio = 256 256 32
vm.max_map_count = 65530
vm.memory_failure_early_kill = 0
vm.memory_failure_recovery = 1
vm.min_free_kbytes = 65008
vm.min_slab_ratio = 5
vm.min_unmapped_ratio = 1
vm.mmap_min_addr = 4096
vm.nr_pdflush_threads = 0
vm.numa_zonelist_order = default
vm.oom_dump_tasks = 1
vm.oom_kill_allocating_task = 0
vm.overcommit_kbytes = 0
vm.overcommit_memory = 2
vm.overcommit_ratio = 50
vm.page-cluster = 3
vm.panic_on_oom = 0
vm.percpu_pagelist_fraction = 0
vm.scan_unevictable_pages = 0
vm.stat_interval = 1
vm.swappiness = 60
vm.user_reserve_kbytes = 131072
vm.vfs_cache_pressure = 100
vm.zone_reclaim_mode = 0

select * from pg_stat_activity

I get hundreds of entries for that when in production, and I can't paste them here for confidentiality reasons. It is usually around 50 million queries per day (35% selects; 55% updates; 5% inserts).

lspci | grep -E 'RAID|SCSI|IDE|SATA'

00:1f.2 SATA controller: Intel Corporation C600/X79 series chipset 6-Port SATA AHCI Controller (rev 06)
02:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 2208 [Thunderbolt] (rev 05)
07:00.0 Serial Attached SCSI controller: Intel Corporation C602 chipset 4-Port SATA Storage Control Unit (rev 06)

Thanks
Thomas

On 18/05/2015 16:29, Glyn Astill wrote:
>> From: Koray Eyidoğan <korayey@gmail.com>
>> To: Thomas SIMON <tsimon@neteven.com>
>> Cc: pgsql-admin@postgresql.org
>> Sent: Monday, 18 May 2015, 14:51
>> Subject: Re: [ADMIN] Performances issues with SSD volume ?
>>
>> Hi Thomas,
>>
>> 3.2 kernel may be the #1 cause of your I/O load problem:
>>
>> http://www.databasesoup.com/2014/09/why-you-need-to-avoid-linux-kernel-32.html
>> https://medium.com/postgresql-talk/benchmarking-postgresql-with-different-linux-kernel-versions-on-ubuntu-lts-e61d57b70dd4
>>
>> Have a nice day.
>>
>> Koray
>
> Likely the 3.2 kernel isn't helping, but I think we need much more information before jumping to conclusions.
>
> You say you're IO bound, so some output from sar / iostat / dstat and pg_stat_activity etc before and during the issue would be of use.
>
> Also:
> System memory size
> SSD model numbers and how many
> Raid controller
> Partition alignments and stripe sizes
> Kernel options
> Filesystem used and mount options
> IO Scheduler
> Postgresql version and configuration
> Connection pool sizing
>
> Perhaps you could throw us the output of some of these:
>
> fdisk -l
> cat /etc/mtab
> cat /sys/block/<ssd device>/queue/scheduler
> sysctl kernel | grep sched
> sysctl vm
> select * from pg_stat_activity
> select name, setting from pg_settings
> lspci | grep -E 'RAID|SCSI|IDE|SATA'
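For completeness, the pg_settings dump requested in that list can be captured non-interactively; a minimal sketch (the database name and connection details are placeholders, and filtering on the source column is an optional refinement that keeps only non-default values):

```shell
# Non-interactive dump of pg_settings, limited to values that differ
# from the built-in defaults. "-d postgres" is a placeholder; -X skips
# psqlrc, -A/-F produce unaligned "name = setting" output.
psql -X -A -F ' = ' -d postgres -c \
  "SELECT name, setting
     FROM pg_settings
    WHERE source <> 'default'
    ORDER BY name;"
```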