BUG #15636: PostgreSQL 11.1 pg_basebackup backup to a CIFS destination throws fsync error at end of backup - Mailing list pgsql-bugs
From | PG Bug reporting form |
---|---|
Subject | BUG #15636: PostgreSQL 11.1 pg_basebackup backup to a CIFS destination throws fsync error at end of backup |
Date | |
Msg-id | 15636-d380890dafd78fc6@postgresql.org |
Responses |
Re: BUG #15636: PostgreSQL 11.1 pg_basebackup backup to a CIFS destination throws fsync error at end of backup
Re: BUG #15636: PostgreSQL 11.1 pg_basebackup backup to a CIFS destination throws fsync error at end of backup |
List | pgsql-bugs |
The following bug has been logged on the website:

Bug reference: 15636
Logged by: John Klann
Email address: jk7255@gmail.com
PostgreSQL version: 11.1
Operating system: Red Hat Enterprise Linux Server release 7.5

Description:

Issue:
- PostgreSQL 11.1 pg_basebackup and parallel pg_dump backups to a CIFS destination throw an fsync error at the very end of the backup.
- Command: pg_basebackup -D /cifs/backups/<backupDirectoryName> -U backupuser -Ft -Z 1 -X fetch -p 5432
- Error: pg_basebackup: could not fsync file "/cifs/backups/<backupDirectoryName>": Invalid argument

Details:
We are preparing to move from PostgreSQL 9.3.x to 11.1; the move will be a complete rebuild on new hardware. After setting up the new hardware and installing and configuring RHEL 7.5 (updated) and PostgreSQL 11.1, I tested our migration process: a parallel dump followed by a parallel restore of the databases worked without issue. I then started testing backups, and that is when I came across the fsync error.

I have seen references stating that PostgreSQL does not support storing the data directory on CIFS because of similar issues, but I have not found any reference saying that backing up to CIFS is unsupported. I am able to fully restore and recover from these backups without issue. Based on the research I have done, I suspect that CIFS does not support the fsync call on the containing directory. The examples below cover all of the testing I have performed, which also led me to this conclusion.
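The suspicion that the failure comes from fsync on a directory, rather than on the backup files themselves, could be checked outside PostgreSQL with a small probe. The following is only a sketch under stated assumptions: `try_fsync` is a hypothetical helper name, it assumes Linux (where a directory can be opened read-only), and it is not how pg_basebackup itself performs its sync.

```python
import errno
import os


def try_fsync(path):
    """Open `path` read-only, call fsync on the descriptor, and return
    None on success or the symbolic errno name (such as 'EINVAL') on failure.

    Hypothetical helper for illustration; works for files and, on Linux,
    for directories as well.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
        return None
    except OSError as exc:
        # Translate the numeric errno into its name for easier reading.
        return errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        os.close(fd)


if __name__ == "__main__":
    import sys

    # Usage: python probe.py <directory-or-file> [...]
    for target in sys.argv[1:]:
        print(target, "->", try_fsync(target) or "ok")
```

Running the probe against a directory on the CIFS mount and against a regular file inside it would show whether only the directory case fails, which is what the tests in this report suggest.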
Environment:
- Server
  ○ Model: Dell R740
  ○ RAM: 768 GB RDIMM 2666MT/s
  ○ Processor: Intel Xeon Gold 6146 3.2G 24.75MB
    § 2 Nodes, 12 cores each, HT = total cores of 48
  ○ Storage:
    § OS: local ssd in raid
    § pgdata, pgwal, pglog: each on their own dedicated EMC XIO luns attached via 16 8Gbps paths (2x QLogic 2562, Dual Port 8Gb Optical Fibre Channel HBAs)
      □ XIO is XtremIO V1 in two brick clustered configuration
  ○ OS:
    § uname -a:
      □ Linux 3.10.0-862.14.4.el7.x86_64 #1 SMP Fri Sep 21 09:07:21 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
    § cat /etc/redhat-release:
      □ Red Hat Enterprise Linux Server release 7.5 (Maipo)
  ○ PostgreSQL
    § PostgreSQL 11.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28), 64-bit
    § Custom Configs:
      [postgres@servername data1]$ cat postgresql.auto.conf
      # Do not edit this file manually!
      # It will be overwritten by the ALTER SYSTEM command.
      port = '5432'
      listen_addresses = '0.0.0.0'
      work_mem = '8MB'
      maintenance_work_mem = '1GB'
      random_page_cost = '1.0'
      track_functions = 'all'
      wal_buffers = '-1'
      checkpoint_timeout = '10min'
      checkpoint_completion_target = '0.9'
      checkpoint_warning = '30s'
      log_destination = 'csvlog'
      log_directory = '/dbalog/data1'
      log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
      log_min_messages = 'error'
      log_min_error_statement = 'error'
      log_line_prefix = '[%u] [%d] [%h] [%m] [%p]:>'
      log_rotation_size = '10MB'
      log_statement = 'ddl'
      shared_buffers = '8GB'
      max_connections = '500'
      effective_cache_size = '589824 MB'
      wal_level = 'replica'
      max_wal_senders = '2'
      archive_mode = 'on'
      archive_command = 'test ! -f /pgxlog1/data1/%f || cp /pgxlog1/data1/%f /cifs/backups/<backupDirectoryName>/dmp/archive/%f'
      log_disconnections = 'on'
      standard_conforming_strings = 'off'
    § Databases
      □ 4 dbs
        ® 3 - very small < 3 GB total
        ® 1 - 964.67 GB
      □ Load: OLTP, DW mix
- CIFS (Backup destination)
  ○ Windows 2012 R2 (last patched September/2018)
  ○ VNX Block Storage lun configured for CIFS using NTFS

Testing/Reproduction:
- T1:
  ○ basebackup to windows cifs
    § Command:
      □ pg_basebackup -D /cifs/backups/<backupDirectoryName> -U backupuser -Ft -Z 1 -X fetch -p 5432
    § Error: pg_basebackup: could not fsync file "/cifs/backups/<backupDirectoryName>": Invalid argument
- T2:
  ○ Same backup as T1 with much less data - same result
- T3:
  ○ Same as T2 with -N (--no-sync) option - no error (fairly obvious why)
- T4
  ○ basebackup same dataset as T2, no tar, no compression going to windows cifs
    § Command:
      □ pg_basebackup -D /cifs/backups/dbadb1linbos/5432/dmp/basebkp -U backupuser -X fetch -p 5432
    § Same error seems to happen on all directories:
      pg_basebackup: could not fsync file "/cifs/backups/<backupDirectoryName>/basebkp/base/1": Invalid argument
      pg_basebackup: could not fsync file "/cifs/backups/<backupDirectoryName>/basebkp/base/13877": Invalid argument
      pg_basebackup: could not fsync file "/cifs/backups/<backupDirectoryName>/basebkp/base/13878": Invalid argument
      pg_basebackup: could not fsync file "/cifs/backups/<backupDirectoryName>/basebkp/base/16397": Invalid argument
      pg_basebackup: could not fsync file "/cifs/backups/<backupDirectoryName>/basebkp/base/16660": Invalid argument
      pg_basebackup: could not fsync file "/cifs/backups/<backupDirectoryName>/basebkp/base": Invalid argument
      pg_basebackup: could not fsync file "/cifs/backups/<backupDirectoryName>/basebkp/global": Invalid argument
      pg_basebackup: could not fsync file "/cifs/backups/<backupDirectoryName>/basebkp/log": Invalid argument
      pg_basebackup: could not fsync file "/cifs/backups/<backupDirectoryName>/basebkp/pg_commit_ts": Invalid argument
      pg_basebackup: could not fsync file "/cifs/backups/<backupDirectoryName>/basebkp/pg_dynshmem": Invalid argument
      pg_basebackup: could not fsync file "/cifs/backups/<backupDirectoryName>/basebkp/pg_logical/mappings": Invalid argument
      pg_basebackup: could not fsync file "/cifs/backups/<backupDirectoryName>/basebkp/pg_logical/snapshots": Invalid argument
      pg_basebackup: could not fsync file "/cifs/backups/<backupDirectoryName>/basebkp/pg_logical": Invalid argument
      pg_basebackup: could not fsync file "/cifs/backups/<backupDirectoryName>/basebkp/pg_multixact/members": Invalid argument
      pg_basebackup: could not fsync file "/cifs/backups/<backupDirectoryName>/basebkp/pg_multixact/offsets": Invalid argument
      pg_basebackup: could not fsync file "/cifs/backups/<backupDirectoryName>/basebkp/pg_multixact": Invalid argument
      pg_basebackup: could not fsync file "/cifs/backups/<backupDirectoryName>/basebkp/pg_notify": Invalid argument
      pg_basebackup: could not fsync file "/cifs/backups/<backupDirectoryName>/basebkp/pg_replslot": Invalid argument
      pg_basebackup: could not fsync file "/cifs/backups/<backupDirectoryName>/basebkp/pg_serial": Invalid argument
      pg_basebackup: could not fsync file "/cifs/backups/<backupDirectoryName>/basebkp/pg_snapshots": Invalid argument
      pg_basebackup: could not fsync file "/cifs/backups/<backupDirectoryName>/basebkp/pg_stat": Invalid argument
      pg_basebackup: could not fsync file "/cifs/backups/<backupDirectoryName>/basebkp/pg_stat_tmp": Invalid argument
      pg_basebackup: could not fsync file "/cifs/backups/<backupDirectoryName>/basebkp/pg_subtrans": Invalid argument
      pg_basebackup: could not fsync file "/cifs/backups/<backupDirectoryName>/basebkp/pg_tblspc": Invalid argument
      pg_basebackup: could not fsync file "/cifs/backups/<backupDirectoryName>/basebkp/pg_twophase": Invalid argument
      pg_basebackup: could not fsync file "/cifs/backups/<backupDirectoryName>/basebkp/pg_wal/archive_status": Invalid argument
      pg_basebackup: could not fsync file "/cifs/backups/<backupDirectoryName>/basebkp/pg_wal": Invalid argument
      pg_basebackup: could not fsync file "/cifs/backups/<backupDirectoryName>/basebkp/pg_xact": Invalid argument
      pg_basebackup: could not fsync file "/cifs/backups/<backupDirectoryName>/basebkp": Invalid argument
      pg_basebackup: could not fsync file "/cifs/backups/<backupDirectoryName>/basebkp/pg_tblspc": Invalid argument
- T5
  ○ pg_dump single-threaded backup of a database to Windows CIFS - no error
- T6
  ○ pg_dump multithreaded backup of a database to Windows CIFS - same fsync error on the containing directory
    § Command: pg_dump -p 5432 -j 16 -f /cifs/backups/<backupDirectoryName>/dmp/DBA_02132019_133120 -U backupuser -Fd -d DBA
    § Error:
      □ pg_dump: could not fsync file "/cifs/backups/<backupDirectoryName>/dmp/DBA_02132019_133120": Invalid argument
- T7
  ○ Same as T6 but with the --no-sync option - no error
- T8
  ○ Same as T1 but to local storage (XIO SAN-attached LUN, ext4)
    § No error
- T9
  ○ Same as T1 but to a Linux CIFS share backed by an XIO SAN-attached LUN, ext4 (same version of Linux)
    § Same fsync error
- T10
  ○ Same as T1 but to a Linux NFS share backed by an XIO SAN-attached LUN, ext4 (same version of Linux)
    § No error

Questions:
- Is backing up to CIFS supported?
- Research and other reported issues suggest that the error may mean CIFS handles this call differently or already performs this action itself. Should this error call the integrity of the backup into question?
  ○ In what scenarios would it call the integrity into question?
  ○ Issue reference, see the bug #6372 thread: https://www.postgresql.org/message-id/1149.1325535272%40sss.pgh.pa.us
- Is there a workaround or configuration that would let us keep using fsync with our current Windows CIFS configuration?
- Is there a better way to check the integrity of a backup than restoring it and performing a dump backup?
- What are the recommended steps forward?
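One possible direction for the workaround question is to run the backup with --no-sync (as in T3 and T7) and then flush the result in a separate pass that syncs every regular file but tolerates EINVAL or EBADF on directories. The sketch below only illustrates that idea. `sync_tree` and `fsync_path` are hypothetical names, treating a directory-level EINVAL as harmless is an assumption that would need confirmation for the specific CIFS server, and this is not an endorsed PostgreSQL procedure.

```python
import errno
import os

# Assumption: these errno values on a *directory* mean "fsync of a
# directory is not supported here" rather than a real write failure.
TOLERATED_DIR_ERRNOS = {errno.EINVAL, errno.EBADF}


def fsync_path(path):
    """Open `path` read-only and fsync it, raising OSError on failure."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def sync_tree(root):
    """Fsync every regular file under `root`, then each directory.

    File-level errors are always raised. Directory-level errors are
    raised unless their errno is in TOLERATED_DIR_ERRNOS, in which case
    the directory is recorded and skipped.
    Returns (number_of_files_synced, list_of_skipped_directories).
    """
    synced_files = 0
    skipped_dirs = []
    # Bottom-up walk so children are flushed before their parent directory.
    # Symlinked directories are not followed (os.walk default).
    for dirpath, _dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames:
            fsync_path(os.path.join(dirpath, name))
            synced_files += 1
        try:
            fsync_path(dirpath)
        except OSError as exc:
            if exc.errno not in TOLERATED_DIR_ERRNOS:
                raise
            skipped_dirs.append(dirpath)
    return synced_files, skipped_dirs
```

A call such as `sync_tree("/cifs/backups/<backupDirectoryName>")` would return how many files were flushed and which directories refused the call. Any error on a regular file still aborts the pass, so a genuine write failure is not hidden.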