Re: PANIC: could not flush dirty data: Cannot allocate memory - Mailing list pgsql-general
From:           klaus.mailinglists@pernau.at
Subject:        Re: PANIC: could not flush dirty data: Cannot allocate memory
Date:
Msg-id:         4eeb184a1f907c0deab774429602568b@pernau.at
In response to: Re: PANIC: could not flush dirty data: Cannot allocate memory (klaus.mailinglists@pernau.at)
Responses:      Re: PANIC: could not flush dirty data: Cannot allocate memory
List:           pgsql-general
Hello all!

Thanks for the many hints on what to look for. We did some tuning and further
debugging; here are the outcomes, answering all questions in a single email.

> In the meantime, you could experiment with setting
> checkpoint_flush_after to 0

We did this:

# SHOW checkpoint_flush_after;
 checkpoint_flush_after
------------------------
 0
(1 row)

But we STILL have PANICs. I tried to understand the code but failed. I guess
that there are some code paths which call pg_flush_data() without checking
this setting, or the check does not work (see the sync_file_range() sketch at
the end of this post).

> Did this start after upgrading to 22.04? Or after a certain kernel
> upgrade?

It definitely only started with Ubuntu 22.04. We did not have, and still do
not have, any issues on servers with Ubuntu 20.04 and 18.04.

> I would believe that the kernel would raise a bunch of printks if it
> hit ENOMEM in the commonly used paths, so you would see something in
> dmesg or wherever you collect your kernel log if it happened where it
> was expected.

There is nothing in the kernel logs (dmesg).

> Do you use cgroups or such to limit memory usage of postgres?

No.

> Any uncommon options on the filesystem or the mount point?

No. There is also no antivirus:

/dev/xvda2 / ext4 noatime,nodiratime,errors=remount-ro 0 1

or

LABEL=cloudimg-rootfs / ext4 discard,errors=remount-ro 0 1

> does this happen on all the hosts, or is it limited to one host or one
> technology?

It happens on XEN VMs, KVM VMs and VMware VMs, on both Intel and AMD
platforms.

> Another interesting thing would be to know the mount and file system
> options for the FS that triggers the failures.

E.g.:

# tune2fs -l /dev/sda1
tune2fs 1.46.5 (30-Dec-2021)
Filesystem volume name:   cloudimg-rootfs
Last mounted on:          /
Filesystem UUID:          0522e6b3-8d40-4754-a87e-5678a6921e37
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent 64bit flex_bg encrypt sparse_super large_file huge_file dir_nlink extra_isize metadata_csum
Filesystem flags:         signed_directory_hash
Default mount options:    user_xattr acl
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              12902400
Block count:              26185979
Reserved block count:     0
Overhead clusters:        35096
Free blocks:              18451033
Free inodes:              12789946
First block:              0
Block size:               4096
Fragment size:            4096
Group descriptor size:    64
Reserved GDT blocks:      243
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         16128
Inode blocks per group:   1008
Flex block group size:    16
Filesystem created:       Wed Apr 20 18:31:24 2022
Last mount time:          Thu Nov 10 09:49:34 2022
Last write time:          Thu Nov 10 09:49:34 2022
Mount count:              7
Maximum mount count:      -1
Last checked:             Wed Apr 20 18:31:24 2022
Check interval:           0 (<none>)
Lifetime writes:          252 GB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     32
Desired extra isize:      32
Journal inode:            8
First orphan inode:       42571
Default directory hash:   half_md4
Directory Hash Seed:      c5ef129b-fbee-4f35-8f28-ad7cc93c1c43
Journal backup:           inode blocks
Checksum type:            crc32c
Checksum:                 0xb74ebbc3

Thanks
Klaus
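For reference: on Linux, pg_flush_data() is believed to issue its writeback
hints through sync_file_range(2), whose man page lists ENOMEM ("Cannot
allocate memory") among the possible errors. The following standalone C
sketch is not PostgreSQL source; the file name flushtest.dat and the 8 kB
write are made up for illustration. It exercises the same system call and
prints the error in the same "could not flush dirty data" form, so the
failure can be reproduced and traced outside the database:

/* flushtest.c -- minimal sketch: dirty a file, then ask the kernel to
 * start writeback with sync_file_range(2), the call pg_flush_data() is
 * believed to use on Linux for checkpoint_flush_after-style hints. */
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    const char *path = argc > 1 ? argv[1] : "flushtest.dat"; /* hypothetical test file */
    char        buf[8192];
    int         fd;

    fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
    {
        perror("open");
        return 1;
    }

    /* dirty one 8 kB block */
    memset(buf, 'x', sizeof(buf));
    if (write(fd, buf, sizeof(buf)) != (ssize_t) sizeof(buf))
    {
        perror("write");
        return 1;
    }

    /* ask the kernel to start writing back the dirtied range */
    if (sync_file_range(fd, 0, sizeof(buf), SYNC_FILE_RANGE_WRITE) != 0)
    {
        /* errno == ENOMEM prints "Cannot allocate memory",
         * matching the message in the PANIC */
        fprintf(stderr, "could not flush dirty data: %s\n", strerror(errno));
        return 1;
    }

    printf("sync_file_range() succeeded\n");
    close(fd);
    return 0;
}

Compiled with "gcc -o flushtest flushtest.c" and run in a loop (or under
strace -e trace=sync_file_range) on an affected 22.04 host, this should show
whether the bare system call already returns ENOMEM there, or whether the
error only surfaces through PostgreSQL.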