Re: [HACKERS] Server crash due to SIGBUS(Bus Error) when trying toaccess the memory created using dsm_create(). - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: [HACKERS] Server crash due to SIGBUS(Bus Error) when trying toaccess the memory created using dsm_create(). |
Date | |
Msg-id | 20170628230458.n5ehizmvhoerr5yq@alap3.anarazel.de Whole thread Raw |
In response to | Re: [HACKERS] Server crash due to SIGBUS(Bus Error) when trying toaccess the memory created using dsm_create(). (Thomas Munro <thomas.munro@enterprisedb.com>) |
Responses |
Re: [HACKERS] Server crash due to SIGBUS(Bus Error) when trying toaccess the memory created using dsm_create().
|
List | pgsql-hackers |
On 2017-06-28 19:07:50 +1200, Thomas Munro wrote: > I think this line is saying that it won't restart automatically: > > https://github.com/torvalds/linux/blob/590dce2d4934fb909b112cd80c80486362337744/mm/shmem.c#L2884 Indeed. > So I think we either need to mask signals with or put in an explicit > retry loop, as shown in the attached version of the patch. With the > v3 patch I posted earlier, I see interrupted system call failures in > the select_parallel regression test, but with the v4 it passes. > Thoughts? I think a retry loop is a lot better than blocking signals. > > Unfounded speculation: fallocate() might actually *improve* > > performance of DSM segments if your access pattern involves random > > access (just to pick an example out of the air, something like... > > building a hash table), since it's surely easier to allocate a big > > contiguous chunk than a squillion random pages most of which divide an > > existing hole into two smaller holes... > > Bleugh... I retract this, of course we initialise the hash table in > order anyway so this doesn't make any sense. It's still faster to create larger mappings at once, rather than through subsequent page faults... > diff --git a/configure.in b/configure.in > index 11eb9c8acfc..47452bbac43 100644 > --- a/configure.in > +++ b/configure.in > @@ -1429,7 +1429,7 @@ PGAC_FUNC_WCSTOMBS_L > LIBS_including_readline="$LIBS" > LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'` > > -AC_CHECK_FUNCS([cbrt clock_gettime dlopen fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll pstat pthread_is_threaded_npreadlink setproctitle setsid shm_open symlink sync_file_range towlower utime utimes wcstombs wcstombs_l]) > +AC_CHECK_FUNCS([cbrt clock_gettime dlopen fallocate fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove pollpstat pthread_is_threaded_np readlink setproctitle setsid shm_open symlink sync_file_range towlower utime utimes wcstombswcstombs_l]) Why are we going for fallocate rather than posix_fallocate()? There's plenty use cases that can only be done with the former, but this doesn't seem like one of them? Currently this is a linux specific path - but I don't actually see any reason to keep it that way? It seems far from unlikely that this is just an issue with linux... > > #ifdef USE_DSM_POSIX > /* > + * Set the size of a virtual memory region associate with a file descriptor. > + * On Linux, also ensure that virtual memory is actually allocated by the > + * operating system to avoid nasty surprises later. > + * > + * Returns non-zero if either truncation or allocation fails, and sets errno. > + */ > +static int > +dsm_impl_posix_resize(int fd, off_t size) > +{ > + int rc; > + > + /* Truncate (or extend) the file to the requested size. */ > + rc = ftruncate(fd, size); > + > +#ifdef HAVE_FALLOCATE > +#ifdef __linux__ > + /* > + * On Linux, a shm_open fd is backed by a tmpfs file. After resizing > + * with ftruncate it may contain a hole. Accessing memory backed by a > + * hole causes tmpfs to allocate pages, which fails with SIGBUS if > + * there is no virtual memory available. So we ask tmpfs to allocate > + * pages here, so we can fail gracefully with ENOSPC now rather than > + * risking SIGBUS later. > + */ > + if (rc == 0) > + { > + do > + { > + rc = fallocate(fd, 0, 0, size); > + } while (rc == -1 && errno == EINTR); > + if (rc != 0 && errno == ENOSYS) > + { > + /* Kernel too old (< 2.6.23). */ > + rc = 0; > + } > + } > +#endif > +#endif I'd rather fall-back to just write() initializing the buffer, and do either of those in all implementations rather than just linux. > diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in > index 7a05c7e5b85..dcb9a846c7b 100644 > --- a/src/include/pg_config.h.in > +++ b/src/include/pg_config.h.in > @@ -167,6 +167,9 @@ > /* Define to 1 if you have the <editline/readline.h> header file. */ > #undef HAVE_EDITLINE_READLINE_H > > +/* Define to 1 if you have the `fallocate' function. */ > +#undef HAVE_FALLOCATE > + > /* Define to 1 if you have the `fdatasync' function. */ > #undef HAVE_FDATASYNC Needs pg_config.h.win32 adjustment... Greetings, Andres Freund
pgsql-hackers by date: