Re: Server crash due to SIGBUS(Bus Error) when trying to access the memory created using dsm_create(). - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Server crash due to SIGBUS(Bus Error) when trying to access the memory created using dsm_create(). |
Date | |
Msg-id | CA+TgmoY=jMECqMz=RKG3=x=kvMZrf008Ggc4mYNyajqbVkMW4w@mail.gmail.com |
In response to | Re: Server crash due to SIGBUS(Bus Error) when trying to access the memory created using dsm_create(). (Thomas Munro <thomas.munro@enterprisedb.com>) |
Responses | Re: [HACKERS] Server crash due to SIGBUS(Bus Error) when trying to access the memory created using dsm_create(). |
List | pgsql-hackers |
On Mon, Aug 22, 2016 at 8:18 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> On Tue, Aug 23, 2016 at 8:41 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> We could test to see how much it slows things down. But it
>> may be worth paying the cost even if it ends up being kinda expensive.
>
> Here are some numbers from a Xeon E7-8830 @ 2.13GHz running Linux 3.10
> running the attached program. It's fairly noisy and I didn't run
> super long tests with many repeats, but the general theme is visible.
> If you're actually going to USE the memory, it's only a small extra
> cost to have reserved seats. But if there's a strong chance you'll
> never access most of the memory, you might call it expensive.
>
> Segment size 1MB:
>
> base = shm_open + ftruncate + mmap + munmap + close = 5us
> base + fallocate = 38us
> base + memset = 332us
> base + fallocate + memset = 346us
>
> Segment size 1GB:
>
> base = shm_open + ftruncate + mmap + munmap + close = 10032us
> base + fallocate = 30774us
> base + memset = 602925us
> base + fallocate + memset = 655433us

Typical DSM segments for parallel query seem to be much smaller than 1MB. I just added an elog(NOTICE, ...) to dsm_create to print the size and ran the regression tests. I got these results:

+ NOTICE:  dsm_create: 89352
+ NOTICE:  dsm_create: 332664
+ NOTICE:  dsm_create: 86664

So for parallel query we're looking at a hit that is probably in the range of one-tenth of one millisecond or less, which seems like it's not really a big deal considering that the typical startup time is 4ms and, really, at this point, we're aiming to use this primarily for queries with runtimes in the hundreds of milliseconds and more. Also, the code can be arbitrarily fast if it doesn't have to be safe.

Now, for bigger segment sizes, I think there actually could be a little bit of a noticeable performance hit here, because it's not just about total elapsed time.
Even if the code eventually touches all of the memory, it might not touch it all before starting to fire up workers or whatever else it wants to do with the DSM segment. But I'm thinking we still need to bite the bullet and pay the expense, because crash-and-restart cycles are *really* bad. Assuming the DSA code you submitted gets committed, that's really where the hit will be here: you'll be merrily allocating chunks of dynamic shared memory until your existing DSM segment fills up, and then, kaboom, you'll go into the tank for half a second when you try to do the next allocation, supposing the next segment is 1GB in size. That's not much fun, especially considering that . But again, unless we have a faster way to force the system to allocate the pages, I think we're just going to have to live with that. :-(

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
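A minimal sketch of the "reserved seats" idea discussed in this thread, assuming Linux POSIX shared memory. There, a segment lives on tmpfs (/dev/shm), and ftruncate() alone leaves a sparse file, so a full tmpfs is only discovered when a page is first touched, as SIGBUS. Reserving the pages at creation time turns that into an ordinary error return. We assume posix_fallocate() as the reservation call standing in for "fallocate" in Thomas's numbers. The helper name create_reserved_segment is hypothetical; this is not PostgreSQL's dsm_impl code.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <fcntl.h>      /* O_* flags, posix_fallocate */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>   /* shm_open, shm_unlink, mmap */
#include <sys/stat.h>   /* S_IRUSR, S_IWUSR */
#include <sys/types.h>
#include <unistd.h>     /* ftruncate, close */

/*
 * Hypothetical helper: create and map a POSIX shared memory segment of
 * `size` bytes, forcing the backing pages to be allocated up front so that
 * running out of space fails here instead of raising SIGBUS on first touch.
 * Returns the mapping, or NULL after reporting the error and unlinking.
 */
static void *
create_reserved_segment(const char *name, size_t size)
{
    int         fd;
    int         rc;
    void       *addr;

    fd = shm_open(name, O_RDWR | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR);
    if (fd < 0)
    {
        perror("shm_open");
        return NULL;
    }

    /* The "base" step: set the size (this alone allocates nothing). */
    if (ftruncate(fd, (off_t) size) != 0)
    {
        perror("ftruncate");
        goto fail;
    }

    /*
     * Reserve every page now.  posix_fallocate() returns an error number
     * rather than setting errno; retry if a signal interrupts it.  ENOSPC
     * here is the case that would otherwise surface later as SIGBUS.
     */
    do
        rc = posix_fallocate(fd, 0, (off_t) size);
    while (rc == EINTR);
    if (rc != 0)
    {
        fprintf(stderr, "posix_fallocate: %s\n", strerror(rc));
        goto fail;
    }

    addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED)
    {
        perror("mmap");
        goto fail;
    }

    close(fd);              /* the mapping remains valid after close */
    return addr;

fail:
    close(fd);
    shm_unlink(name);
    return NULL;
}
```

The caller writes into the mapping and later calls munmap() and shm_unlink(). The trade-off matches the quoted measurements. Reservation added about 33us to the 5us base for 1MB and about 21ms to the roughly 10ms base for 1GB. That is far less than touching every page with memset, but it is paid even for pages that are never used.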