Thread: BUG #16041: Error shows up both in pgAdmin and in Ruby (pg gem) - Segmentation fault
BUG #16041: Error shows up both in pgAdmin and in Ruby (pg gem) - Segmentation fault
From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      16041
Logged by:          Mark Siemers
Email address:      mark.siemers@gmail.com
PostgreSQL version: 12.0
Operating system:   Mac OS X Mojave 10.14.6
Description:

For further details (including crash report) see bugs filed with
third-parties:
Ruby - https://bugs.ruby-lang.org/issues/16239
pgAdmin 4 - https://redmine.postgresql.org/issues/4813

The speculation from a ruby maintainer is there is an issue with GSS
authentication on OS X.

Snippet of stack trace below:
7   ???                0x0000000200000000 0 + 8589934592
8   com.apple.security 0x00007fff3f57c059 invocation function for block in Security::KeychainCore::StorageManager::tickleKeychain(Security::KeychainCore::KeychainImpl*) + 287
9   libdispatch.dylib  0x00007fff5fd6d63d _dispatch_client_callout + 8
10  libdispatch.dylib  0x00007fff5fd79129 _dispatch_lane_barrier_sync_invoke_and_complete + 60
11  com.apple.security 0x00007fff3f57be47 Security::KeychainCore::StorageManager::tickleKeychain(Security::KeychainCore::KeychainImpl*) + 441
12  com.apple.security 0x00007fff3f37cae2 Security::KeychainCore::KCCursorImpl::next(Security::KeychainCore::Item&) + 230
13  com.apple.security 0x00007fff3f523c98 Security::KeychainCore::IdentityCursor::next(Security::SecPointer<Security::KeychainCore::Identity>&) + 192
14  com.apple.security 0x00007fff3f545f2f SecIdentitySearchCopyNext + 145
15  com.apple.security 0x00007fff3f550956 SecItemCopyMatching_osx(__CFDictionary const*, void const**) + 238
16  com.apple.security 0x00007fff3f553fc5 SecItemCopyMatching + 316
17  com.apple.Heimdal  0x00007fff4feae830 0x7fff4fe5c000 + 337968
18  com.apple.Heimdal  0x00007fff4fead35e hx509_certs_find + 67
19  com.apple.Heimdal  0x00007fff4fe88a6c _krb5_pk_find_cert + 246
20  com.apple.GSS      0x00007fff364dbd8e _gsspku2u_acquire_cred + 386
21  com.apple.GSS      0x00007fff364cb0d8 gss_acquire_cred + 523
22  libpq.5.dylib      0x0000000112b4b77d pg_GSS_have_cred_cache + 54
23  libpq.5.dylib      0x0000000112b39edf PQconnectPoll + 6377
24  libpq.5.dylib      0x0000000112b36f8b connectDBComplete + 232
25  libpq.5.dylib      0x0000000112b37112 PQconnectdb + 36
26  pg_ext.bundle      0x000000011157ab01 gvl_PQconnectdb_skeleton + 17
27  ruby               0x000000010f1dfff9 call_without_gvl + 185
28  pg_ext.bundle      0x000000011157aadd gvl_PQconnectdb + 45
29  pg_ext.bundle      0x000000011157fcb9 pgconn_init + 121
30  ruby               0x000000010f221b1c vm_call0_body + 604
Re: BUG #16041: Error shows up both in pgAdmin and in Ruby (pg gem) - Segmentation fault
From
Fahar Abbas
Date:
Hi,
The issue is not reproducible on macOS 10.12 with the same PostgreSQL 12 server.
--
Fahar Abbas

QMG
EnterpriseDB Corporation
Phone Office: +92-51-835-8874
Phone Direct: +92-51-8466803
Mobile: +92-333-5409707
Skype ID: live:fahar.abbas
Website: www.enterprisedb.com
Attachment
Re: BUG #16041: Error shows up both in pgAdmin and in Ruby (pg gem) - Segmentation fault
From
Chris Bandy
Date:
Hello,

I am able to reproduce this on macOS 10.14 (Mojave) in multiple versions
of Ruby and in a minimal C program.

Steps to reproduce:

1. Install libpq for PostgreSQL 12:

     brew install postgresql@12

2. Install the pg gem:

     gem install pg

3. Start a PostgreSQL server:

     docker run --rm -d -p 127.0.0.1:5432:5432 postgres:12

4. Execute some GSS path before and after fork:

     ruby -r pg -e '
       PG.connect(host: "localhost")
       Process.fork { PG.connect(host: "localhost") }
       Process.wait
     '

Notice that host must be a TCP address (not Unix) and gssencmode must be
"prefer" (default is "prefer".) The version of the server doesn't appear to
matter; I tested 10, 11, and 12.

This can also happen in `rails console` if an application initializer
interacts with ActiveRecord or a descendant (i.e. opens a database
connection.) Any further interaction with ActiveRecord on the console
segfaults.

This has been reported in a variety of Ruby projects and often dismissed as
"a PostgreSQL issue."

I found a similar trace in a Python package that interacts with the macOS
keychain.[1] There they narrowed it to a single call, raised the issue
upstream, and were told, in short, "you can't use keychain after fork."

Based on that report, I crafted a minimal C program to make the same GSS
call as libpq. I compiled (with deprecation warnings) and tested with the
following:

  gcc macos-gss-crash.c -o macos-gss-crash -lgssapi_krb5
  ./macos-gss-crash

It prints:

  before gss_acquire_cred in main
  after gss_acquire_cred in main
  gss complete: true
  before gss_acquire_cred in child
  child signalled: 11

I've attached the C program and crash reports for it and the above Ruby
snippet.

Thanks!
Chris

[1]: https://github.com/jaraco/keyring/issues/281
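
The attached C program is not reproduced in this archive. As a rough illustration of the fork-and-wait pattern it is described as using, here is a minimal Python sketch. It does not call GSSAPI at all: `run_in_child` and `crash` are hypothetical names, and `crash` merely stands in for a call that segfaults in the child (as gss_acquire_cred did on macOS in the report above).

```python
import os
import signal


def run_in_child(probe):
    """Fork, run probe() in the child, and describe how the child ended."""
    pid = os.fork()
    if pid == 0:
        # Child: run the probe, then leave immediately so that none of
        # the parent's cleanup handlers run a second time.
        try:
            probe()
        except BaseException:
            os._exit(1)
        os._exit(0)
    # Parent: wait for the child and decode its wait status.
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return "child signalled: %d" % os.WTERMSIG(status)
    return "child exit code: %d" % os.WEXITSTATUS(status)


def crash():
    # Hypothetical stand-in for the crashing library call: restore the
    # default SIGSEGV action and send that signal to this process.
    signal.signal(signal.SIGSEGV, signal.SIG_DFL)
    os.kill(os.getpid(), signal.SIGSEGV)
```

Passing a callable that opens a database connection instead of `crash` would presumably show whether the child survives, but that depends on the driver and libraries installed and is not covered by this sketch.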
Attachment
Re: BUG #16041: Error shows up both in pgAdmin and in Ruby (pg gem) - Segmentation fault
From
Chris Bandy
Date:
On 12/3/19 3:33 PM, Chris Bandy wrote:
> Hello,
>
> I am able to reproduce this on macOS 10.14 (Mojave) in multiple versions
> of Ruby and in a minimal C program.
>

I was also able to reproduce this with the attached Python program and
psycopg2 package.

Steps to reproduce:

1. Install libpq for PostgreSQL 12:

     brew install postgresql@12

2. Install the psycopg2 package:

     pip install psycopg2

3. Start a PostgreSQL server:

     docker run --rm -d -p 127.0.0.1:5432:5432 postgres:12

4. Execute some GSS path before and after fork:

     python macos-gss-crash.py

It generates a crash report and prints:

  main ok -11

In this and the previous tests I can avoid or work around the segfault by
specifying gssencmode=disable.

Thanks!

Chris
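
One way the gssencmode=disable workaround might be applied from Python is to put it into the libpq connection string. The sketch below only builds that string; the helper name `conninfo` is hypothetical, and it is an assumption that the resulting string is then handed to a driver such as psycopg2.

```python
def conninfo(**params):
    """Build a libpq keyword/value connection string.

    gssencmode defaults to "disable" unless the caller sets it.
    """
    params.setdefault("gssencmode", "disable")
    parts = []
    for key, value in params.items():
        # libpq requires backslashes and single quotes inside a quoted
        # value to be escaped with a backslash.
        text = str(value).replace("\\", "\\\\").replace("'", "\\'")
        parts.append("%s='%s'" % (key, text))
    return " ".join(parts)


dsn = conninfo(host="localhost", user="postgres")
# The string could then be passed to a driver, e.g. psycopg2.connect(dsn).
```

Every value is quoted, which libpq accepts and which keeps values containing spaces intact.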
Attachment
Re: BUG #16041: Error shows up both in pgAdmin and in Ruby (pg gem) - Segmentation fault
From
Stephen Frost
Date:
Greetings,

* Chris Bandy (chris.bandy@crunchydata.com) wrote:
> Notice that host must be a TCP address (not Unix) and gssencmode must be
> "prefer" (default is "prefer".) The version of the server doesn't appear to
> matter; I tested 10, 11, and 12.

So, gssencmode didn't exist in 10 or 11- but are you actually testing
those different versions of *libpq*? That's really what is relevant here,
I believe, if libpq is actually even relevant at all...

> This has been reported in a variety of Ruby projects and often dismissed as
> "a PostgreSQL issue."

I'm really inclined to say that this isn't a PG issue...

> Based on that report, I crafted a minimal C program to make the same GSS
> call as libpq. I compiled (with deprecation warnings) and tested with the
> following:
>
> gcc macos-gss-crash.c -o macos-gss-crash -lgssapi_krb5
> ./macos-gss-crash

Particularly since that isn't linking against libpq and it's still
crashing.

I took the liberty to update the C code version to run on a Linux system,
and sure enough, it works just fine:

before gss_acquire_cred in main
after gss_acquire_cred in main
gss complete: true
before gss_acquire_cred in child
after gss_acquire_cred in child
gss complete: true
child exit code: 0

(also tested w/o having GSS creds and it still worked without a crash)

The only difference I needed to get it to compile on my Ubuntu box was to
add:

#include <sys/types.h>
#include <sys/wait.h>

and then compile as:

➜  ~ gcc macos-gss-crash.c -o macos-gss-crash -I /usr/include/mit-krb5 -L /usr/lib/x86_64-linux-gnu/mit-krb5 -lgssapi_krb5

> It prints:
>
> before gss_acquire_cred in main
> after gss_acquire_cred in main
> gss complete: true
> before gss_acquire_cred in child
> child signalled: 11
>
> I've attached the C program and crash reports for it and the above Ruby
> snippet.

Unfortunately, MacOS is pretty well known to be terrible about less
commonly used libraries and maintaining them. I'd suggest building a
current version of the Kerberos libraries, making sure you're linking
against just those and not whatever is provided by MacOS, and see if you
still have an issue.

The other possibility is that this is a current bug in Heimdal, which
seems to be the Kerberos library being used on MacOS, in which case
you'd need to bring up the issue with them.

There seems to be some independent confirmation of this being an issue
with the Heimdal provided by MacOS:

https://github.com/zenchild/gssapi/issues/12

The docs for gss_acquire_cred() don't seem to say much about what happens
when there's a fork():

https://docs.oracle.com/cd/E19683-01/816-1331/overview-141/index.html

If there's something we should be doing differently with
gss_acquire_cred() to "fix" this then I'm certainly open to it but I'm
really not sure what we'd do here; it seems pretty clearly to be some
issue where the Kerberos/Heimdal library being used is maintaining its
own state and getting confused after a fork happens.

Thanks,

Stephen
Attachment
Re: BUG #16041: Error shows up both in pgAdmin and in Ruby (pg gem) - Segmentation fault
From
Chris Bandy
Date:
On 12/3/19 5:31 PM, Stephen Frost wrote:
> Greetings,
>
> * Chris Bandy (chris.bandy@crunchydata.com) wrote:
>> Notice that host must be a TCP address (not Unix) and gssencmode must be
>> "prefer" (default is "prefer".) The version of the server doesn't appear to
>> matter; I tested 10, 11, and 12.
>
> So, gssencmode didn't exist in 10 or 11- but are you actually testing
> those different versions of *libpq*?

No, the libpq version in my tests is always 12. I was trying to say that
it doesn't appear to be an issue with the protocol/negotiation of GSS
encryption.

That does make me wonder, though, if/how the _server_ built by
`brew install postgresql` might be impacted by the macOS GSSAPI? All my
tests targeted a linux server.

>> This has been reported in a variety of Ruby projects and often dismissed as
>> "a PostgreSQL issue."
>
> I'm really inclined to say that this isn't a PG issue...

I agree, but at the same time the perception seems to be that
using/connecting to PostgreSQL crashes one's application. I think the very
reasonable default of gssencmode=prefer is partly responsible. Users don't
realize that by upgrading libpq they are opting in to new security code
paths (and library compatibility issues.)

> Unfortunately, MacOS is pretty well known to be terrible about less
> commonly used libraries and maintaining them. I'd suggest building a
> current version of the Kerberos libraries, making sure you're linking
> against just those and not whatever is provided by MacOS, and see if you
> still have an issue.

Investigating this has been the deepest exposure I've had to this... yes,
"unfortunate" reality.

Homebrew provides a recent version of krb5 (1.17 at this time) so I set out
to use it. A small diff to the formula proved successful. I'll submit a
patch to Homebrew linking back to this thread.

Is there anything that can/should be done on PostgreSQL's end now that we
know about this situation? The most I can imagine is to issue a warning when
macOS's GSSAPI is detected during build/configure. I don't know how to do
the latter and won't be surprised if the answer to the former is "no."

> The other possibility is that this is a current bug in Heimdal, which
> seems to be the Kerberos library being used on MacOS, in which case
> you'd need to bring up the issue with them.

I'm out of my depth on this front. My impression from the traces is that the
incompatibility is in macOS keychain, and I'm willing to leave it at that.
While researching this topic, I found multiple cases where fork() and the
"dispatch queue" are incompatible.[1]

> There seems to be some independent confirmation of this being an issue
> with the Heimdal provided by MacOS:
>
> https://github.com/zenchild/gssapi/issues/12

I don't see any C level backtrace information in that thread, so I can't
tell if it's the same issue.

Thank you for your help!

Chris

[1]: https://www.evanjones.ca/fork-is-dangerous.html
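
Nothing like the following was proposed in the thread; it is only a sketch of how an application might opt out of the new GSS code path on macOS before it forks, using libpq's PGGSSENCMODE environment variable. The function name is hypothetical, and the `environ` and `platform` parameters exist only so the logic can be exercised without touching the real process environment.

```python
import os
import sys


def prefer_no_gss_on_macos(environ=None, platform=None):
    """Set PGGSSENCMODE=disable on macOS unless it is already set.

    Returns True when the variable was set by this call.
    """
    environ = os.environ if environ is None else environ
    platform = sys.platform if platform is None else platform
    if platform == "darwin" and "PGGSSENCMODE" not in environ:
        # Only fill in a default; an explicit user setting wins.
        environ["PGGSSENCMODE"] = "disable"
        return True
    return False
```

Calling this once at startup, before the first connection is opened, leaves other platforms and any explicit user setting untouched.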
Re: BUG #16041: Error shows up both in pgAdmin and in Ruby (pg gem) - Segmentation fault
From
Stephen Frost
Date:
Greetings,

* Chris Bandy (chris.bandy@crunchydata.com) wrote:
> On 12/3/19 5:31 PM, Stephen Frost wrote:
> >* Chris Bandy (chris.bandy@crunchydata.com) wrote:
> >>Notice that host must be a TCP address (not Unix) and gssencmode must be
> >>"prefer" (default is "prefer".) The version of the server doesn't appear to
> >>matter; I tested 10, 11, and 12.
> >
> >So, gssencmode didn't exist in 10 or 11- but are you actually testing
> >those different versions of *libpq*?
>
> No, the libpq version in my tests is always 12. I was trying to say that it
> doesn't appear to be an issue with the protocol/negotiation of GSS
> encryption.

No, I don't think it's got anything to do with that ... or largely to do
with PG, except that libpq with v12 now uses more of the GSSAPI library
than it used to.

> That does make me wonder, though, if/how the _server_ built by `brew install
> postgresql` might be impacted by the macOS GSSAPI? All my tests targeted a
> linux server.

I wouldn't be at all surprised if there are other bugs lurking in the old
version of Heimdal that Apple hacked up and distributes with their base
OS.

> >>This has been reported in a variety of Ruby projects and often dismissed as
> >>"a PostgreSQL issue."
> >
> >I'm really inclined to say that this isn't a PG issue...
>
> I agree, but at the same time the perception seems to be that
> using/connecting to PostgreSQL crashes one's application. I think the very
> reasonable default of gssencmode=prefer is partly responsible. Users don't
> realize that by upgrading libpq they are opting in to new security code
> paths (and library compatibility issues.)

Perception isn't reality though, and upgrading to a new major version of
libpq is going to pretty regularly involve new library calls or calls
being made in ways they weren't before. If that exposes a bug in that
library (particularly one that's been fixed in more recent versions of
the library), that's not on us to hack around or attempt to solve, imv.

Perhaps someone else has a differing opinion and wants to try and figure
out a way to solve this that doesn't materially make things worse for
users that are running with a modern library, which would be great, but
I can't get too worked up about it.

> >Unfortunately, MacOS is pretty well known to be terrible about less
> >commonly used libraries and maintaining them. I'd suggest building a
> >current version of the Kerberos libraries, making sure you're linking
> >against just those and not whatever is provided by MacOS, and see if you
> >still have an issue.
>
> Investigating this has been the deepest exposure I've had to this... yes,
> "unfortunate" reality.
>
> Homebrew provides a recent version of krb5 (1.17 at this time) so I set out
> to use it. A small diff to the formula proved successful. I'll submit a
> patch to Homebrew linking back to this thread.

Great, that sounds like it's probably the right approach to addressing
this.

> Is there anything that can/should be done on PostgreSQL's end now that we
> know about this situation? The most I can imagine is to issue a warning when
> macOS's GSSAPI is detected during build/configure. I don't know how to do
> the latter and won't be surprised if the answer to the former is "no."

I wouldn't be against doing something here but I don't have a Mac myself
and I don't plan to spend time trying to hack around their broken
library. I'm also not entirely convinced that we should just throw an
error if we come across this busted library- psql doesn't fork and hasn't
got any problems, so it seems a bit overkill to just refuse to work with
the MacOS library.

> >The other possibility is that this is a current bug in Heimdal, which
> >seems to be the Kerberos library being used on MacOS, in which case
> >you'd need to bring up the issue with them.
>
> I'm out of my depth on this front. My impression from the traces is that the
> incompatibility is in macOS keychain, and I'm willing to leave it at that.
> While researching this topic, I found multiple cases where fork() and the
> "dispatch queue" are incompatible.[1]

I'm.. not terribly impressed by that blog's arguments around fork(),
particularly since it seems to be claiming things that are actually not
true about fork but which are true about threads. In fact, what it seems
to really be getting at is that running with threads and fork'ing at the
same time is awfully complicated to get right, and that's pretty
accurate, but that doesn't make just using fork() an issue.

That blog post aside, it looks like what it's getting at is that you
can't link to MacOS libraries and also fork() and expect things to be
sane, and while that's unfortunate, that isn't really our issue to go
figure out how to fix or address.

Thanks,

Stephen