Thread: [BUGS] BUG #14581: invalid cache ID: 41 CONTEXT: parallel worker
The following bug has been logged on the website:

Bug reference: 14581
Logged by: Stepan Yankevych
Email address: stepya@ukr.net
PostgreSQL version: 9.6.2
Operating system: RedHat
Description:

From time to time I get "invalid cache ID: 41" while running a simple query with parallelism, for example:

select count(1) from client_order where date_id = 20170301;

It crashes with (see error log):

< 2017-03-07 08:57:45.312 EST >ERROR: invalid cache ID: 41
< 2017-03-07 08:57:45.312 EST >ERROR: invalid cache ID: 41
< 2017-03-07 08:57:45.312 EST >ERROR: invalid cache ID: 41
< 2017-03-07 08:57:45.312 EST >ERROR: invalid cache ID: 41
< 2017-03-07 08:57:45.312 EST >ERROR: invalid cache ID: 41
< 2017-03-07 08:57:45.312 EST [unknown] SELECT>ERROR: invalid cache ID: 41
< 2017-03-07 08:57:45.312 EST [unknown] SELECT>CONTEXT: parallel worker
< 2017-03-07 08:57:45.312 EST [unknown] SELECT>STATEMENT: select count(1)

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
stepya@ukr.net writes:
> From time to time I get "invalid cache ID: 41" while running a simple query with
> parallelism.

Interesting, but unless you can show us how to reproduce this, we're not going to be able to do much about it.

regards, tom lane
Hi Tom.

Thanks for such a quick response.

It is quite difficult to reproduce. The only observation: it usually crashes with parallelism only on quite big tables with inheritance. The main table has many partitions (inherited tables). We query the main table with a condition on date_id = ?. In the execution plan we can see only one partition.

All subsequent runs crash as well. Reconnecting to the DB can help, but not always. After some time the same query can run successfully.

Anyway, I will try to write a script to reproduce it, but I am not sure I will be lucky enough to reproduce it on a sample.

Thanks!

Best Regards,
Stepan Yankevych
Lead Software Engineer
Stepan Yankevych <Stepan_Yankevych@epam.com> writes:
> It is quite difficult to reproduce.
> The only observation: it usually crashes with parallelism only on quite big tables with inheritance.

If you can't extract a test case, one thing that would be quite helpful is to get a stack trace from the point of the error. There are only four occurrences of

    elog(ERROR, "invalid cache ID: %d", cacheId);

and they're all in src/backend/utils/cache/syscache.c. If you could change those to elog(PANIC, ...) in a debug-enabled build, run till you get the failure, and then use gdb to get a backtrace from the ensuing core dump, that might be enough info to fix it.

regards, tom lane
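A minimal sketch of the source change suggested above, written in Python for illustration only. It assumes the call appears in the source exactly as quoted in the message; the helper name promote_to_panic and the sample input are hypothetical, and the sample is not copied from syscache.c.

```python
import re
from typing import Tuple

# Pattern for the call exactly as quoted in the thread. Assumption: the real
# source file formats the call this way; adjust the regex if it does not.
INVALID_CACHE_ID_CALL = re.compile(r'elog\(\s*ERROR\s*,\s*("invalid cache ID: %d")')


def promote_to_panic(source: str) -> Tuple[str, int]:
    """Return (patched_source, replacement_count) with ERROR raised to PANIC."""
    return INVALID_CACHE_ID_CALL.subn(r'elog(PANIC, \1', source)


# Hypothetical stand-in for one of the call sites, not copied from syscache.c.
sample = 'if (cache_id_is_bad)\n\telog(ERROR, "invalid cache ID: %d", cacheId);\n'
patched, count = promote_to_panic(sample)
print(count)
print(patched)
```

Applied to the contents of src/backend/utils/cache/syscache.c, the replacement count should match the four occurrences mentioned above if the formatting assumption holds. After rebuilding with debugging symbols (for example, configure with --enable-debug) and with core dumps enabled, opening the resulting core file in gdb and running bt should give the backtrace being asked for.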
Unfortunately, that is almost impossible. We are experiencing this error only on our PROD environment (starting from version 9.6.0). I will think about a debug-enabled build on one of our dev environments, but I am not sure we can reproduce it there because of the much smaller amount of data.

Best Regards,
Stepan Yankevych
Lead Software Engineer
stepya@ukr.net wrote:
> The following bug has been logged on the website:
>
> Bug reference: 14581
> Logged by: Stepan Yankevych
> Email address: stepya@ukr.net
> PostgreSQL version: 9.6.2
> Operating system: RedHat
> Description:
>
> From time to time I get "invalid cache ID: 41" while running a simple query with
> parallelism, for example:
> select count(1) from client_order where date_id = 20170301;
> It crashes with (see error log):
>
> < 2017-03-07 08:57:45.312 EST >ERROR: invalid cache ID: 41
> < 2017-03-07 08:57:45.312 EST >ERROR: invalid cache ID: 41
> < 2017-03-07 08:57:45.312 EST >ERROR: invalid cache ID: 41
> < 2017-03-07 08:57:45.312 EST >ERROR: invalid cache ID: 41
> < 2017-03-07 08:57:45.312 EST >ERROR: invalid cache ID: 41
> < 2017-03-07 08:57:45.312 EST [unknown] SELECT>ERROR: invalid cache ID: 41
> < 2017-03-07 08:57:45.312 EST [unknown] SELECT>CONTEXT: parallel worker
> < 2017-03-07 08:57:45.312 EST [unknown] SELECT>STATEMENT: select count(1)

Could it be that the oracle_fdw extension was loaded?

There was a bug reported yesterday that would explain this error:
https://github.com/laurenz/oracle_fdw/issues/215

If oracle_fdw is involved, the latest commit should fix the problem.

Yours,
Laurenz Albe