Re: should there be a hard-limit on the number of transactions pending undo? - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: should there be a hard-limit on the number of transactions pending undo? |
Date | |
Msg-id | CA+TgmobMBgfH=2D7xKa10O_2aw-GUXTwja9OKabTQsm-BZiRyg@mail.gmail.com |
In response to | Re: should there be a hard-limit on the number of transactions pending undo? (Andres Freund <andres@anarazel.de>) |
Responses |
Re: should there be a hard-limit on the number of transactions pending undo?
|
List | pgsql-hackers |
On Fri, Jul 19, 2019 at 2:04 PM Andres Freund <andres@anarazel.de> wrote:
> It doesn't seem that hard - and kind of required for robustness
> independent of the decision around "completeness" - to find a way to use
> the locks already held by the prepared transaction.

I'm not wild about finding more subtasks to put on the must-do list,
but I agree it's doable.

> I'm not following, unfortunately.
>
> I don't understand what exactly the scenario is you refer to. You say
> "session N+1 can just stay in the same transaction", but then you also
> reference something taking "probably a five digit number of
> transactions". Are those transactions the prepared ones?

So you begin a bunch of sessions. All but one of them begin a
transaction, insert data into a table, and then prepare. The last one
begins a transaction and locks the table. Now you roll back all the
prepared transactions. Those sessions now begin new transactions,
insert data into a second table, and prepare the second set of
transactions. The last session, which still has the first table
locked, now locks the second table in addition. Now you again roll
back all the prepared transactions. At this point you have 2 *
max_prepared_transactions transactions that are waiting for undo, all
blocked on that last session that holds locks on both tables. So now
you have all of those sessions begin a third transaction, and they
all insert into a third table, and prepare. The last session now
attempts an AccessExclusiveLock (AEL) on that third table, and once
it's waiting, you roll back all the prepared transactions, after which
that last session successfully picks up its third table lock.

You can keep repeating this, locking a new table each time, until you
run out of lock table space, by which time you will have roughly
max_prepared_transactions * size_of_lock_table transactions waiting
for undo processing.

> You could force new connections to complete the rollback processing of
> the terminated connection, if there's too much pending UNDO. That'd be a
> way of providing back-pressure against such crazy scenarios. Seems
> again that it'd be good to have that pressure, independent of the
> decision on completeness.

That would definitely provide a whole lot of back-pressure, but it
would also make the system unusable if the undo handler finds a way to
FATAL, or just hangs for some stupid reason (stuck I/O?). It would be
a shame if the administrative action needed to fix the problem were
prevented by the back-pressure mechanism.

One thing I've thought about, which I think would be helpful for a
variety of scenarios, is to have a facility that forces a computed
delay at the start of each write transaction (when it first writes
WAL, or when an XID is assigned), or we could adapt that to this case
and say the beginning of each undo-using transaction. So for example
if you are about to run out of space in pg_wal, you can slow things
down to let the checkpoint complete, or if you are about to run out of
XIDs, you can slow things down to let autovacuum complete, or if you
are about to run out of undo slots, you can slow things down to let
some undo complete. The trick is to make sure that you only wait when
it's likely to do some good; if you wait because you're running out of
XIDs and the reason you're running out of XIDs is that somebody left a
replication slot or a prepared transaction around, the back-pressure
is useless.

> Couldn't we record the outstanding transactions in the checkpoint, and
> then recompute the changes to that record during WAL replay?

Hmm, that's not a bad idea. So the transactions would have to "count"
the moment they insert their first undo record, which is exactly the
right thing anyway.

Hmm, but what about transactions that are only touching unlogged tables?

> Yea, I think that's what it boils down to... Would be good to have a few
> more opinions on this.

+1.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
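
The computed-delay idea in the message above can be sketched roughly as
follows. This is only an illustration, not PostgreSQL code: the
`Resource` class, the `computed_delay_ms` function, the 80% threshold,
the linear ramp, and the one-second cap are all assumptions made for
the example. It tries to show the two properties the message asks for:
the delay grows as a resource (undo slots, XIDs, pg_wal space) nears
exhaustion, and no delay is imposed when waiting cannot help because
cleanup is blocked (for instance by a forgotten replication slot or
prepared transaction).

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """Hypothetical model of one exhaustible resource (undo slots, XIDs, ...)."""
    name: str
    used: int
    capacity: int
    # False when cleanup cannot advance no matter how long writers wait,
    # e.g. an abandoned replication slot or prepared transaction holds it back.
    cleanup_can_progress: bool


def computed_delay_ms(res: Resource,
                      start_fraction: float = 0.8,
                      max_delay_ms: float = 1000.0) -> float:
    """Return the delay to impose at the start of a write transaction.

    Assumed policy (not from the original message): no delay below
    start_fraction of capacity, then a linear ramp up to max_delay_ms.
    """
    if res.capacity <= 0:
        raise ValueError("capacity must be positive")
    if not 0.0 <= start_fraction < 1.0:
        raise ValueError("start_fraction must be in [0, 1)")

    # Waiting only helps if the background cleanup can actually make progress.
    if not res.cleanup_can_progress:
        return 0.0

    fill = res.used / res.capacity
    if fill <= start_fraction:
        return 0.0

    # Scale the overshoot past the threshold into [0, 1], clamping at 1.
    pressure = min((fill - start_fraction) / (1.0 - start_fraction), 1.0)
    return max_delay_ms * pressure


if __name__ == "__main__":
    for res in (
        Resource("undo slots", used=50, capacity=100, cleanup_can_progress=True),
        Resource("undo slots", used=90, capacity=100, cleanup_can_progress=True),
        Resource("XIDs", used=95, capacity=100, cleanup_can_progress=False),
    ):
        print(res.name, res.used, round(computed_delay_ms(res), 1))
```

The blocked-cleanup check comes first on purpose: in the XID example,
slowing writers down would not let autovacuum catch up, so the sketch
imposes no delay at all rather than throttling uselessly.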