Thread: [PATCH] SQL assertions prototype
Various places in the constraint checking code say something like, if we ever implement assertions, here is where it should go. I've been fiddling with filling in those gaps for some time now, and the other day I noticed, hey, this actually kind of works, so here it is. Let's see whether this architecture is sound.

A constraint trigger performs the actual checking. For the implementation of the trigger, I've used some SPI hacking for now; that could probably be refined. The attached patch has documentation, tests, psql support. Missing pieces are pg_dump support, dependency management, and permission checking (the latter marked in the code).

This is not a performance feature. It's for things like, this table should have at most 10 rows, or all the values in this table must be bigger than all the values in that other table. It's a bit esoteric, but it comes up again and again.

Let me know what you think.
+1, interesting feature.

2013/11/15 Peter Eisentraut <peter_e@gmx.net>:
> Various places in the constraint checking code say something like, if we
> ever implement assertions, here is where it should go. I've been
> fiddling with filling in those gaps for some time now, and the other day
> I noticed, hey, this actually kind of works, so here it is. Let's see
> whether this architecture is sound.
>
> A constraint trigger performs the actual checking. For the
> implementation of the trigger, I've used some SPI hacking for now; that
> could probably be refined. The attached patch has documentation, tests,
> psql support. Missing pieces are pg_dump support, dependency
> management, and permission checking (the latter marked in the code).
>
> This is not a performance feature. It's for things like, this table
> should have at most 10 rows, or all the values in this table must be
> bigger than all the values in that other table. It's a bit esoteric,
> but it comes up again and again.
>
> Let me know what you think.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 15.11.2013 05:30, Peter Eisentraut wrote:
> Various places in the constraint checking code say something like, if we
> ever implement assertions, here is where it should go. I've been
> fiddling with filling in those gaps for some time now, and the other day
> I noticed, hey, this actually kind of works, so here it is. Let's see
> whether this architecture is sound.

Cool!

> A constraint trigger performs the actual checking. For the
> implementation of the trigger, I've used some SPI hacking for now; that
> could probably be refined. The attached patch has documentation, tests,
> psql support. Missing pieces are pg_dump support, dependency
> management, and permission checking (the latter marked in the code).

A fundamental problem with this is that it needs to handle isolation reliably, so that the assertion cannot be violated when two concurrent backends do things. Consider the example from the manual, which checks that a table has at least one row. Now, if the table has two rows to begin with, and in one backend you delete one row, and concurrently in another backend you delete the other row, and then commit both transactions, the assertion is violated.

In other words, the assertions need to be checked in serializable mode. Now that we have a real serializable mode, I think that's actually feasible.

PS. The patch doesn't check that the assertion holds when it's created:

postgres=# create table foo (i int4);
CREATE TABLE
postgres=# create assertion myassert check ((select count(*) from foo) > 0);
CREATE ASSERTION

- Heikki
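The race described above can be written out step by step (a sketch, assuming the patch's CREATE ASSERTION syntax and a table that starts with two rows):

```sql
-- The assertion requires foo to keep at least one row.
CREATE TABLE foo (i int4);
INSERT INTO foo VALUES (1), (2);
CREATE ASSERTION myassert CHECK ((SELECT count(*) FROM foo) > 0);

-- Backend 1                            -- Backend 2
BEGIN;                                  BEGIN;
DELETE FROM foo WHERE i = 1;            DELETE FROM foo WHERE i = 2;
-- trigger re-check: still sees         -- trigger re-check: still sees
-- row 2, so the check "passes"         -- row 1, so the check "passes"
COMMIT;                                 COMMIT;
-- After both commits foo is empty: the assertion is violated even
-- though neither backend ever saw it fail.
```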
On 15 November 2013 03:30, Peter Eisentraut <peter_e@gmx.net> wrote:
> A constraint trigger performs the actual checking.

Good, that is consistent with other constraints.

> This is not a performance feature. It's for things like, this table
> should have at most 10 rows, or all the values in this table must be
> bigger than all the values in that other table. It's a bit esoteric,
> but it comes up again and again.

While I accept it may never perform well, it needs to perform reasonably well. The key use cases for this are:

* enforcing "one and only one" relationships
* enforcing quantified relationships like we do in XML: minoccurs and maxoccurs
* enforcing only one sub-type across multiple sub-type tables

etc.

So we'd need to get access to the changed rows, rather than re-executing a huge SQL command that re-checks every row of the table. That last point will make it unusable for sensible amounts of data.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Sun, Nov 24, 2013 at 11:03 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> So we'd need to get access to the changed rows, rather than
> re-executing a huge SQL command that re-checks every row of the table.
> That last point will make it unusable for sensible amounts of data.

That sounds very similar to handling incremental maintenance of materialized views, which Kevin is working on. Let's assume that the "huge SQL command that re-checks every row of the table" is actually a materialized view. In that case, the CREATE ASSERTION trigger would merely need to scan the matview and raise an error if any rows were present. That should be a very quick operation.

No need to invent some sort of "get access to the changed rows" mechanism especially for CREATE ASSERTION.

Kind regards,
Andrew Tipton
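Spelled out, that scheme might look like this (a sketch assuming incrementally maintained matviews existed; names are made up, and the view is defined to hold the *violations*, so "assertion holds" means "view is empty"):

```sql
-- Assertion: every value in a must be bigger than every value in b.
-- The matview lists the offending rows; incremental maintenance keeps
-- it up to date as a and b change.
CREATE MATERIALIZED VIEW myassert_violations AS
    SELECT a.val FROM a WHERE a.val <= (SELECT max(val) FROM b);

-- The CREATE ASSERTION trigger then reduces to a cheap emptiness test:
--   IF EXISTS (SELECT 1 FROM myassert_violations) THEN
--       RAISE 'assertion violated';
--   END IF;
```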
Andrew Tipton <andrew@kiwidrew.com> wrote:
> Simon Riggs <simon@2ndquadrant.com> wrote:
>> So we'd need to get access to the changed rows, rather than
>> re-executing a huge SQL command that re-checks every row of the
>> table. That last point will make it unusable for sensible
>> amounts of data.
>
> That sounds very similar to handling incremental maintenance of
> materialized views, which Kevin is working on.

It does.

> Let's assume that the "huge SQL command that re-checks every row
> of the table" is actually a materialized view. In that case, the
> CREATE ASSERTION trigger would merely need to scan the matview
> and raise an error if any rows were present. That should be a
> very quick operation.

That would certainly be a viable way to implement this once we have incremental maintenance for materialized views, although I make no claims to having evaluated it versus the alternatives to be able to assert what the *best* way is.

> No need to invent some sort of "get access to the changed
> rows" mechanism especially for CREATE ASSERTION.

As soon as we are out of this CF, I am planning to write code to capture deltas and fire functions to process them "eagerly" (within the creating transaction). There have been suggestions that the changeset mechanism should be used for that, which I will look into; but my gut feel is that it will be better to build a tuplestore of tids flagged with "old" or "new" around the point that "after triggers" fire. How close does that sound to what CREATE ASSERTION (as currently envisioned) would need? How viable does it sound to turn an assertion expression into a matview which is empty if there are no violations?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Nov 25, 2013 at 11:04:23AM -0800, Kevin Grittner wrote:
> Andrew Tipton <andrew@kiwidrew.com> wrote:
> > Simon Riggs <simon@2ndquadrant.com> wrote:
> >> So we'd need to get access to the changed rows, rather than
> >> re-executing a huge SQL command that re-checks every row of the
> >> table. That last point will make it unusable for sensible
> >> amounts of data.
> >
> > That sounds very similar to handling incremental maintenance of
> > materialized views, which Kevin is working on.
>
> It does.
>
> > Let's assume that the "huge SQL command that re-checks every row
> > of the table" is actually a materialized view. In that case, the
> > CREATE ASSERTION trigger would merely need to scan the matview
> > and raise an error if any rows were present. That should be a
> > very quick operation.
>
> That would certainly be a viable way to implement this once we have
> incremental maintenance for materialized views, although I make no
> claims to having evaluated it versus the alternatives to be able to
> assert what the *best* way is.
>
> > No need to invent some sort of "get access to the changed
> > rows" mechanism especially for CREATE ASSERTION.
>
> As soon as we are out of this CF, I am planning to write code to
> capture deltas and fire functions to process them "eagerly" (within
> the creating transaction). There have been suggestions that the
> changeset mechanism should be used for that, which I will look
> into; but my gut feel is that it will be better to build a
> tuplestore of tids flagged with "old" or "new" around the point
> that "after triggers" fire. How close does that sound to what
> CREATE ASSERTION (as currently envisioned) would need?

It sounds *extremely* close to what we'd need for row access in per-statement triggers, as in probably identical. The SQL syntax of this sub-feature is described in Foundation section 11.49 and called REFERENCING in CREATE TRIGGER.

Do you have any prototypes I could use for that purpose?

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
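For context, the REFERENCING clause from the standard exposes a statement's changed rows to a per-statement trigger as named transition tables, roughly along these lines (a sketch of the standard syntax, not something PostgreSQL accepts today; names are made up):

```sql
CREATE TRIGGER check_after_update
    AFTER UPDATE ON accounts
    REFERENCING OLD TABLE AS old_rows NEW TABLE AS new_rows
    FOR EACH STATEMENT
    EXECUTE PROCEDURE check_deltas();
-- check_deltas() could then query old_rows/new_rows instead of
-- re-scanning all of accounts.
```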
David Fetter <david@fetter.org> wrote:
> On Mon, Nov 25, 2013 at 11:04:23AM -0800, Kevin Grittner wrote:
>> As soon as we are out of this CF, I am planning to write code to
>> capture deltas and fire functions to process them "eagerly"
>> (within the creating transaction). There have been suggestions
>> that the changeset mechanism should be used for that, which I
>> will look into; but my gut feel is that it will be better to
>> build a tuplestore of tids flagged with "old" or "new" around
>> the point that "after triggers" fire. How close does that sound
>> to what CREATE ASSERTION (as currently envisioned) would need?
>
> It sounds *extremely* close to what we'd need for row access in
> per-statement triggers, as in probably identical. The SQL syntax
> of this sub-feature is described in Foundation section 11.49 and
> called REFERENCING in CREATE TRIGGER.
>
> Do you have any prototypes I could use for that purpose?

No, but it is at the top of my list after the CF. I will also need an execution node type or two to produce the referenced rows for the appropriate contexts, which is probably also very close to what you need for per-statement triggers. I will be happy to coordinate work with you.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 11/24/13, 10:03 AM, Simon Riggs wrote:
> So we'd need to get access to the changed rows, rather than
> re-executing a huge SQL command that re-checks every row of the table.
> That last point will make it unusable for sensible amounts of data.

SQL assertions work with arbitrary expressions, so even if you had the changed rows, you couldn't do anything with them in general. For cases where it makes sense to consider the changed rows, you probably want a per-table trigger, an exclusion constraint, or perhaps an auto-updated materialized view.

Then again, we have lots of ways to make queries fast even for large tables.
On 11/15/2013 05:41 AM, Heikki Linnakangas wrote:
> A fundamental problem with this is that it needs to handle isolation
> reliably, so that the assertion cannot be violated when two concurrent
> backends do things. Consider the example from the manual, which checks
> that a table has at least one row. Now, if the table has two rows to
> begin with, and in one backend you delete one row, and concurrently in
> another backend you delete the other row, and then commit both
> transactions, the assertion is violated.
>
> In other words, the assertions need to be checked in serializable mode.
> Now that we have a real serializable mode, I think that's actually
> feasible.

Going back over this patch, I haven't seen any further discussion of the point Heikki raises above, which seems like a bit of a showstopper.

Heikki, did you have specific ideas on how to solve this? Right now my mind boggles.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Josh Berkus <josh@agliodbs.com> wrote:
> On 11/15/2013 05:41 AM, Heikki Linnakangas wrote:
>> A fundamental problem with this is that it needs to handle isolation
>> reliably, so that the assertion cannot be violated when two concurrent
>> backends do things. Consider the example from the manual, which checks
>> that a table has at least one row. Now, if the table has two rows to
>> begin with, and in one backend you delete one row, and concurrently in
>> another backend you delete the other row, and then commit both
>> transactions, the assertion is violated.
>>
>> In other words, the assertions need to be checked in serializable mode.
>> Now that we have a real serializable mode, I think that's actually
>> feasible.
>
> Going back over this patch, I haven't seen any further discussion of the
> point Heikki raises above, which seems like a bit of a showstopper.
>
> Heikki, did you have specific ideas on how to solve this? Right now my
> mind boggles.

It works fine as long as you set default_transaction_isolation = 'serializable' and never override that. :-) Of course, it sure would be nice to have a way to prohibit overrides, but that's another issue.

Otherwise it is hard to see how to make it work in a general way without a mutually exclusive lock mode on the table for the duration of any transaction which modifies the table.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
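For what it's worth, under SSI the two-deleter race from upthread resolves itself: both transactions read the table and then modify it, which creates a dangerous read/write dependency cycle, so one of the two commits is rejected. A sketch:

```sql
-- Both sessions run with:
SET default_transaction_isolation = 'serializable';

-- Session 1                       -- Session 2
BEGIN;                             BEGIN;
SELECT count(*) FROM foo;  -- 2    SELECT count(*) FROM foo;  -- 2
DELETE FROM foo WHERE i = 1;       DELETE FROM foo WHERE i = 2;
COMMIT;                            COMMIT;
-- One COMMIT succeeds; the other fails with a serialization failure
-- ("could not serialize access due to read/write dependencies among
-- transactions") and can simply be retried.
```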
On 12/17/2013 04:42 PM, Kevin Grittner wrote:
> Josh Berkus <josh@agliodbs.com> wrote:
>> On 11/15/2013 05:41 AM, Heikki Linnakangas wrote:
>>> A fundamental problem with this is that it needs to handle isolation
>>> reliably, so that the assertion cannot be violated when two concurrent
>>> backends do things. Consider the example from the manual, which checks
>>> that a table has at least one row. Now, if the table has two rows to
>>> begin with, and in one backend you delete one row, and concurrently in
>>> another backend you delete the other row, and then commit both
>>> transactions, the assertion is violated.
>>>
>>> In other words, the assertions need to be checked in serializable mode.
>>> Now that we have a real serializable mode, I think that's actually
>>> feasible.
>> Going back over this patch, I haven't seen any further discussion of the
>> point Heikki raises above, which seems like a bit of a showstopper.
>>
>> Heikki, did you have specific ideas on how to solve this? Right now my
>> mind boggles.
> It works fine as long as you set default_transaction_isolation =
> 'serializable' and never override that. :-) Of course, it sure
> would be nice to have a way to prohibit overrides, but that's
> another issue.
>
> Otherwise it is hard to see how to make it work in a general way
> without a mutually exclusive lock mode on the table for the
> duration of any transaction which modifies the table.

Maybe the presence of an assertion should be enough to force serializable, i.e. turn it on and not allow it to be turned off.

cheers

andrew
On 18/12/13 10:48, Andrew Dunstan wrote:
> On 12/17/2013 04:42 PM, Kevin Grittner wrote:
>> Josh Berkus <josh@agliodbs.com> wrote:
>>> On 11/15/2013 05:41 AM, Heikki Linnakangas wrote:
>>>> A fundamental problem with this is that it needs to handle isolation
>>>> reliably, so that the assertion cannot be violated when two concurrent
>>>> backends do things. Consider the example from the manual, which checks
>>>> that a table has at least one row. Now, if the table has two rows to
>>>> begin with, and in one backend you delete one row, and concurrently in
>>>> another backend you delete the other row, and then commit both
>>>> transactions, the assertion is violated.
>>>>
>>>> In other words, the assertions need to be checked in serializable mode.
>>>> Now that we have a real serializable mode, I think that's actually
>>>> feasible.
>>> Going back over this patch, I haven't seen any further discussion of the
>>> point Heikki raises above, which seems like a bit of a showstopper.
>>>
>>> Heikki, did you have specific ideas on how to solve this? Right now my
>>> mind boggles.
>> It works fine as long as you set default_transaction_isolation =
>> 'serializable' and never override that. :-) Of course, it sure
>> would be nice to have a way to prohibit overrides, but that's
>> another issue.
>>
>> Otherwise it is hard to see how to make it work in a general way
>> without a mutually exclusive lock mode on the table for the
>> duration of any transaction which modifies the table.
>
> Maybe the presence of an assertion should be enough to force
> serializable, i.e. turn it on and not allow it to be turned off.

Perhaps then it should be called an 'assurance', rather than an 'assertion'? (Not being entirely facetious!)

Cheers,
Gavin
On 12/17/2013 01:42 PM, Kevin Grittner wrote:
> Josh Berkus <josh@agliodbs.com> wrote:
>> Going back over this patch, I haven't seen any further discussion of the
>> point Heikki raises above, which seems like a bit of a showstopper.
>>
>> Heikki, did you have specific ideas on how to solve this? Right now my
>> mind boggles.
>
> It works fine as long as you set default_transaction_isolation =
> 'serializable' and never override that. :-) Of course, it sure
> would be nice to have a way to prohibit overrides, but that's
> another issue.
>
> Otherwise it is hard to see how to make it work in a general way
> without a mutually exclusive lock mode on the table for the
> duration of any transaction which modifies the table.

Serializable or not, *what* do we lock for assertions? It's not rows. Tables? Which tables? What if the assertion is an interpreted language function? Does the SSI reference counter really take care of all of this?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On 12/18/2013 02:59 AM, Josh Berkus wrote:
> On 12/17/2013 01:42 PM, Kevin Grittner wrote:
>> Josh Berkus <josh@agliodbs.com> wrote:
>>> Going back over this patch, I haven't seen any further discussion of the
>>> point Heikki raises above, which seems like a bit of a showstopper.
>>>
>>> Heikki, did you have specific ideas on how to solve this? Right now my
>>> mind boggles.
>>
>> It works fine as long as you set default_transaction_isolation =
>> 'serializable' and never override that. :-) Of course, it sure
>> would be nice to have a way to prohibit overrides, but that's
>> another issue.
>>
>> Otherwise it is hard to see how to make it work in a general way
>> without a mutually exclusive lock mode on the table for the
>> duration of any transaction which modifies the table.
>
> Serializable or not, *what* do we lock for assertions? It's not rows.
> Tables? Which tables? What if the assertion is an interpreted language
> function? Does the SSI reference counter really take care of all of this?

SSI does make everything rosy, as long as all the transactions participate in it. The open questions revolve around what happens if a transaction is not running in SSI mode.

If you force everyone to run in SSI as soon as you create even a single assertion in your database, that's kind of crappy for performance. Also, if one user creates an assertion, it becomes a funny kind of a partial denial-of-service attack on other users, if you force them to also run in SSI mode.

If you don't force everything to run in SSI mode, then you have to somehow figure out what parts do need to run in SSI mode to enforce the assertion. For example, if the assertion refers to tables A and B, perhaps you can get away without predicate locks on table C?

- Heikki
On 12/18/2013 02:59 AM, Josh Berkus wrote:
> On 12/17/2013 01:42 PM, Kevin Grittner wrote:
>> It works fine as long as you set default_transaction_isolation =
>> 'serializable' and never override that. :-) Of course, it sure
>> would be nice to have a way to prohibit overrides, but that's
>> another issue.
>>
>> Otherwise it is hard to see how to make it work in a general way
>> without a mutually exclusive lock mode on the table for the
>> duration of any transaction which modifies the table.
>
> Serializable or not, *what* do we lock for assertions? It's not rows.
> Tables? Which tables? What if the assertion is an interpreted language
> function? Does the SSI reference counter really take care of all of this?

Here's another idea that doesn't involve SSI:

At COMMIT, take a new snapshot and check that the assertion still passes with that snapshot. Now, there's a race condition, if another transaction is committing at the same time and performs the same check concurrently. That race condition can be eliminated by holding an exclusive lock while running the assertion, until commit, effectively allowing the assertion to be checked by only one transaction at a time.

I think that would work, and would be simple, although it wouldn't scale too well.

- Heikki
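In pseudocode, that commit-time scheme would be something like (a sketch of the idea, not actual PostgreSQL function names):

```
AtCommit_CheckAssertions():
    for each assertion A affected by this transaction:
        LockAssertion(A, EXCLUSIVE)     -- one checker at a time
        snapshot = GetLatestSnapshot()  -- sees concurrently committed work
        if not ExecuteAssertionCheck(A, snapshot):
            ereport(ERROR, "assertion violated")
    -- locks are held until the commit completes, then released
```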
On 2013-12-18 13:07:51 +0200, Heikki Linnakangas wrote:
> Here's another idea that doesn't involve SSI:
>
> At COMMIT, take a new snapshot and check that the assertion still passes
> with that snapshot. Now, there's a race condition, if another transaction is
> committing at the same time, and performs the same check concurrently. That
> race condition can be eliminated by holding an exclusive lock while running
> the assertion, until commit, effectively allowing the assertion to be
> checked by only one transaction at a time.
>
> I think that would work, and would be simple, although it wouldn't scale too
> well.

It probably would also be very prone to deadlocks.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 12/18/2013 01:39 PM, Andres Freund wrote:
> On 2013-12-18 13:07:51 +0200, Heikki Linnakangas wrote:
>> Here's another idea that doesn't involve SSI:
>>
>> At COMMIT, take a new snapshot and check that the assertion still passes
>> with that snapshot. Now, there's a race condition, if another transaction is
>> committing at the same time, and performs the same check concurrently. That
>> race condition can be eliminated by holding an exclusive lock while running
>> the assertion, until commit, effectively allowing the assertion to be
>> checked by only one transaction at a time.
>>
>> I think that would work, and would be simple, although it wouldn't scale too
>> well.
>
> It probably would also be very prone to deadlocks.

Hmm, since this would happen at commit, you would know all the assertions that need to be checked at that point. You could check them, e.g. in OID order, to avoid deadlocks.

- Heikki
On 2013-12-18 13:46:59 +0200, Heikki Linnakangas wrote:
> On 12/18/2013 01:39 PM, Andres Freund wrote:
>> On 2013-12-18 13:07:51 +0200, Heikki Linnakangas wrote:
>>> Here's another idea that doesn't involve SSI:
>>>
>>> At COMMIT, take a new snapshot and check that the assertion still passes
>>> with that snapshot.
>>> I think that would work, and would be simple, although it wouldn't scale too
>>> well.
>>
>> It probably would also be very prone to deadlocks.
>
> Hmm, since this would happen at commit, you would know all the assertions
> that need to be checked at that point. You could check them, e.g. in OID
> order, to avoid deadlocks.

I think the real problem is the lock upgrades, because the earlier DML will have acquired less heavy locks.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 12/18/2013 01:50 PM, Andres Freund wrote:
> On 2013-12-18 13:46:59 +0200, Heikki Linnakangas wrote:
>> On 12/18/2013 01:39 PM, Andres Freund wrote:
>>> On 2013-12-18 13:07:51 +0200, Heikki Linnakangas wrote:
>>>> Here's another idea that doesn't involve SSI:
>>>>
>>>> At COMMIT, take a new snapshot and check that the assertion still passes
>>>> with that snapshot.
>>>> I think that would work, and would be simple, although it wouldn't scale too
>>>> well.
>>>
>>> It probably would also be very prone to deadlocks.
>>
>> Hmm, since this would happen at commit, you would know all the assertions
>> that need to be checked at that point. You could check them, e.g. in OID
>> order, to avoid deadlocks.
>
> I think the real problem is the lock upgrades, because the earlier DML will
> have acquired less heavy locks.

Ah, I see. You don't need to block anyone else from modifying the table, you just need to block anyone else from committing a transaction that had modified the table. So the locks shouldn't interfere with regular table locks. A ShareUpdateExclusiveLock on the assertion should do it.

- Heikki
On 12/18/2013 06:00 AM, Heikki Linnakangas wrote:
> If you don't force everything to run in SSI mode, then you have to
> somehow figure out what parts do need to run in SSI mode to enforce
> the assertion. For example, if the assertion refers to tables A and B,
> perhaps you can get away without predicate locks on table C?

But the assertion might simply run a function. For non-trivial cases that's what I would expect people to do.

cheers

andrew
Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
> On 12/18/2013 01:39 PM, Andres Freund wrote:
>> On 2013-12-18 13:07:51 +0200, Heikki Linnakangas wrote:
>>> Here's another idea that doesn't involve SSI:
>>>
>>> At COMMIT, take a new snapshot and check that the assertion still passes
>>> with that snapshot. Now, there's a race condition, if another transaction is
>>> committing at the same time, and performs the same check concurrently. That
>>> race condition can be eliminated by holding an exclusive lock while running
>>> the assertion, until commit, effectively allowing the assertion to be
>>> checked by only one transaction at a time.
>>>
>>> I think that would work, and would be simple, although it wouldn't scale too
>>> well.
>>
>> It probably would also be very prone to deadlocks.
>
> Hmm, since this would happen at commit, you would know all the
> assertions that need to be checked at that point. You could check them,
> e.g. in OID order, to avoid deadlocks.

So you would actually serialize all COMMITs, or at least all which had done anything which could affect the truth of an assertion? That seems like it would work, but I suspect that it would be worse for performance than SSI in many workloads, and better than SSI in other workloads. Maybe a GUC to determine which strategy is used?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Josh Berkus <josh@agliodbs.com> wrote:
> Serializable or not, *what* do we lock for assertions? It's not
> rows. Tables? Which tables? What if the assertion is an
> interpreted language function? Does the SSI reference counter
> really take care of all of this?

The simple answer is that, without adding any additional blocking, SSI guarantees that the behavior of running any set of concurrent serializable transactions is consistent with some serial (one-at-a-time) execution of those transactions. If the assertion is run as part of the transaction, it is automatically handled, regardless of the issues you are asking about.

The longer answer is in the README and the papers it references:

http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/storage/lmgr/README-SSI;hb=master

For practical examples of how it works, this Wiki page includes examples of maintaining a multi-table invariant using triggers:

http://wiki.postgresql.org/wiki/SSI

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
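The wiki's multi-table invariant examples are in this spirit: an ordinary constraint trigger whose correctness relies on SSI rather than on explicit locking. A sketch (table and function names made up):

```sql
-- Invariant: the sum of all balances must stay >= 0.
CREATE FUNCTION check_balance_total() RETURNS trigger AS $$
BEGIN
    IF (SELECT sum(balance) FROM account) < 0 THEN
        RAISE EXCEPTION 'total balance may not go negative';
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE CONSTRAINT TRIGGER account_total_check
    AFTER INSERT OR UPDATE OR DELETE ON account
    FOR EACH ROW EXECUTE PROCEDURE check_balance_total();

-- Under READ COMMITTED or REPEATABLE READ, two concurrent withdrawals
-- can each see a non-negative total and jointly drive it below zero.
-- With default_transaction_isolation = 'serializable', SSI instead
-- aborts one of them with a serialization failure.
```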
Heikki Linnakangas wrote:
> Ah, I see. You don't need to block anyone else from modifying the
> table, you just need to block anyone else from committing a
> transaction that had modified the table. So the locks shouldn't
> interfere with regular table locks. A ShareUpdateExclusiveLock on
> the assertion should do it.

Causing serialization of transaction commit just because a single assertion exists in the database seems too much of a hit, so looking for optimization opportunities seems appropriate. Here are some ideas for brainstorming.

It might prove useful to note that any given assertion involves tables A, B, C. If a transaction doesn't modify any of those tables, then there's no need to run the assertion when the transaction commits, skipping acquisition of the assertion lock. Probably this can only be implemented for SQL-language assertions, but even so it might be worth considering. (Assertions in any other language would be checked by every transaction.)

Another thought: at the initial run of the assertion, note which tables it locked, and record this as an OID array in the catalog row for the assertion; consider running the assertion only when those tables are touched. This doesn't work if the assertion code locks some tables when run under certain conditions and other tables under different conditions. But then this can be checked too: if an assertion lists in its catalog row that it involves tables A, B, C and then, under different conditions, it tries to acquire a lock on table D, have the whole thing fail, indicating that the assertion is misdeclared.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
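Concretely, the catalog entry for an assertion might record its footprint something like this (a purely hypothetical shape, for illustration only):

```sql
-- Hypothetical catalog columns for CREATE ASSERTION:
--   assname       name          -- assertion name
--   asscheckexpr  pg_node_tree  -- the CHECK expression
--   assrelids     oid[]         -- relations locked on the initial run
--
-- Commit-time logic: skip the check unless the transaction modified a
-- relation in assrelids; if evaluating the check ever locks a relation
-- not in assrelids, raise an error flagging the assertion as
-- misdeclared.
```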
On 12/18/2013 08:44 AM, Alvaro Herrera wrote:
> Another thought: at the initial run of the assertion, note which tables
> it locked, and record this as an OID array in the catalog row for the
> assertion; consider running the assertion only when those tables are
> touched. This doesn't work if the assertion code locks some tables when
> run under certain conditions and other tables under different
> conditions. But then this can be checked too: if an assertion lists in
> its catalog row that it involves tables A, B, C and then, under
> different conditions, it tries to acquire a lock on table D, have the
> whole thing fail, indicating that the assertion is misdeclared.

This sounds like you're re-inventing SSI.

SERIALIZABLE mode *exists* in order to be able to enforce constraints which potentially involve more than one transaction. "Balance can never go below 0", for example. The whole reason we have this really cool and unique SSI mode is so that we can do such things without killing performance. These sorts of requirements are ideally suited to Assertions, so it's logically consistent to require Serializable mode in order to use Assertions.

I'm leaning towards the alternative that Assertions require SERIALIZABLE mode, and throw a WARNING at the user and the log every time we create, modify, or trigger an assertion while not in SERIALIZABLE mode. And beyond that, we don't guarantee the integrity of Assertions if people choose to run in READ COMMITTED anyway. This is consistent with how we treat the interaction of constraints and triggers; under some circumstances, we allow triggers to violate CHECK and FK constraints.

Alternately, we add a GUC assertion_serializable_mode, which can be "off", "warn" or "error". If it's set to "error", and the user triggers an assertion while in READ COMMITTED mode, an exception occurs. If it's set to "off", then assertions are disabled, in order to deal with buggy assertions.

Now, it would be even better if we could prevent users from switching transaction mode, but that's a MUCH bigger and more complicated patch.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
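The proposed GUC would presumably look like any other enum setting, e.g. (hypothetical; no such setting exists):

```
# postgresql.conf (hypothetical)
assertion_serializable_mode = warn    # off | warn | error
```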
On 2013-12-18 13:44:15 -0300, Alvaro Herrera wrote:
> Heikki Linnakangas wrote:
>> Ah, I see. You don't need to block anyone else from modifying the
>> table, you just need to block anyone else from committing a
>> transaction that had modified the table. So the locks shouldn't
>> interfere with regular table locks. A ShareUpdateExclusiveLock on
>> the assertion should do it.
>
> Causing serialization of transaction commit just because a single
> assertion exists in the database seems too much of a hit, so looking for
> optimization opportunities seems appropriate.

It would only force serialization for transactions that modify tables covered by the assert, so that doesn't seem too bad. Anything covered by an assert shouldn't be modified frequently, otherwise you'll run into major performance problems.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 12/18/13, 11:57 AM, Josh Berkus wrote:
> On 12/18/2013 08:44 AM, Alvaro Herrera wrote:
>> Another thought: at the initial run of the assertion, note which tables
>> it locked, and record this as an OID array in the catalog row for the
>> assertion; consider running the assertion only when those tables are
>> touched. This doesn't work if the assertion code locks some tables when
>> run under certain conditions and other tables under different
>> conditions. But then this can be checked too: if an assertion lists in
>> its catalog row that it involves tables A, B, C and then, under
>> different conditions, it tries to acquire lock on table D, have the
>> whole thing fail indicating that the assertion is misdeclared.
>
> This sounds like you're re-inventing SSI.
>
> SERIALIZABLE mode *exists* in order to be able to enforce constraints
> which potentially involve more than one transaction. "Balance can never
> go below 0", for example. The whole reason we have this really cool and
> unique SSI mode is so that we can do such things without killing
> performance. These sorts of requirements are ideally suited to
> Assertions, so it's logically consistent to require Serializable mode in
> order to use Assertions.

The flip-side is that now you can get serialization failures, and I think there's a ton of software that has no clue how to deal with that. So now you don't get to use assertions at all unless you re-engineer your application (but see below). Now, if Postgres could re-attempt serialization failures, maybe that would become a non-issue... (though there's probably a lot of danger in doing that...)

> This is consistent with how we treat the interaction of constraints and
> triggers; under some circumstances, we allow triggers to violate CHECK
> and FK constraints.

We do? Under what circumstances?

> Alternately, we add a GUC assertion_serializable_mode, which can be
> "off", "warn" or "error". If it's set to "error", and the user triggers
> an assertion while in READ COMMITTED mode, an exception occurs. If it's
> set to "off", then assertions are disabled, in order to deal with buggy
> assertions.
>
> Now, it would be even better if we could prevent users from switching
> transaction mode, but that's a MUCH bigger and more complicated patch.

Another possibility is to allow for two different types of assertions, one based on SSI and one based on locking.

--
Jim C. Nasby, Data Architect   jim@nasby.net
512.569.9461 (cell)            http://jim.nasby.net
On 12/18/13, 10:44 AM, Alvaro Herrera wrote:
> It might prove useful to note that any given assertion involves tables
> A, B, C. If a transaction doesn't modify any of those tables then
> there's no need to run the assertion when the transaction commits,
> skipping acquisition of the assertion lock.
>
> Probably this can only be implemented for SQL-language assertions, but
> even so it might be worth considering. (Assertions in any other
> language would be checked by every transaction.)

This is another case where it would be very useful to restrict what relations a transaction (or in this case, a subtransaction) can access. If we had the ability to make that restriction then we could force assertions that aren't plain SQL to explicitly specify what tables the assert is going to hit, and if the assert tries to do something different then we throw an error.

The ability to restrict object access within a transaction would also benefit VACUUM and possibly the Changeset stuff. From http://www.postgresql.org/message-id/52ACAAFD.6060509@nasby.net:

"This would be useful when you have some tables that see very frequent updates/deletes in a database that also has to support long-running transactions that don't hit those tables. You'd explicitly limit the tables your long-running transaction will touch and that way vacuum can ignore the long-running XID when calculating minimum tuple age for the heavy-hit tables."

--
Jim C. Nasby, Data Architect   jim@nasby.net
512.569.9461 (cell)            http://jim.nasby.net
Andres Freund wrote:
> On 2013-12-18 13:44:15 -0300, Alvaro Herrera wrote:
> > Heikki Linnakangas wrote:
> >
> > > Ah, I see. You don't need to block anyone else from modifying the
> > > table, you just need to block anyone else from committing a
> > > transaction that had modified the table. So the locks shouldn't
> > > interfere with regular table locks. A ShareUpdateExclusiveLock on
> > > the assertion should do it.
> >
> > Causing serialization of transaction commit just because a single
> > assertion exists in the database seems too much of a hit, so looking for
> > optimization opportunities seems appropriate.
>
> It would only force serialization for transactions that modify tables
> covered by the assert, that doesn't seem too bad. Anything covered by an
> assert shouldn't be modified frequently, otherwise you'll run into major
> performance problems.

Well, as presented there is no way (for the system) to tell which tables are covered by an assertion, is there? That's my point.

--
Álvaro Herrera                 http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Jim Nasby <jim@nasby.net> wrote:

> This is another case where it would be very useful to restrict
> what relations a transaction (or in this case, a subtransaction)
> can access. If we had the ability to make that restriction then
> we could force assertions that aren't plain SQL to explicitly
> specify what tables the assert is going to hit, and if the assert
> tries to do something different then we throw an error.
>
> The ability to restrict object access within a transaction would
> also benefit VACUUM and possibly the Changeset stuff.

I'm pretty sure that SSI could also optimize based on that, although there are probably about 10 other optimizations that would be bigger gains before getting to that.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2013-12-18 16:39:58 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
> > It would only force serialization for transactions that modify tables
> > covered by the assert, that doesn't seem too bad. Anything covered by an
> > assert shouldn't be modified frequently, otherwise you'll run into major
> > performance problems.
>
> Well, as presented there is no way (for the system) to tell which tables
> are covered by an assertion, is there? That's my point.

Well, the patch's syntax seems to only allow directly specifying a SQL query to check - we could iterate over the query tree to gather all related tables and reject any function we do not understand.

Greetings,

Andres Freund

--
Andres Freund                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 12/18/13, 1:42 PM, Kevin Grittner wrote:
> Jim Nasby <jim@nasby.net> wrote:
>
>> This is another case where it would be very useful to restrict
>> what relations a transaction (or in this case, a subtransaction)
>> can access. If we had the ability to make that restriction then
>> we could force assertions that aren't plain SQL to explicitly
>> specify what tables the assert is going to hit, and if the assert
>> tries to do something different then we throw an error.
>>
>> The ability to restrict object access within a transaction would
>> also benefit VACUUM and possibly the Changeset stuff.
>
> I'm pretty sure that SSI could also optimize based on that,
> although there are probably about 10 other optimizations that would
> be bigger gains before getting to that.

Any ideas how hard this would be? My thought is that we might be able to perform this check in the functions that do catalog lookups, but I'm worried that that wouldn't allow us to support subtransaction checks (which we'd need for assertions), and it runs the risk of long-lasting object references spanning the transaction (or subtransaction) and thereby thwarting the check.

Another option would be in heap accessor functions, but that means we could only restrict access to tables. For assertions, it would be nice to also disallow access to functions that could have unintended consequences that could violate the assertion (like dblink).

--
Jim C. Nasby, Data Architect   jim@nasby.net
512.569.9461 (cell)            http://jim.nasby.net
Jim Nasby <jim@nasby.net> wrote:
> On 12/18/13, 1:42 PM, Kevin Grittner wrote:
>> Jim Nasby <jim@nasby.net> wrote:
>>
>>> This is another case where it would be very useful to restrict
>>> what relations a transaction (or in this case, a subtransaction)
>>> can access. If we had the ability to make that restriction then
>>> we could force assertions that aren't plain SQL to explicitly
>>> specify what tables the assert is going to hit, and if the assert
>>> tries to do something different then we throw an error.
>>>
>>> The ability to restrict object access within a transaction would
>>> also benefit VACUUM and possibly the Changeset stuff.
>>
>> I'm pretty sure that SSI could also optimize based on that,
>> although there are probably about 10 other optimizations that would
>> be bigger gains before getting to that.
>
> Any ideas how hard this would be?

If we had a list to check against, I think it would be possible to do this during parse analysis and AcquireRewriteLocks(). (One or the other happens before query rewrite.) The hard part seems to me to be defining a sane way to get the list.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 12/18/2013 02:45 PM, Andres Freund wrote:
> On 2013-12-18 16:39:58 -0300, Alvaro Herrera wrote:
>> Andres Freund wrote:
>>> It would only force serialization for transactions that modify tables
>>> covered by the assert, that doesn't seem too bad. Anything covered by an
>>> assert shouldn't be modified frequently, otherwise you'll run into major
>>> performance problems.
>> Well, as presented there is no way (for the system) to tell which tables
>> are covered by an assertion, is there? That's my point.
> Well, the patch's syntax seems to only allow directly specifying a SQL
> query to check - we could iterate over the query tree to gather all
> related tables and reject any function we do not understand.

Umm, that's really a major limitation in utility. We need to come up with a better answer than this, which would essentially hobble the facility.

cheers

andrew
On 12/18/2013 11:04 PM, Andrew Dunstan wrote:
>
> On 12/18/2013 02:45 PM, Andres Freund wrote:
>> On 2013-12-18 16:39:58 -0300, Alvaro Herrera wrote:
>>> Andres Freund wrote:
>>>> It would only force serialization for transactions that modify tables
>>>> covered by the assert, that doesn't seem too bad. Anything covered by
>>>> an assert shouldn't be modified frequently, otherwise you'll run into
>>>> major performance problems.
>>> Well, as presented there is no way (for the system) to tell which tables
>>> are covered by an assertion, is there? That's my point.
>> Well, the patch's syntax seems to only allow directly specifying a SQL
>> query to check - we could iterate over the query tree to gather all
>> related tables and reject any function we do not understand.
>
> Umm, that's really a major limitation in utility.

The query can be "SELECT is_my_assertion_true()", and the function can do anything.

- Heikki
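[Editor's note: for illustration, the kind of opaque assertion Heikki means might look like this. All names are hypothetical; the point is that the referenced table is invisible to query-tree inspection.]

```sql
-- The assertion body is a single function call, so scanning the
-- assertion's query tree reveals no tables at all.
CREATE FUNCTION is_my_assertion_true() RETURNS boolean
LANGUAGE plpgsql STABLE AS $$
BEGIN
    -- The actual table access is hidden inside the function body.
    RETURN (SELECT count(*) FROM t) <= 10;
END;
$$;

CREATE ASSERTION t_small CHECK (is_my_assertion_true());
```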
On 12/18/2013 04:09 PM, Heikki Linnakangas wrote:
> On 12/18/2013 11:04 PM, Andrew Dunstan wrote:
>>
>> On 12/18/2013 02:45 PM, Andres Freund wrote:
>>> On 2013-12-18 16:39:58 -0300, Alvaro Herrera wrote:
>>>> Andres Freund wrote:
>>>>> It would only force serialization for transactions that modify
>>>>> tables covered by the assert, that doesn't seem too bad. Anything
>>>>> covered by an assert shouldn't be modified frequently, otherwise
>>>>> you'll run into major performance problems.
>>>> Well, as presented there is no way (for the system) to tell which
>>>> tables are covered by an assertion, is there? That's my point.
>>> Well, the patch's syntax seems to only allow directly specifying a SQL
>>> query to check - we could iterate over the query tree to gather all
>>> related tables and reject any function we do not understand.
>>
>> Umm, that's really a major limitation in utility.
>
> The query can be "SELECT is_my_assertion_true()", and the function can
> do anything.

OK, but isn't that what Andres is suggesting we reject?

cheers

andrew
On 12/18/2013 11:26 AM, Jim Nasby wrote:
> The flip-side is that now you can get serialization failures, and I
> think there's a ton of software that has no clue how to deal with that.
> So now you don't get to use assertions at all unless you re-engineer
> your application (but see below).

Well, the software will need to deal with an Assertion failure, which I doubt it's prepared to do right now either.

>> This is consistent with how we treat the interaction of constraints and
>> triggers; under some circumstances, we allow triggers to violate CHECK
>> and FK constraints.
>
> We do? Under what circumstances?

AFTER triggers are allowed to ignore constraints sometimes. For example, if you have a tree table with an FK to other rows in the same table, and you have an AFTER trigger on it, the AFTER trigger is allowed to violate the self-FK. That's the one I ran across, but I vaguely remember other cases, and there's some documentation on this, under the order of application of triggers, in the main docs.

> Another possibility is to allow for two different types of assertions,
> one based on SSI and one based on locking.

The locking version would have to pretty much lock on a table basis (or even a whole-database basis) every time an assertion executed, no?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Josh Berkus <josh@agliodbs.com> wrote:
> On 12/18/2013 11:26 AM, Jim Nasby wrote:
>> Another possibility is to allow for two different types of
>> assertions, one based on SSI and one based on locking.
>
> The locking version would have to pretty much lock on a table
> basis (or even a whole-database basis) every time an assertion
> executed, no?

As far as I can see, if SSI is *not* used, there needs to be a mutually exclusive lock taken from somewhere inside the COMMIT code until the transaction is complete -- effectively serializing assertion processing for transactions which could affect a given assertion. Locking on tables would, as previously suggested, be very prone to deadlocks on the heavyweight locks. Locking on the assertions in a predictable order seems more promising, especially if there could be some way to only do that if the transaction really might have done something which could affect the truth of the assertion.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Dec 18, 2013, at 20:39, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Andres Freund wrote:
>> On 2013-12-18 13:44:15 -0300, Alvaro Herrera wrote:
>>> Heikki Linnakangas wrote:
>>>
>>>> Ah, I see. You don't need to block anyone else from modifying the
>>>> table, you just need to block anyone else from committing a
>>>> transaction that had modified the table. So the locks shouldn't
>>>> interfere with regular table locks. A ShareUpdateExclusiveLock on
>>>> the assertion should do it.
>>>
>>> Causing serialization of transaction commit just because a single
>>> assertion exists in the database seems too much of a hit, so looking for
>>> optimization opportunities seems appropriate.
>>
>> It would only force serialization for transactions that modify tables
>> covered by the assert, that doesn't seem too bad. Anything covered by an
>> assert shouldn't be modified frequently, otherwise you'll run into major
>> performance problems.
>
> Well, as presented there is no way (for the system) to tell which tables
> are covered by an assertion, is there? That's my point.

Well, we *do* know that after executing the assertion, since we know (or at least can track) which tables the assertion touches. I wonder if we couldn't lazily enable SERIALIZABLE semantics for those tables only, and do so while we evaluate the assertion.

So, before evaluating the assertion, we would change the isolation level to SERIALIZABLE. We'd then have to make sure that we detect any conflicts which we would have detected had the isolation level been SERIALIZABLE all along *and* which somehow involve the assertion. Simply changing the isolation level should suffice to detect cases where we read data modified by concurrent transactions. To also detect cases where we write data read by concurrent transactions, we'd have to watch for tuples which were modified by our own transaction.
For these tuples, we'd have to do what we would have done had we already been in SERIALIZABLE mode when the modification occurred. That means checking for SIREAD locks taken by other transactions, on the tuple and all relevant index pages (plus all corresponding coarser-grained entities like the tuple's page, the table, ...).

best regards,
Florian Pflug
On Mon, Nov 25, 2013 at 12:10:22PM -0800, Kevin Grittner wrote:
> David Fetter <david@fetter.org> wrote:
>> On Mon, Nov 25, 2013 at 11:04:23AM -0800, Kevin Grittner wrote:
>>
>>> As soon as we are out of this CF, I am planning to write code to
>>> capture deltas and fire functions to process them "eagerly"
>>> (within the creating transaction). There have been suggestions
>>> that the changeset mechanism should be used for that, which I
>>> will look into; but my gut feel is that it will be better to
>>> build a tuplestore of tids flagged with "old" or "new" around
>>> the point that "after triggers" fire. How close does that sound
>>> to what CREATE ASSERTION (as currently envisioned) would need?
>>
>> It sounds *extremely* close to what we'd need for row access in
>> per-statement triggers, as in probably identical. The SQL syntax
>> of this sub-feature is described in Foundation section 11.49 and
>> called REFERENCING in CREATE TRIGGER.
>>
>> Do you have any prototypes I could use for that purpose?
>
> No, but it is at the top of my list after the CF. I will also need
> an execution node type or two to produce the referenced rows for
> the appropriate contexts, which is probably also very close to what
> you need for per-statement triggers. I will be happy to coordinate
> work with you.

Now that the CF in question is over, how can I help?

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
On 12/18/13, 2:22 PM, Andres Freund wrote:
> It would only force serialization for transactions that modify tables
> covered by the assert, that doesn't seem too bad. Anything covered by an
> assert shouldn't be modified frequently, otherwise you'll run into major
> performance problems.

I think that makes sense. If you want to use assertions, you need to run in serializable mode; otherwise you get an error if you modify anything covered by an assertion.

In the future, someone could enhance this for other isolation levels, but as Josh has pointed out, that would likely just be reimplementing SSI with big locks.
On 12/18/13, 2:45 PM, Andres Freund wrote:
> Well, the patch's syntax seems to only allow directly specifying a SQL
> query to check - we could iterate over the query tree to gather all
> related tables and reject any function we do not understand.

Creating an assertion scans the query tree and attaches constraint triggers to all tables covered by the assertion condition. So it does essentially work the way you say.
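[Editor's note: roughly, then, the patch turns an assertion into something like the following hand-written equivalent. The names are illustrative only; the patch generates this machinery internally via SPI rather than emitting SQL.]

```sql
-- Assertion condition: table t may hold at most 10 rows.
CREATE FUNCTION t_assertion_check() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    IF (SELECT count(*) FROM t) > 10 THEN
        RAISE EXCEPTION 'assertion violated: t has more than 10 rows';
    END IF;
    RETURN NULL;
END;
$$;

-- One deferrable constraint trigger per table covered by the condition.
CREATE CONSTRAINT TRIGGER t_assertion
    AFTER INSERT OR UPDATE OR DELETE ON t
    DEFERRABLE INITIALLY IMMEDIATE
    FOR EACH ROW EXECUTE PROCEDURE t_assertion_check();
```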
On 12/18/13, 4:04 PM, Andrew Dunstan wrote:
> Umm, that's really a major limitation in utility. We need to come up
> with a better answer than this, which would essentially hobble the
> facility.

We don't have any facility to run a trigger on just any command; it needs to be triggered by a table. Of course, that's where your on-commit event triggers come in, presumably. But, for example, constraint triggers support deferrability, which an on-commit trigger wouldn't. We'd just need DML triggers on any/no tables.
Peter Eisentraut <peter_e@gmx.net> schrieb:
> On 12/18/13, 2:22 PM, Andres Freund wrote:
>> It would only force serialization for transactions that modify tables
>> covered by the assert, that doesn't seem too bad. Anything covered by an
>> assert shouldn't be modified frequently, otherwise you'll run into major
>> performance problems.
>
> I think that makes sense. If you want to use assertions, you need to
> run in serializable mode, otherwise you get an error if you modify
> anything covered by an assertion.
>
> In the future, someone could enhance this for other isolation levels,
> but as Josh has pointed out, that would likely just be reimplementing
> SSI with big locks.

SSI only actually works correctly if all transactions use SSI... I am not sure we can guarantee that the subset we'd require would be safe without the read side using SSI.

Andres
--
Please excuse brevity and formatting - I am writing this on my mobile phone.

Andres Freund                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> wrote:
> Peter Eisentraut <peter_e@gmx.net> schrieb:
>> On 12/18/13, 2:22 PM, Andres Freund wrote:
>>> It would only force serialization for transactions that modify
>>> tables covered by the assert, that doesn't seem too bad.
>>> Anything covered by an assert shouldn't be modified frequently,
>>> otherwise you'll run into major performance problems.
>>
>> I think that makes sense. If you want to use assertions, you
>> need to run in serializable mode, otherwise you get an error if
>> you modify anything covered by an assertion.
>>
>> In the future, someone could enhance this for other isolation
>> levels, but as Josh has pointed out, that would likely just be
>> reimplementing SSI with big locks.
>
> SSI only actually works correctly if all transactions use SSI...
> I am not sure we can guarantee that the subset we'd require would
> be safe without the read side using SSI.

You could definitely see a state which would not be consistent with getting to some later state under procedural business rules; however, I don't think any connection could ever see a state which violated the constraint as of the moment it was viewed.

For examples of essentially enforcing multi-table constraints using triggers and SSI, see this section:

http://wiki.postgresql.org/wiki/SSI#Enforcing_Business_Rules_in_Triggers

For an example of how things can look OK in terms of enforced constraints as of different moments in time, yet those moments in time could be inconsistent, see this section:

http://wiki.postgresql.org/wiki/SSI#Enforcing_Business_Rules_in_Triggers

SSI gives you a guarantee that with any set of concurrently running transactions, the effect is the same as some serial (one-at-a-time) execution of those transactions; but it says little about the mix of serializable and non-serializable transactions. Non-serializable transactions will, after the last of those serializable transactions has committed or rolled back, see a state which is consistent with some serial execution of those serializable transactions which committed, but it will not necessarily be consistent with them having run in any *particular* order. NOTE: the state might be consistent with some order other than commit order. This means that a non-serializable transaction running in the midst of those serializable transaction commits might see the work of some transaction which will appear to all serializable transactions as having been run *later*, while not yet seeing the work of a transaction which will appear to all serializable transactions to have run *earlier*.

I'm pretty sure that this means that an invariant, if it is an expression which must always hold for any view of the database, can be enforced by requiring modifying transactions to be serializable. What it doesn't guarantee is that business rules about *transitions* can be enforced without requiring all *transactions* to be serializable. In the Deposit Report example, note that a non-serializable transaction would never be able to see a receipt with a deposit number that was not open; but it *would* be able to see a closed batch header with a set of receipts which was not yet complete.

So I think the answer is that the suggested approach is sufficient for enforcing assertions about "static" database state. If you want to make sure that nobody sees a state for which a given expression is false, it is sufficient. Just don't overestimate what that means. You can't ensure that a non-serializable transaction won't see a state which is inconsistent with a later database state according to *procedural* business rules.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
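[Editor's note: the wiki pattern Kevin references amounts to re-checking the invariant from a trigger and relying on SSI to catch concurrent invalidation. A sketch, with a hypothetical accounts table and illustrative names:]

```sql
-- Enforce "total balance may never go below 0", with all modifying
-- transactions running under SERIALIZABLE. The aggregate read below is
-- tracked by SSI, so a concurrent commit that would invalidate it causes
-- a serialization failure rather than a silently broken invariant.
CREATE FUNCTION check_total_balance() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    IF (SELECT sum(balance) FROM accounts) < 0 THEN
        RAISE EXCEPTION 'total balance may not go below 0';
    END IF;
    RETURN NULL;
END;
$$;

CREATE TRIGGER accounts_balance_guard
    AFTER INSERT OR UPDATE OR DELETE ON accounts
    FOR EACH STATEMENT EXECUTE PROCEDURE check_total_balance();
```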
Kevin Grittner <kgrittn@ymail.com> wrote:
> For an example of how things can look OK in terms of enforced
> constraints as of different moments in time, yet those moments in
> time could be inconsistent, see this section:
>
> http://wiki.postgresql.org/wiki/SSI#Enforcing_Business_Rules_in_Triggers

Copy/paste error. I meant this link:

http://wiki.postgresql.org/wiki/SSI#Read_Only_Transactions

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company