Re: Row Visibility and Table Access Methods - Mailing list pgsql-hackers
From | Pavel Stehule |
---|---|
Subject | Re: Row Visibility and Table Access Methods |
Date | |
Msg-id | CAFj8pRDqHF8zGgHgh=SqV4uvn8Fn_H=KhFWmRbuoPwHiTv2RVw@mail.gmail.com Whole thread Raw |
In response to | Row Visibility and Table Access Methods (Simon Riggs <simon@2ndquadrant.com>) |
Responses |
Re: Row Visibility and Table Access Methods
|
List | pgsql-hackers |
čt 13. 12. 2018 v 10:23 odesílatel Simon Riggs <simon@2ndquadrant.com> napsal:
Currently, tables provide MVCC access semantics as the only option.A more complete list of desirable semantics in applications are* MVCC - provide latest consistent view* Historical - keep all old row versions so that queries can access data as it used to be* TTL=Duration - keep committed rows around for at most Duration seconds* Latest - show only the most current committed row version, at the cost of inconsistencyThere might be othersSince I see these options as semantics rather than physical, I think we should separate these operations away from Table Access Methods. This allows those semantics to be implemented in different ways for different storage types."Historical" access has been discussed many times, so no need to revisit here. Clearly, it is a very poopular idea, just not easily feasible with the current heap access layer. We might want an option for row_visibility retention. For tables with this option set, we would in later releases allow historical queries according to the SQL Standard."TTL" or "Time To Live" - time-limited access to data is available in many other databases. It is simple to implement and we could easily have this in PG12. Implementation is 1) adding the option, 2) adding a time-component into the visibility check for scan and vacuum. This option implies an option exists to specify row_visibility retention."Latest" is similar to the way EvalPlanQual works, allowing UPDATE to see the latest version of a row before update, and similar to the way catalog scans work in that any access to a catalog entry sees the latest row based upon an immediate snapshot, not the one taken at the start of a query. It makes sense to allow this as an explicit table-level option, so any data viewed can see the latest version, just as UPDATEs already do. This also means that juggling bloat and old row versions becomes much less of an issue for very high volume update applications such as booking systems or stock tickers. (Clearly, better table access methods would improve on this further and even they would benefit from this simplification of the main issue around MVCC).Logic for this type of visibility already exists in PG via HeapTupleSatisfiesSelf(), so we are just exposing what is already there to the user; no need to discuss semantics at length.Again, patches to implement this option are simple and viable for PG12.User interface are 2 new table-level options* row_visibility = MVCC (default), TTL, Latest, Historical* row_visibility_retention_interval = 'system' (default)For MVCC, the only valid setting would be system, i.e. current MVCC behavior (but this might be altered by specific storage plugin parameters)For Latest, the only valid setting would be systemFor TTL, the interval to retain data for, setting of 0 is not validFor Historical, the interval to retain old row versions for, 0 means foreverImplementation summary1. Add new table-level option for row_visibility and row_visibility_retention_interval2. Add option to heap_beginscan3. Add option handling in heap prune4. Add option to tqualThoughts?
looks great
Pavel
--Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
pgsql-hackers by date: