Thread: Re: Incremental View Maintenance, take 2
On Thu, 1 Jun 2023 23:59:09 +0900 Yugo NAGATA <nagata@sraoss.co.jp> wrote: > Hello hackers, > > Here's a rebased version of the patch-set adding Incremental View > Maintenance support for PostgreSQL. That was discussed in [1]. > [1] https://www.postgresql.org/message-id/flat/20181227215726.4d166b4874f8983a641123f5%40sraoss.co.jp --------------------------------------------------------------------------------------- * Overview Incremental View Maintenance (IVM) is a way to make materialized views up-to-date by computing only incremental changes and applying them on views. IVM is more efficient than REFRESH MATERIALIZED VIEW when only small parts of the view are changed. ** Feature The attached patchset provides a feature that allows materialized views to be updated automatically and incrementally just after a underlying table is modified. You can create an incementally maintainable materialized view (IMMV) by using CREATE INCREMENTAL MATERIALIZED VIEW command. The followings are supported in view definition queries: - SELECT ... FROM ... WHERE ..., joins (inner joins, self-joins) - some built-in aggregate functions (count, sum, avg, min, max) - GROUP BY clause - DISTINCT clause Views can contain multiple tuples with the same content (duplicate tuples). ** Restriction The following are not supported in a view definition: - Outer joins - Aggregates otehr than above, window functions, HAVING - Sub-queries, CTEs - Set operations (UNION, INTERSECT, EXCEPT) - DISTINCT ON, ORDER BY, LIMIT, OFFSET Also, a view definition query cannot contain other views, materialized views, foreign tables, partitioned tables, partitions, VALUES, non-immutable functions, system columns, or expressions that contains aggregates. --------------------------------------------------------------------------------------- * Design An IMMV is maintained using statement-level AFTER triggers. When an IMMV is created, triggers are automatically created on all base tables contained in the view definition query. When a table is modified, changes that occurred in the table are extracted as transition tables in the AFTER triggers. Then, changes that will occur in the view are calculated by a rewritten view dequery in which the modified table is replaced with the transition table. For example, if the view is defined as "SELECT * FROM R, S", and tuples inserted into R are stored in a transiton table dR, the tuples that will be inserted into the view are calculated as the result of "SELECT * FROM dR, S". ** Multiple Tables Modification Multiple tables can be modified in a statement when using triggers, foreign key constraint, or modifying CTEs. When multiple tables are modified, we need the state of tables before the modification. For example, when some tuples, dR and dS, are inserted into R and S respectively, the tuples that will be inserted into the view are calculated by the following two queries: "SELECT * FROM dR, S_pre" "SELECT * FROM R, dS" where S_pre is the table before the modification, R is the current state of table, that is, after the modification. This pre-update states of table is calculated by filtering inserted tuples and appending deleted tuples. The subquery that represents pre-update state is generated in get_prestate_rte(). Specifically, the insterted tuples are filtered by calling IVM_visible_in_prestate() in WHERE clause. This function checks the visibility of tuples by using the snapshot taken before table modification. The deleted tuples are contained in the old transition table, and this table is appended using UNION ALL. Transition tables for each modification are collected in each AFTER trigger function call. Then, the view maintenance is performed in the last call of the trigger. In the original PostgreSQL, tuplestores of transition tables are freed at the end of each nested query. However, their lifespan needs to be prolonged to the end of the out-most query in order to maintain the view in the last AFTER trigger. For this purpose, SetTransitionTablePreserved is added in trigger.c. ** Duplicate Tulpes When calculating changes that will occur in the view (= delta tables), multiplicity of tuples are calculated by using count(*). When deleting tuples from the view, tuples to be deleted are identified by joining the delta table with the view, and tuples are deleted as many as specified multiplicity by numbered using row_number() function. This is implemented in apply_old_delta(). When inserting tuples into the view, each tuple is duplicated to the specified multiplicity using generate_series() function. This is implemented in apply_new_delta(). ** DISTINCT clause When DISTINCT is used, the view has a hidden column __ivm_count__ that stores multiplicity for tuples. When tuples are deleted from or inserted into the view, the values of __ivm_count__ column is decreased or increased as many as specified multiplicity. Eventually, when the values becomes zero, the corresponding tuple is deleted from the view. This is implemented in apply_old_delta_with_count() and apply_new_delta_with_count(). ** Aggregates Built-in count sum, avg, min, and max are supported. Whether a given aggregate function can be used or not is checked by using its OID in check_aggregate_supports_ivm(). When creating a materialized view containing aggregates, in addition to __ivm_count__, more than one hidden columns for each aggregate are added to the target list. For example, columns for storing sum(x), count(x) are added if we have avg(x). When the view is maintained, aggregated values are updated using these hidden columns, also hidden columns are updated at the same time. The maintenance of aggregated view is performed in apply_old_delta_with_count() and apply_new_delta_with_count(). The SET clauses for updating columns are generated by append_set_clause_*(). If the view has min(x) or max(x) and the minimum or maximal value is deleted from a table, we need to update the value to the new min/max recalculated from the tables rather than incremental computation. This is performed in recalc_and_set_values(). --------------------------------------------------------------------------------------- * Details of the patch-set (v28) > The patch-set consists of the following eleven patches. In the previous version, the number of patches were nine. In the latest patch-set, the patches are divided more finely aiming to make the review easier. > - 0001: Add a syntax to create Incrementally Maintainable Materialized Views The prposed syntax to create an incrementally maintainable materialized view (IMMV) is; CREATE INCREMENTAL MATERIALIZED VIEW AS SELECT .....; However, this syntax is tentative, so any suggestions are welcomed. > - 0002: Add relisivm column to pg_class system catalog We add a new field in pg_class to indicate a relation is IMMV. Another alternative is to add a new catalog for managing materialized views including IMMV, but I am not sure if we want this. > - 0003: Allow to prolong life span of transition tables until transaction end This patch fixes the trigger system to allow to prolong lifespan of tuple stores for transition tables until the transaction end. We need this because multiple transition tables have to be preserved until the end of the out-most query when multiple tables are modified by nested triggers. (as explained above in Design - Multiple Tables Modification) If we don't want to change the trigger system in such way, the alternative is to copy the contents of transition tables to other tuplestores, although it needs more time and memory. > - 0004: Add Incremental View Maintenance support to pg_dump This patch enables pg_dump to output IMMV using the new syntax. > - 0005: Add Incremental View Maintenance support to psql This patch implements tab-completion for the new syntax and adds information of IMMV to \d meta-command results. > - 0006: Add Incremental View Maintenance support This patch implements the basic IVM feature. DISTINCT and aggregate are not supported here. When an IMMV is created, the view query is checked, and if any non-supported feature is used, it raises an error. If it is ok, triggers are created on base tables and an unique index is created on the view if possible. In BEFORE trigger, an entry is created for each IMMV and the number of trigger firing is counted. Also, the snapshot just before the table modification is stored. In AFTER triggers, each transition tables are preserved. The number of trigger firing is counted also here, and when the firing number of BEFORE and AFTER trigger reach the same, it is deemed the final AFTER trigger call. In the final AFTER trigger, the IMMV is maintained. Rewritten view query is executed to generate delta tables, and deltas are applied to the view. If multiple tables are modified simultaneously, this process is iterated for each modified table. Tables before processed are represented in "pre-update-state", processed tables are "post-update-state" in the rewritten query. > - 0007: Add DISTINCT support for IVM This patch adds DISTINCT clause support. When an IMMV including DISTINCT is created, a hidden column "__ivm_count__" is added to the target list. This column has the number of duplicity of the same tuples. The duplicity is calculated by adding "count(*)" and GROUP BY to the view query. When an IMMV is maintained, the duplicity in __ivm_count__ is updated, and a tuples whose duplicity becomes zero can be deleted from the view. This logic is implemented by SQL in apply_old_delta_with_count and apply_new_delta_with_count. Columns starting with "__ivm_" are deemed hidden columns that doesn't appear when a view is accessed by "SELECT * FROM ....". This is implemented by fixing parse_relation.c. > - 0008: Add aggregates support in IVM This patch provides codes for aggregates support, specifically for builtin count, sum, and avg. When an IMMV containing an aggregate is created, it is checked if this aggregate function is supported, and if it is ok, some hidden columns are added to the target list. When the IMMV is maintained, the aggregated value is updated as well as related hidden columns. The way of update depends the type of aggregate functions, and SET clause string is generated for each aggregate. > - 0009: Add support for min/max aggregates for IVM This patch adds min/max aggregates support. This is separated from #0008 because min/max needs more complicated work than count, sum, and avg. If the view has min(x) or max(x) and the minimum or maximal value is deleted from a table, we need to update the value to the new min/max recalculated from the tables rather than incremental computation. This is performed in recalc_and_set_values(). TIDs and keys of tuples that need re-calculation are returned as a result of the query that deleted min/max values from the view using RETURNING clause. The plan to recalculate and set the new min/max value are stored and reused. > - 0010: regression tests This patch provides regression tests for IVM. > - 0011: documentation This patch provides documantation for IVM. --------------------------------------------------------------------------------------- * Changes from the Previous Version (v27) - Allow TRUNCATE on base tables When a base table is truncated, the view content will be empty if the view definition query does not contain an aggregate without a GROUP clause. Therefore, such views can be truncated. Aggregate views without a GROUP clause always have one row. Therefore, if a base table is truncated, the view will not be empty and will contain a row with NULL value (or 0 for count()). So, in this case, we refresh the view instead of truncating it. - Fix bugs reported by huyajun [1] [1] https://www.postgresql.org/message-id/tencent_FCAF11BCA5003FD16BDDFDDA5D6A19587809%40qq.com --------------------------------------------------------------------------------------- * Discussion ** Aggregate support There were a few suggestions that general aggregate functions should be supported [2][3], which may be possible by extending pg_aggregate catalog. However, we decided to leave supporting general aggregates to the future work [4] because it would need substantial works and make the patch more complex and bigger. There has been no opposite opinion on this. However, if we need more discussion on the design of aggregate support, we can omit aggregate support for the first release of IVM. [2] https://www.postgresql.org/message-id/20191128140333.GA25947%40alvherre.pgsql [3] https://www.postgresql.org/message-id/CAM-w4HOvDrL4ou6m%3D592zUiKGVzTcOpNj-d_cJqzL00fdsS5kg%40mail.gmail.com [4] https://www.postgresql.org/message-id/20201016193034.9a4c44c79fc1eca7babe093e%40sraoss.co.jp ** Hidden columns In order to support DISTINCT or aggregates, our implementation uses hidden columns. Columns starting with "__ivm_" are hidden columns that doesn't appear when a view is accessed by "SELECT * FROM ....". For this aim, parse_relation.c is fixed. There was a proposal to enable hidden columns by adding a new flag to pg_attribute [5], but this thread is no longer active, so we decided to check the hidden column by its name [6]. [5] https://www.postgresql.org/message-id/flat/CAEepm%3D3ZHh%3Dp0nEEnVbs1Dig_UShPzHUcMNAqvDQUgYgcDo-pA%40mail.gmail.com [6] https://www.postgresql.org/message-id/20201016193034.9a4c44c79fc1eca7babe093e%40sraoss.co.jp ** Concurrent Transactions When the view definition has more than one table, we acquire an exclusive lock before the view maintenance in order to avoid inconsistent results. This behavior was explained in [7]. The lock was improved to use weaker lock when the view has only one table based on a suggestion from Konstantin Knizhnik [8]. However, due to the implementation that uses ctid for identifying target tuples, we still have to use an exclusive lock for DELETE and UPDATE. [7] https://www.postgresql.org/message-id/20200909092752.c91758a1bec3479668e82643%40sraoss.co.jp [8] https://www.postgresql.org/message-id/5663f5f0-48af-686c-bf3c-62d279567e2a%40postgrespro.ru ** Automatic Index Creation When a view is created, a unique index is automatically created if possible, that is, if the view definition query has a GROUP BY or DISTINCT, or if the view contains all primary key attributes of its base tables in the target list. It is necessary for efficient view maintenance. This feature is based on a suggestion from Konstantin Knizhnik [9]. [9] https://www.postgresql.org/message-id/89729da8-9042-7ea0-95af-e415df6da14d%40postgrespro.ru ** Trigger and Transition Tables We implemented IVM based on triggers. This is because we want to use transition tables to extract changes on base tables. Also, there are other constraint that are using triggers in its implementation, like foreign references. However, if we can use transition table like feature without relying triggers, we don't have to insist to use triggers and we might implement IVM in the executor directly as similar as declarative partitioning. ** Feature to be Supported in the First Release The current patch-set supports DISTINCT and aggregates for built-in count, sum, avg, min and max. Do we need all these feature for the first IVM release? Supporting DISTINCT and aggregates needs discussion on hidden columns, and for supporting min/max we need to discuss on re-calculation method. Before handling such relatively advanced feature, maybe, should we focus to design and implement of the basic feature of IVM? Any suggestion and discussion are welcomed! Regards, Yugo Nagata -- Yugo NAGATA <nagata@sraoss.co.jp>
Hello hackers, Here's a rebased version of the patch-set adding Incremental View Maintenance support for PostgreSQL. That was discussed in [1]. The patch-set consists of the following eleven patches. - 0001: Add a syntax to create Incrementally Maintainable Materialized Views - 0002: Add relisivm column to pg_class system catalog - 0003: Allow to prolong life span of transition tables until transaction end - 0004: Add Incremental View Maintenance support to pg_dum - 0005: Add Incremental View Maintenance support to psql - 0006: Add Incremental View Maintenance support - 0007: Add DISTINCT support for IVM - 0008: Add aggregates support in IVM - 0009: Add support for min/max aggregates for IVM - 0010: regression tests - 0011: documentation [1] https://www.postgresql.org/message-id/flat/20181227215726.4d166b4874f8983a641123f5%40sraoss.co.jp Regards, Yugo Nagata -- Yugo NAGATA <nagata@sraoss.co.jp>
Attachment
- v28-0001-Add-a-syntax-to-create-Incrementally-Maintainabl.patch
- v28-0002-Add-relisivm-column-to-pg_class-system-catalog.patch
- v28-0003-Allow-to-prolong-life-span-of-transition-tables-.patch
- v28-0004-Add-Incremental-View-Maintenance-support-to-pg_d.patch
- v28-0005-Add-Incremental-View-Maintenance-support-to-psql.patch
- v28-0006-Add-Incremental-View-Maintenance-support.patch
- v28-0007-Add-DISTINCT-support-for-IVM.patch
- v28-0008-Add-aggregates-support-in-IVM.patch
- v28-0009-Add-support-for-min-max-aggregates-for-IVM.patch
- v28-0010-Add-regression-tests-for-Incremental-View-Mainte.patch
- v28-0011-Add-documentations-about-Incremental-View-Mainte.patch
On Thu, Jun 1, 2023 at 2:47 AM Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > On Thu, 1 Jun 2023 23:59:09 +0900 > Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > > Hello hackers, > > > > Here's a rebased version of the patch-set adding Incremental View > > Maintenance support for PostgreSQL. That was discussed in [1]. > > > [1] https://www.postgresql.org/message-id/flat/20181227215726.4d166b4874f8983a641123f5%40sraoss.co.jp > > --------------------------------------------------------------------------------------- > * Overview > > Incremental View Maintenance (IVM) is a way to make materialized views > up-to-date by computing only incremental changes and applying them on > views. IVM is more efficient than REFRESH MATERIALIZED VIEW when > only small parts of the view are changed. > > ** Feature > > The attached patchset provides a feature that allows materialized views > to be updated automatically and incrementally just after a underlying > table is modified. > > You can create an incementally maintainable materialized view (IMMV) > by using CREATE INCREMENTAL MATERIALIZED VIEW command. > > The followings are supported in view definition queries: > - SELECT ... FROM ... WHERE ..., joins (inner joins, self-joins) > - some built-in aggregate functions (count, sum, avg, min, max) > - GROUP BY clause > - DISTINCT clause > > Views can contain multiple tuples with the same content (duplicate tuples). > > ** Restriction > > The following are not supported in a view definition: > - Outer joins > - Aggregates otehr than above, window functions, HAVING > - Sub-queries, CTEs > - Set operations (UNION, INTERSECT, EXCEPT) > - DISTINCT ON, ORDER BY, LIMIT, OFFSET > > Also, a view definition query cannot contain other views, materialized views, > foreign tables, partitioned tables, partitions, VALUES, non-immutable functions, > system columns, or expressions that contains aggregates. > > --------------------------------------------------------------------------------------- > * Design > > An IMMV is maintained using statement-level AFTER triggers. > When an IMMV is created, triggers are automatically created on all base > tables contained in the view definition query. > > When a table is modified, changes that occurred in the table are extracted > as transition tables in the AFTER triggers. Then, changes that will occur in > the view are calculated by a rewritten view dequery in which the modified table > is replaced with the transition table. > > For example, if the view is defined as "SELECT * FROM R, S", and tuples inserted > into R are stored in a transiton table dR, the tuples that will be inserted into > the view are calculated as the result of "SELECT * FROM dR, S". > > ** Multiple Tables Modification > > Multiple tables can be modified in a statement when using triggers, foreign key > constraint, or modifying CTEs. When multiple tables are modified, we need > the state of tables before the modification. > > For example, when some tuples, dR and dS, are inserted into R and S respectively, > the tuples that will be inserted into the view are calculated by the following > two queries: > > "SELECT * FROM dR, S_pre" > "SELECT * FROM R, dS" > > where S_pre is the table before the modification, R is the current state of > table, that is, after the modification. This pre-update states of table > is calculated by filtering inserted tuples and appending deleted tuples. > The subquery that represents pre-update state is generated in get_prestate_rte(). > Specifically, the insterted tuples are filtered by calling IVM_visible_in_prestate() > in WHERE clause. This function checks the visibility of tuples by using > the snapshot taken before table modification. The deleted tuples are contained > in the old transition table, and this table is appended using UNION ALL. > > Transition tables for each modification are collected in each AFTER trigger > function call. Then, the view maintenance is performed in the last call of > the trigger. > > In the original PostgreSQL, tuplestores of transition tables are freed at the > end of each nested query. However, their lifespan needs to be prolonged to > the end of the out-most query in order to maintain the view in the last AFTER > trigger. For this purpose, SetTransitionTablePreserved is added in trigger.c. > > ** Duplicate Tulpes > > When calculating changes that will occur in the view (= delta tables), > multiplicity of tuples are calculated by using count(*). > > When deleting tuples from the view, tuples to be deleted are identified by > joining the delta table with the view, and tuples are deleted as many as > specified multiplicity by numbered using row_number() function. > This is implemented in apply_old_delta(). > > When inserting tuples into the view, each tuple is duplicated to the > specified multiplicity using generate_series() function. This is implemented > in apply_new_delta(). > > ** DISTINCT clause > > When DISTINCT is used, the view has a hidden column __ivm_count__ that > stores multiplicity for tuples. When tuples are deleted from or inserted into > the view, the values of __ivm_count__ column is decreased or increased as many > as specified multiplicity. Eventually, when the values becomes zero, the > corresponding tuple is deleted from the view. This is implemented in > apply_old_delta_with_count() and apply_new_delta_with_count(). > > ** Aggregates > > Built-in count sum, avg, min, and max are supported. Whether a given > aggregate function can be used or not is checked by using its OID in > check_aggregate_supports_ivm(). > > When creating a materialized view containing aggregates, in addition > to __ivm_count__, more than one hidden columns for each aggregate are > added to the target list. For example, columns for storing sum(x), > count(x) are added if we have avg(x). When the view is maintained, > aggregated values are updated using these hidden columns, also hidden > columns are updated at the same time. > > The maintenance of aggregated view is performed in > apply_old_delta_with_count() and apply_new_delta_with_count(). The SET > clauses for updating columns are generated by append_set_clause_*(). > > If the view has min(x) or max(x) and the minimum or maximal value is > deleted from a table, we need to update the value to the new min/max > recalculated from the tables rather than incremental computation. This > is performed in recalc_and_set_values(). > > --------------------------------------------------------------------------------------- > * Details of the patch-set (v28) > > > The patch-set consists of the following eleven patches. > > In the previous version, the number of patches were nine. > In the latest patch-set, the patches are divided more finely > aiming to make the review easier. > > > - 0001: Add a syntax to create Incrementally Maintainable Materialized Views > > The prposed syntax to create an incrementally maintainable materialized > view (IMMV) is; > > CREATE INCREMENTAL MATERIALIZED VIEW AS SELECT .....; > > However, this syntax is tentative, so any suggestions are welcomed. > > > - 0002: Add relisivm column to pg_class system catalog > > We add a new field in pg_class to indicate a relation is IMMV. > Another alternative is to add a new catalog for managing materialized > views including IMMV, but I am not sure if we want this. > > > - 0003: Allow to prolong life span of transition tables until transaction end > > This patch fixes the trigger system to allow to prolong lifespan of > tuple stores for transition tables until the transaction end. We need > this because multiple transition tables have to be preserved until the > end of the out-most query when multiple tables are modified by nested > triggers. (as explained above in Design - Multiple Tables Modification) > > If we don't want to change the trigger system in such way, the alternative > is to copy the contents of transition tables to other tuplestores, although > it needs more time and memory. > > > - 0004: Add Incremental View Maintenance support to pg_dump > > This patch enables pg_dump to output IMMV using the new syntax. > > > - 0005: Add Incremental View Maintenance support to psql > > This patch implements tab-completion for the new syntax and adds > information of IMMV to \d meta-command results. > > > - 0006: Add Incremental View Maintenance support > > This patch implements the basic IVM feature. > DISTINCT and aggregate are not supported here. > > When an IMMV is created, the view query is checked, and if any > non-supported feature is used, it raises an error. If it is ok, > triggers are created on base tables and an unique index is > created on the view if possible. > > In BEFORE trigger, an entry is created for each IMMV and the number > of trigger firing is counted. Also, the snapshot just before the > table modification is stored. > > In AFTER triggers, each transition tables are preserved. The number > of trigger firing is counted also here, and when the firing number of > BEFORE and AFTER trigger reach the same, it is deemed the final AFTER > trigger call. > > In the final AFTER trigger, the IMMV is maintained. Rewritten view > query is executed to generate delta tables, and deltas are applied > to the view. If multiple tables are modified simultaneously, this > process is iterated for each modified table. Tables before processed > are represented in "pre-update-state", processed tables are > "post-update-state" in the rewritten query. > > > - 0007: Add DISTINCT support for IVM > > This patch adds DISTINCT clause support. > > When an IMMV including DISTINCT is created, a hidden column > "__ivm_count__" is added to the target list. This column has the > number of duplicity of the same tuples. The duplicity is calculated > by adding "count(*)" and GROUP BY to the view query. > > When an IMMV is maintained, the duplicity in __ivm_count__ is updated, > and a tuples whose duplicity becomes zero can be deleted from the view. > This logic is implemented by SQL in apply_old_delta_with_count and > apply_new_delta_with_count. > > Columns starting with "__ivm_" are deemed hidden columns that doesn't > appear when a view is accessed by "SELECT * FROM ....". This is > implemented by fixing parse_relation.c. > > > - 0008: Add aggregates support in IVM > > This patch provides codes for aggregates support, specifically > for builtin count, sum, and avg. > > When an IMMV containing an aggregate is created, it is checked if this > aggregate function is supported, and if it is ok, some hidden columns > are added to the target list. > > When the IMMV is maintained, the aggregated value is updated as well as > related hidden columns. The way of update depends the type of aggregate > functions, and SET clause string is generated for each aggregate. > > > - 0009: Add support for min/max aggregates for IVM > > This patch adds min/max aggregates support. > > This is separated from #0008 because min/max needs more complicated > work than count, sum, and avg. > > If the view has min(x) or max(x) and the minimum or maximal value is > deleted from a table, we need to update the value to the new min/max > recalculated from the tables rather than incremental computation. > This is performed in recalc_and_set_values(). > > TIDs and keys of tuples that need re-calculation are returned as a > result of the query that deleted min/max values from the view using > RETURNING clause. The plan to recalculate and set the new min/max value > are stored and reused. > > > - 0010: regression tests > > This patch provides regression tests for IVM. > > > - 0011: documentation > > This patch provides documantation for IVM. > > --------------------------------------------------------------------------------------- > * Changes from the Previous Version (v27) > > - Allow TRUNCATE on base tables > > When a base table is truncated, the view content will be empty if the > view definition query does not contain an aggregate without a GROUP clause. > Therefore, such views can be truncated. > > Aggregate views without a GROUP clause always have one row. Therefore, > if a base table is truncated, the view will not be empty and will contain > a row with NULL value (or 0 for count()). So, in this case, we refresh the > view instead of truncating it. > > - Fix bugs reported by huyajun [1] > > [1] https://www.postgresql.org/message-id/tencent_FCAF11BCA5003FD16BDDFDDA5D6A19587809%40qq.com > > --------------------------------------------------------------------------------------- > * Discussion > > ** Aggregate support > > There were a few suggestions that general aggregate functions should be > supported [2][3], which may be possible by extending pg_aggregate catalog. > However, we decided to leave supporting general aggregates to the future work [4] > because it would need substantial works and make the patch more complex and > bigger. > > There has been no opposite opinion on this. However, if we need more discussion > on the design of aggregate support, we can omit aggregate support for the first > release of IVM. > > [2] https://www.postgresql.org/message-id/20191128140333.GA25947%40alvherre.pgsql > [3] https://www.postgresql.org/message-id/CAM-w4HOvDrL4ou6m%3D592zUiKGVzTcOpNj-d_cJqzL00fdsS5kg%40mail.gmail.com > [4] https://www.postgresql.org/message-id/20201016193034.9a4c44c79fc1eca7babe093e%40sraoss.co.jp > > ** Hidden columns > > In order to support DISTINCT or aggregates, our implementation uses hidden columns. > > Columns starting with "__ivm_" are hidden columns that doesn't appear when a > view is accessed by "SELECT * FROM ....". For this aim, parse_relation.c is > fixed. There was a proposal to enable hidden columns by adding a new flag to > pg_attribute [5], but this thread is no longer active, so we decided to check > the hidden column by its name [6]. > > [5] https://www.postgresql.org/message-id/flat/CAEepm%3D3ZHh%3Dp0nEEnVbs1Dig_UShPzHUcMNAqvDQUgYgcDo-pA%40mail.gmail.com > [6] https://www.postgresql.org/message-id/20201016193034.9a4c44c79fc1eca7babe093e%40sraoss.co.jp > > ** Concurrent Transactions > > When the view definition has more than one table, we acquire an exclusive > lock before the view maintenance in order to avoid inconsistent results. > This behavior was explained in [7]. The lock was improved to use weaker lock > when the view has only one table based on a suggestion from Konstantin Knizhnik [8]. > However, due to the implementation that uses ctid for identifying target tuples, > we still have to use an exclusive lock for DELETE and UPDATE. > > [7] https://www.postgresql.org/message-id/20200909092752.c91758a1bec3479668e82643%40sraoss.co.jp > [8] https://www.postgresql.org/message-id/5663f5f0-48af-686c-bf3c-62d279567e2a%40postgrespro.ru > > ** Automatic Index Creation > > When a view is created, a unique index is automatically created if > possible, that is, if the view definition query has a GROUP BY or > DISTINCT, or if the view contains all primary key attributes of > its base tables in the target list. It is necessary for efficient > view maintenance. This feature is based on a suggestion from > Konstantin Knizhnik [9]. > > [9] https://www.postgresql.org/message-id/89729da8-9042-7ea0-95af-e415df6da14d%40postgrespro.ru > > > ** Trigger and Transition Tables > > We implemented IVM based on triggers. This is because we want to use > transition tables to extract changes on base tables. Also, there are > other constraint that are using triggers in its implementation, like > foreign references. However, if we can use transition table like feature > without relying triggers, we don't have to insist to use triggers and we > might implement IVM in the executor directly as similar as declarative > partitioning. > > ** Feature to be Supported in the First Release > > The current patch-set supports DISTINCT and aggregates for built-in count, > sum, avg, min and max. Do we need all these feature for the first IVM release? > Supporting DISTINCT and aggregates needs discussion on hidden columns, and > for supporting min/max we need to discuss on re-calculation method. Before > handling such relatively advanced feature, maybe, should we focus to design > and implement of the basic feature of IVM? > > > Any suggestion and discussion are welcomed! > > Regards, > Yugo Nagata > > -- > Yugo NAGATA <nagata@sraoss.co.jp> > > > The followings are supported in view definition queries: > - SELECT ... FROM ... WHERE ..., joins (inner joins, self-joins) > Also, a view definition query cannot contain other views, materialized views, > foreign tables, partitioned tables, partitions, VALUES, non-immutable functions, > system columns, or expressions that contains aggregates. Does this also apply to tableoid? but tableoid is a constant, so it should be fine? can following two queries apply to this feature. select tableoid, unique1 from tenk1; select 1 as constant, unique1 from tenk1; I didn't apply the patch.(will do later, for someone to test, it would be a better idea to dump a whole file separately....).
On Wed, 28 Jun 2023 00:01:02 +0800 jian he <jian.universality@gmail.com> wrote: > On Thu, Jun 1, 2023 at 2:47 AM Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > > > On Thu, 1 Jun 2023 23:59:09 +0900 > > Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > > > > Hello hackers, > > > > > > Here's a rebased version of the patch-set adding Incremental View > > > Maintenance support for PostgreSQL. That was discussed in [1]. > > > > > [1] https://www.postgresql.org/message-id/flat/20181227215726.4d166b4874f8983a641123f5%40sraoss.co.jp > > > > --------------------------------------------------------------------------------------- > > * Overview > > > > Incremental View Maintenance (IVM) is a way to make materialized views > > up-to-date by computing only incremental changes and applying them on > > views. IVM is more efficient than REFRESH MATERIALIZED VIEW when > > only small parts of the view are changed. > > > > ** Feature > > > > The attached patchset provides a feature that allows materialized views > > to be updated automatically and incrementally just after a underlying > > table is modified. > > > > You can create an incementally maintainable materialized view (IMMV) > > by using CREATE INCREMENTAL MATERIALIZED VIEW command. > > > > The followings are supported in view definition queries: > > - SELECT ... FROM ... WHERE ..., joins (inner joins, self-joins) > > - some built-in aggregate functions (count, sum, avg, min, max) > > - GROUP BY clause > > - DISTINCT clause > > > > Views can contain multiple tuples with the same content (duplicate tuples). > > > > ** Restriction > > > > The following are not supported in a view definition: > > - Outer joins > > - Aggregates otehr than above, window functions, HAVING > > - Sub-queries, CTEs > > - Set operations (UNION, INTERSECT, EXCEPT) > > - DISTINCT ON, ORDER BY, LIMIT, OFFSET > > > > Also, a view definition query cannot contain other views, materialized views, > > foreign tables, partitioned tables, partitions, VALUES, non-immutable functions, > > system columns, or expressions that contains aggregates. > > > > --------------------------------------------------------------------------------------- > > * Design > > > > An IMMV is maintained using statement-level AFTER triggers. > > When an IMMV is created, triggers are automatically created on all base > > tables contained in the view definition query. > > > > When a table is modified, changes that occurred in the table are extracted > > as transition tables in the AFTER triggers. Then, changes that will occur in > > the view are calculated by a rewritten view dequery in which the modified table > > is replaced with the transition table. > > > > For example, if the view is defined as "SELECT * FROM R, S", and tuples inserted > > into R are stored in a transiton table dR, the tuples that will be inserted into > > the view are calculated as the result of "SELECT * FROM dR, S". > > > > ** Multiple Tables Modification > > > > Multiple tables can be modified in a statement when using triggers, foreign key > > constraint, or modifying CTEs. When multiple tables are modified, we need > > the state of tables before the modification. > > > > For example, when some tuples, dR and dS, are inserted into R and S respectively, > > the tuples that will be inserted into the view are calculated by the following > > two queries: > > > > "SELECT * FROM dR, S_pre" > > "SELECT * FROM R, dS" > > > > where S_pre is the table before the modification, R is the current state of > > table, that is, after the modification. This pre-update states of table > > is calculated by filtering inserted tuples and appending deleted tuples. > > The subquery that represents pre-update state is generated in get_prestate_rte(). > > Specifically, the insterted tuples are filtered by calling IVM_visible_in_prestate() > > in WHERE clause. This function checks the visibility of tuples by using > > the snapshot taken before table modification. The deleted tuples are contained > > in the old transition table, and this table is appended using UNION ALL. > > > > Transition tables for each modification are collected in each AFTER trigger > > function call. Then, the view maintenance is performed in the last call of > > the trigger. > > > > In the original PostgreSQL, tuplestores of transition tables are freed at the > > end of each nested query. However, their lifespan needs to be prolonged to > > the end of the out-most query in order to maintain the view in the last AFTER > > trigger. For this purpose, SetTransitionTablePreserved is added in trigger.c. > > > > ** Duplicate Tulpes > > > > When calculating changes that will occur in the view (= delta tables), > > multiplicity of tuples are calculated by using count(*). > > > > When deleting tuples from the view, tuples to be deleted are identified by > > joining the delta table with the view, and tuples are deleted as many as > > specified multiplicity by numbered using row_number() function. > > This is implemented in apply_old_delta(). > > > > When inserting tuples into the view, each tuple is duplicated to the > > specified multiplicity using generate_series() function. This is implemented > > in apply_new_delta(). > > > > ** DISTINCT clause > > > > When DISTINCT is used, the view has a hidden column __ivm_count__ that > > stores multiplicity for tuples. When tuples are deleted from or inserted into > > the view, the values of __ivm_count__ column is decreased or increased as many > > as specified multiplicity. Eventually, when the values becomes zero, the > > corresponding tuple is deleted from the view. This is implemented in > > apply_old_delta_with_count() and apply_new_delta_with_count(). > > > > ** Aggregates > > > > Built-in count sum, avg, min, and max are supported. Whether a given > > aggregate function can be used or not is checked by using its OID in > > check_aggregate_supports_ivm(). > > > > When creating a materialized view containing aggregates, in addition > > to __ivm_count__, more than one hidden columns for each aggregate are > > added to the target list. For example, columns for storing sum(x), > > count(x) are added if we have avg(x). When the view is maintained, > > aggregated values are updated using these hidden columns, also hidden > > columns are updated at the same time. > > > > The maintenance of aggregated view is performed in > > apply_old_delta_with_count() and apply_new_delta_with_count(). The SET > > clauses for updating columns are generated by append_set_clause_*(). > > > > If the view has min(x) or max(x) and the minimum or maximal value is > > deleted from a table, we need to update the value to the new min/max > > recalculated from the tables rather than incremental computation. This > > is performed in recalc_and_set_values(). > > > > --------------------------------------------------------------------------------------- > > * Details of the patch-set (v28) > > > > > The patch-set consists of the following eleven patches. > > > > In the previous version, the number of patches were nine. > > In the latest patch-set, the patches are divided more finely > > aiming to make the review easier. > > > > > - 0001: Add a syntax to create Incrementally Maintainable Materialized Views > > > > The prposed syntax to create an incrementally maintainable materialized > > view (IMMV) is; > > > > CREATE INCREMENTAL MATERIALIZED VIEW AS SELECT .....; > > > > However, this syntax is tentative, so any suggestions are welcomed. > > > > > - 0002: Add relisivm column to pg_class system catalog > > > > We add a new field in pg_class to indicate a relation is IMMV. > > Another alternative is to add a new catalog for managing materialized > > views including IMMV, but I am not sure if we want this. > > > > > - 0003: Allow to prolong life span of transition tables until transaction end > > > > This patch fixes the trigger system to allow to prolong lifespan of > > tuple stores for transition tables until the transaction end. We need > > this because multiple transition tables have to be preserved until the > > end of the out-most query when multiple tables are modified by nested > > triggers. (as explained above in Design - Multiple Tables Modification) > > > > If we don't want to change the trigger system in such way, the alternative > > is to copy the contents of transition tables to other tuplestores, although > > it needs more time and memory. > > > > > - 0004: Add Incremental View Maintenance support to pg_dump > > > > This patch enables pg_dump to output IMMV using the new syntax. > > > > > - 0005: Add Incremental View Maintenance support to psql > > > > This patch implements tab-completion for the new syntax and adds > > information of IMMV to \d meta-command results. > > > > > - 0006: Add Incremental View Maintenance support > > > > This patch implements the basic IVM feature. > > DISTINCT and aggregate are not supported here. > > > > When an IMMV is created, the view query is checked, and if any > > non-supported feature is used, it raises an error. If it is ok, > > triggers are created on base tables and an unique index is > > created on the view if possible. > > > > In BEFORE trigger, an entry is created for each IMMV and the number > > of trigger firing is counted. Also, the snapshot just before the > > table modification is stored. > > > > In AFTER triggers, each transition tables are preserved. The number > > of trigger firing is counted also here, and when the firing number of > > BEFORE and AFTER trigger reach the same, it is deemed the final AFTER > > trigger call. > > > > In the final AFTER trigger, the IMMV is maintained. Rewritten view > > query is executed to generate delta tables, and deltas are applied > > to the view. If multiple tables are modified simultaneously, this > > process is iterated for each modified table. Tables before processed > > are represented in "pre-update-state", processed tables are > > "post-update-state" in the rewritten query. > > > > > - 0007: Add DISTINCT support for IVM > > > > This patch adds DISTINCT clause support. > > > > When an IMMV including DISTINCT is created, a hidden column > > "__ivm_count__" is added to the target list. This column has the > > number of duplicity of the same tuples. The duplicity is calculated > > by adding "count(*)" and GROUP BY to the view query. > > > > When an IMMV is maintained, the duplicity in __ivm_count__ is updated, > > and a tuples whose duplicity becomes zero can be deleted from the view. > > This logic is implemented by SQL in apply_old_delta_with_count and > > apply_new_delta_with_count. > > > > Columns starting with "__ivm_" are deemed hidden columns that doesn't > > appear when a view is accessed by "SELECT * FROM ....". This is > > implemented by fixing parse_relation.c. > > > > > - 0008: Add aggregates support in IVM > > > > This patch provides codes for aggregates support, specifically > > for builtin count, sum, and avg. > > > > When an IMMV containing an aggregate is created, it is checked if this > > aggregate function is supported, and if it is ok, some hidden columns > > are added to the target list. > > > > When the IMMV is maintained, the aggregated value is updated as well as > > related hidden columns. The way of update depends the type of aggregate > > functions, and SET clause string is generated for each aggregate. > > > > > - 0009: Add support for min/max aggregates for IVM > > > > This patch adds min/max aggregates support. > > > > This is separated from #0008 because min/max needs more complicated > > work than count, sum, and avg. > > > > If the view has min(x) or max(x) and the minimum or maximal value is > > deleted from a table, we need to update the value to the new min/max > > recalculated from the tables rather than incremental computation. > > This is performed in recalc_and_set_values(). > > > > TIDs and keys of tuples that need re-calculation are returned as a > > result of the query that deleted min/max values from the view using > > RETURNING clause. The plan to recalculate and set the new min/max value > > are stored and reused. > > > > > - 0010: regression tests > > > > This patch provides regression tests for IVM. > > > > > - 0011: documentation > > > > This patch provides documantation for IVM. > > > > --------------------------------------------------------------------------------------- > > * Changes from the Previous Version (v27) > > > > - Allow TRUNCATE on base tables > > > > When a base table is truncated, the view content will be empty if the > > view definition query does not contain an aggregate without a GROUP clause. > > Therefore, such views can be truncated. > > > > Aggregate views without a GROUP clause always have one row. Therefore, > > if a base table is truncated, the view will not be empty and will contain > > a row with NULL value (or 0 for count()). So, in this case, we refresh the > > view instead of truncating it. > > > > - Fix bugs reported by huyajun [1] > > > > [1] https://www.postgresql.org/message-id/tencent_FCAF11BCA5003FD16BDDFDDA5D6A19587809%40qq.com > > > > --------------------------------------------------------------------------------------- > > * Discussion > > > > ** Aggregate support > > > > There were a few suggestions that general aggregate functions should be > > supported [2][3], which may be possible by extending pg_aggregate catalog. > > However, we decided to leave supporting general aggregates to the future work [4] > > because it would need substantial works and make the patch more complex and > > bigger. > > > > There has been no opposite opinion on this. However, if we need more discussion > > on the design of aggregate support, we can omit aggregate support for the first > > release of IVM. > > > > [2] https://www.postgresql.org/message-id/20191128140333.GA25947%40alvherre.pgsql > > [3] https://www.postgresql.org/message-id/CAM-w4HOvDrL4ou6m%3D592zUiKGVzTcOpNj-d_cJqzL00fdsS5kg%40mail.gmail.com > > [4] https://www.postgresql.org/message-id/20201016193034.9a4c44c79fc1eca7babe093e%40sraoss.co.jp > > > > ** Hidden columns > > > > In order to support DISTINCT or aggregates, our implementation uses hidden columns. > > > > Columns starting with "__ivm_" are hidden columns that doesn't appear when a > > view is accessed by "SELECT * FROM ....". For this aim, parse_relation.c is > > fixed. There was a proposal to enable hidden columns by adding a new flag to > > pg_attribute [5], but this thread is no longer active, so we decided to check > > the hidden column by its name [6]. > > > > [5] https://www.postgresql.org/message-id/flat/CAEepm%3D3ZHh%3Dp0nEEnVbs1Dig_UShPzHUcMNAqvDQUgYgcDo-pA%40mail.gmail.com > > [6] https://www.postgresql.org/message-id/20201016193034.9a4c44c79fc1eca7babe093e%40sraoss.co.jp > > > > ** Concurrent Transactions > > > > When the view definition has more than one table, we acquire an exclusive > > lock before the view maintenance in order to avoid inconsistent results. > > This behavior was explained in [7]. The lock was improved to use weaker lock > > when the view has only one table based on a suggestion from Konstantin Knizhnik [8]. > > However, due to the implementation that uses ctid for identifying target tuples, > > we still have to use an exclusive lock for DELETE and UPDATE. > > > > [7] https://www.postgresql.org/message-id/20200909092752.c91758a1bec3479668e82643%40sraoss.co.jp > > [8] https://www.postgresql.org/message-id/5663f5f0-48af-686c-bf3c-62d279567e2a%40postgrespro.ru > > > > ** Automatic Index Creation > > > > When a view is created, a unique index is automatically created if > > possible, that is, if the view definition query has a GROUP BY or > > DISTINCT, or if the view contains all primary key attributes of > > its base tables in the target list. It is necessary for efficient > > view maintenance. This feature is based on a suggestion from > > Konstantin Knizhnik [9]. > > > > [9] https://www.postgresql.org/message-id/89729da8-9042-7ea0-95af-e415df6da14d%40postgrespro.ru > > > > > > ** Trigger and Transition Tables > > > > We implemented IVM based on triggers. This is because we want to use > > transition tables to extract changes on base tables. Also, there are > > other constraint that are using triggers in its implementation, like > > foreign references. However, if we can use transition table like feature > > without relying triggers, we don't have to insist to use triggers and we > > might implement IVM in the executor directly as similar as declarative > > partitioning. > > > > ** Feature to be Supported in the First Release > > > > The current patch-set supports DISTINCT and aggregates for built-in count, > > sum, avg, min and max. Do we need all these feature for the first IVM release? > > Supporting DISTINCT and aggregates needs discussion on hidden columns, and > > for supporting min/max we need to discuss on re-calculation method. Before > > handling such relatively advanced feature, maybe, should we focus to design > > and implement of the basic feature of IVM? > > > > > > Any suggestion and discussion are welcomed! > > > > Regards, > > Yugo Nagata > > > > -- > > Yugo NAGATA <nagata@sraoss.co.jp> > > > > > > > > The followings are supported in view definition queries: > > - SELECT ... FROM ... WHERE ..., joins (inner joins, self-joins) > > > > Also, a view definition query cannot contain other views, materialized views, > > foreign tables, partitioned tables, partitions, VALUES, non-immutable functions, > > system columns, or expressions that contains aggregates. > > Does this also apply to tableoid? but tableoid is a constant, so it > should be fine? > can following two queries apply to this feature. > select tableoid, unique1 from tenk1; Currently, this is not allowed because tableoid is a system column. As you say, tableoid is a constant, so we can allow. Should we do this? > select 1 as constant, unique1 from tenk1; This is allowed, of course. > I didn't apply the patch.(will do later, for someone to test, it would > be a better idea to dump a whole file separately....). Thank you! I'm looking forward to your feedback. (I didn't attach a whole patch separately because I wouldn't like cfbot to be unhappy...) Regards, Yugo Nagata -- Yugo NAGATA <nagata@sraoss.co.jp>
On Wed, Jun 28, 2023 at 4:06 PM Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > On Wed, 28 Jun 2023 00:01:02 +0800 > jian he <jian.universality@gmail.com> wrote: > > > On Thu, Jun 1, 2023 at 2:47 AM Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > > > > > On Thu, 1 Jun 2023 23:59:09 +0900 > > > Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > > > > > > Hello hackers, > > > > > > > > Here's a rebased version of the patch-set adding Incremental View > > > > Maintenance support for PostgreSQL. That was discussed in [1]. > > > > > > > [1] https://www.postgresql.org/message-id/flat/20181227215726.4d166b4874f8983a641123f5%40sraoss.co.jp > > > > > > --------------------------------------------------------------------------------------- > > > * Overview > > > > > > Incremental View Maintenance (IVM) is a way to make materialized views > > > up-to-date by computing only incremental changes and applying them on > > > views. IVM is more efficient than REFRESH MATERIALIZED VIEW when > > > only small parts of the view are changed. > > > > > > ** Feature > > > > > > The attached patchset provides a feature that allows materialized views > > > to be updated automatically and incrementally just after a underlying > > > table is modified. > > > > > > You can create an incementally maintainable materialized view (IMMV) > > > by using CREATE INCREMENTAL MATERIALIZED VIEW command. > > > > > > The followings are supported in view definition queries: > > > - SELECT ... FROM ... WHERE ..., joins (inner joins, self-joins) > > > - some built-in aggregate functions (count, sum, avg, min, max) > > > - GROUP BY clause > > > - DISTINCT clause > > > > > > Views can contain multiple tuples with the same content (duplicate tuples). > > > > > > ** Restriction > > > > > > The following are not supported in a view definition: > > > - Outer joins > > > - Aggregates otehr than above, window functions, HAVING > > > - Sub-queries, CTEs > > > - Set operations (UNION, INTERSECT, EXCEPT) > > > - DISTINCT ON, ORDER BY, LIMIT, OFFSET > > > > > > Also, a view definition query cannot contain other views, materialized views, > > > foreign tables, partitioned tables, partitions, VALUES, non-immutable functions, > > > system columns, or expressions that contains aggregates. > > > > > > --------------------------------------------------------------------------------------- > > > * Design > > > > > > An IMMV is maintained using statement-level AFTER triggers. > > > When an IMMV is created, triggers are automatically created on all base > > > tables contained in the view definition query. > > > > > > When a table is modified, changes that occurred in the table are extracted > > > as transition tables in the AFTER triggers. Then, changes that will occur in > > > the view are calculated by a rewritten view dequery in which the modified table > > > is replaced with the transition table. > > > > > > For example, if the view is defined as "SELECT * FROM R, S", and tuples inserted > > > into R are stored in a transiton table dR, the tuples that will be inserted into > > > the view are calculated as the result of "SELECT * FROM dR, S". > > > > > > ** Multiple Tables Modification > > > > > > Multiple tables can be modified in a statement when using triggers, foreign key > > > constraint, or modifying CTEs. When multiple tables are modified, we need > > > the state of tables before the modification. > > > > > > For example, when some tuples, dR and dS, are inserted into R and S respectively, > > > the tuples that will be inserted into the view are calculated by the following > > > two queries: > > > > > > "SELECT * FROM dR, S_pre" > > > "SELECT * FROM R, dS" > > > > > > where S_pre is the table before the modification, R is the current state of > > > table, that is, after the modification. This pre-update states of table > > > is calculated by filtering inserted tuples and appending deleted tuples. > > > The subquery that represents pre-update state is generated in get_prestate_rte(). > > > Specifically, the insterted tuples are filtered by calling IVM_visible_in_prestate() > > > in WHERE clause. This function checks the visibility of tuples by using > > > the snapshot taken before table modification. The deleted tuples are contained > > > in the old transition table, and this table is appended using UNION ALL. > > > > > > Transition tables for each modification are collected in each AFTER trigger > > > function call. Then, the view maintenance is performed in the last call of > > > the trigger. > > > > > > In the original PostgreSQL, tuplestores of transition tables are freed at the > > > end of each nested query. However, their lifespan needs to be prolonged to > > > the end of the out-most query in order to maintain the view in the last AFTER > > > trigger. For this purpose, SetTransitionTablePreserved is added in trigger.c. > > > > > > ** Duplicate Tulpes > > > > > > When calculating changes that will occur in the view (= delta tables), > > > multiplicity of tuples are calculated by using count(*). > > > > > > When deleting tuples from the view, tuples to be deleted are identified by > > > joining the delta table with the view, and tuples are deleted as many as > > > specified multiplicity by numbered using row_number() function. > > > This is implemented in apply_old_delta(). > > > > > > When inserting tuples into the view, each tuple is duplicated to the > > > specified multiplicity using generate_series() function. This is implemented > > > in apply_new_delta(). > > > > > > ** DISTINCT clause > > > > > > When DISTINCT is used, the view has a hidden column __ivm_count__ that > > > stores multiplicity for tuples. When tuples are deleted from or inserted into > > > the view, the values of __ivm_count__ column is decreased or increased as many > > > as specified multiplicity. Eventually, when the values becomes zero, the > > > corresponding tuple is deleted from the view. This is implemented in > > > apply_old_delta_with_count() and apply_new_delta_with_count(). > > > > > > ** Aggregates > > > > > > Built-in count sum, avg, min, and max are supported. Whether a given > > > aggregate function can be used or not is checked by using its OID in > > > check_aggregate_supports_ivm(). > > > > > > When creating a materialized view containing aggregates, in addition > > > to __ivm_count__, more than one hidden columns for each aggregate are > > > added to the target list. For example, columns for storing sum(x), > > > count(x) are added if we have avg(x). When the view is maintained, > > > aggregated values are updated using these hidden columns, also hidden > > > columns are updated at the same time. > > > > > > The maintenance of aggregated view is performed in > > > apply_old_delta_with_count() and apply_new_delta_with_count(). The SET > > > clauses for updating columns are generated by append_set_clause_*(). > > > > > > If the view has min(x) or max(x) and the minimum or maximal value is > > > deleted from a table, we need to update the value to the new min/max > > > recalculated from the tables rather than incremental computation. This > > > is performed in recalc_and_set_values(). > > > > > > --------------------------------------------------------------------------------------- > > > * Details of the patch-set (v28) > > > > > > > The patch-set consists of the following eleven patches. > > > > > > In the previous version, the number of patches were nine. > > > In the latest patch-set, the patches are divided more finely > > > aiming to make the review easier. > > > > > > > - 0001: Add a syntax to create Incrementally Maintainable Materialized Views > > > > > > The prposed syntax to create an incrementally maintainable materialized > > > view (IMMV) is; > > > > > > CREATE INCREMENTAL MATERIALIZED VIEW AS SELECT .....; > > > > > > However, this syntax is tentative, so any suggestions are welcomed. > > > > > > > - 0002: Add relisivm column to pg_class system catalog > > > > > > We add a new field in pg_class to indicate a relation is IMMV. > > > Another alternative is to add a new catalog for managing materialized > > > views including IMMV, but I am not sure if we want this. > > > > > > > - 0003: Allow to prolong life span of transition tables until transaction end > > > > > > This patch fixes the trigger system to allow to prolong lifespan of > > > tuple stores for transition tables until the transaction end. We need > > > this because multiple transition tables have to be preserved until the > > > end of the out-most query when multiple tables are modified by nested > > > triggers. (as explained above in Design - Multiple Tables Modification) > > > > > > If we don't want to change the trigger system in such way, the alternative > > > is to copy the contents of transition tables to other tuplestores, although > > > it needs more time and memory. > > > > > > > - 0004: Add Incremental View Maintenance support to pg_dump > > > > > > This patch enables pg_dump to output IMMV using the new syntax. > > > > > > > - 0005: Add Incremental View Maintenance support to psql > > > > > > This patch implements tab-completion for the new syntax and adds > > > information of IMMV to \d meta-command results. > > > > > > > - 0006: Add Incremental View Maintenance support > > > > > > This patch implements the basic IVM feature. > > > DISTINCT and aggregate are not supported here. > > > > > > When an IMMV is created, the view query is checked, and if any > > > non-supported feature is used, it raises an error. If it is ok, > > > triggers are created on base tables and an unique index is > > > created on the view if possible. > > > > > > In BEFORE trigger, an entry is created for each IMMV and the number > > > of trigger firing is counted. Also, the snapshot just before the > > > table modification is stored. > > > > > > In AFTER triggers, each transition tables are preserved. The number > > > of trigger firing is counted also here, and when the firing number of > > > BEFORE and AFTER trigger reach the same, it is deemed the final AFTER > > > trigger call. > > > > > > In the final AFTER trigger, the IMMV is maintained. Rewritten view > > > query is executed to generate delta tables, and deltas are applied > > > to the view. If multiple tables are modified simultaneously, this > > > process is iterated for each modified table. Tables before processed > > > are represented in "pre-update-state", processed tables are > > > "post-update-state" in the rewritten query. > > > > > > > - 0007: Add DISTINCT support for IVM > > > > > > This patch adds DISTINCT clause support. > > > > > > When an IMMV including DISTINCT is created, a hidden column > > > "__ivm_count__" is added to the target list. This column has the > > > number of duplicity of the same tuples. The duplicity is calculated > > > by adding "count(*)" and GROUP BY to the view query. > > > > > > When an IMMV is maintained, the duplicity in __ivm_count__ is updated, > > > and a tuples whose duplicity becomes zero can be deleted from the view. > > > This logic is implemented by SQL in apply_old_delta_with_count and > > > apply_new_delta_with_count. > > > > > > Columns starting with "__ivm_" are deemed hidden columns that doesn't > > > appear when a view is accessed by "SELECT * FROM ....". This is > > > implemented by fixing parse_relation.c. > > > > > > > - 0008: Add aggregates support in IVM > > > > > > This patch provides codes for aggregates support, specifically > > > for builtin count, sum, and avg. > > > > > > When an IMMV containing an aggregate is created, it is checked if this > > > aggregate function is supported, and if it is ok, some hidden columns > > > are added to the target list. > > > > > > When the IMMV is maintained, the aggregated value is updated as well as > > > related hidden columns. The way of update depends the type of aggregate > > > functions, and SET clause string is generated for each aggregate. > > > > > > > - 0009: Add support for min/max aggregates for IVM > > > > > > This patch adds min/max aggregates support. > > > > > > This is separated from #0008 because min/max needs more complicated > > > work than count, sum, and avg. > > > > > > If the view has min(x) or max(x) and the minimum or maximal value is > > > deleted from a table, we need to update the value to the new min/max > > > recalculated from the tables rather than incremental computation. > > > This is performed in recalc_and_set_values(). > > > > > > TIDs and keys of tuples that need re-calculation are returned as a > > > result of the query that deleted min/max values from the view using > > > RETURNING clause. The plan to recalculate and set the new min/max value > > > are stored and reused. > > > > > > > - 0010: regression tests > > > > > > This patch provides regression tests for IVM. > > > > > > > - 0011: documentation > > > > > > This patch provides documantation for IVM. > > > > > > --------------------------------------------------------------------------------------- > > > * Changes from the Previous Version (v27) > > > > > > - Allow TRUNCATE on base tables > > > > > > When a base table is truncated, the view content will be empty if the > > > view definition query does not contain an aggregate without a GROUP clause. > > > Therefore, such views can be truncated. > > > > > > Aggregate views without a GROUP clause always have one row. Therefore, > > > if a base table is truncated, the view will not be empty and will contain > > > a row with NULL value (or 0 for count()). So, in this case, we refresh the > > > view instead of truncating it. > > > > > > - Fix bugs reported by huyajun [1] > > > > > > [1] https://www.postgresql.org/message-id/tencent_FCAF11BCA5003FD16BDDFDDA5D6A19587809%40qq.com > > > > > > --------------------------------------------------------------------------------------- > > > * Discussion > > > > > > ** Aggregate support > > > > > > There were a few suggestions that general aggregate functions should be > > > supported [2][3], which may be possible by extending pg_aggregate catalog. > > > However, we decided to leave supporting general aggregates to the future work [4] > > > because it would need substantial works and make the patch more complex and > > > bigger. > > > > > > There has been no opposite opinion on this. However, if we need more discussion > > > on the design of aggregate support, we can omit aggregate support for the first > > > release of IVM. > > > > > > [2] https://www.postgresql.org/message-id/20191128140333.GA25947%40alvherre.pgsql > > > [3] https://www.postgresql.org/message-id/CAM-w4HOvDrL4ou6m%3D592zUiKGVzTcOpNj-d_cJqzL00fdsS5kg%40mail.gmail.com > > > [4] https://www.postgresql.org/message-id/20201016193034.9a4c44c79fc1eca7babe093e%40sraoss.co.jp > > > > > > ** Hidden columns > > > > > > In order to support DISTINCT or aggregates, our implementation uses hidden columns. > > > > > > Columns starting with "__ivm_" are hidden columns that doesn't appear when a > > > view is accessed by "SELECT * FROM ....". For this aim, parse_relation.c is > > > fixed. There was a proposal to enable hidden columns by adding a new flag to > > > pg_attribute [5], but this thread is no longer active, so we decided to check > > > the hidden column by its name [6]. > > > > > > [5] https://www.postgresql.org/message-id/flat/CAEepm%3D3ZHh%3Dp0nEEnVbs1Dig_UShPzHUcMNAqvDQUgYgcDo-pA%40mail.gmail.com > > > [6] https://www.postgresql.org/message-id/20201016193034.9a4c44c79fc1eca7babe093e%40sraoss.co.jp > > > > > > ** Concurrent Transactions > > > > > > When the view definition has more than one table, we acquire an exclusive > > > lock before the view maintenance in order to avoid inconsistent results. > > > This behavior was explained in [7]. The lock was improved to use weaker lock > > > when the view has only one table based on a suggestion from Konstantin Knizhnik [8]. > > > However, due to the implementation that uses ctid for identifying target tuples, > > > we still have to use an exclusive lock for DELETE and UPDATE. > > > > > > [7] https://www.postgresql.org/message-id/20200909092752.c91758a1bec3479668e82643%40sraoss.co.jp > > > [8] https://www.postgresql.org/message-id/5663f5f0-48af-686c-bf3c-62d279567e2a%40postgrespro.ru > > > > > > ** Automatic Index Creation > > > > > > When a view is created, a unique index is automatically created if > > > possible, that is, if the view definition query has a GROUP BY or > > > DISTINCT, or if the view contains all primary key attributes of > > > its base tables in the target list. It is necessary for efficient > > > view maintenance. This feature is based on a suggestion from > > > Konstantin Knizhnik [9]. > > > > > > [9] https://www.postgresql.org/message-id/89729da8-9042-7ea0-95af-e415df6da14d%40postgrespro.ru > > > > > > > > > ** Trigger and Transition Tables > > > > > > We implemented IVM based on triggers. This is because we want to use > > > transition tables to extract changes on base tables. Also, there are > > > other constraint that are using triggers in its implementation, like > > > foreign references. However, if we can use transition table like feature > > > without relying triggers, we don't have to insist to use triggers and we > > > might implement IVM in the executor directly as similar as declarative > > > partitioning. > > > > > > ** Feature to be Supported in the First Release > > > > > > The current patch-set supports DISTINCT and aggregates for built-in count, > > > sum, avg, min and max. Do we need all these feature for the first IVM release? > > > Supporting DISTINCT and aggregates needs discussion on hidden columns, and > > > for supporting min/max we need to discuss on re-calculation method. Before > > > handling such relatively advanced feature, maybe, should we focus to design > > > and implement of the basic feature of IVM? > > > > > > > > > Any suggestion and discussion are welcomed! > > > > > > Regards, > > > Yugo Nagata > > > > > > -- > > > Yugo NAGATA <nagata@sraoss.co.jp> > > > > > > > > > > > > > The followings are supported in view definition queries: > > > - SELECT ... FROM ... WHERE ..., joins (inner joins, self-joins) > > > > > > > Also, a view definition query cannot contain other views, materialized views, > > > foreign tables, partitioned tables, partitions, VALUES, non-immutable functions, > > > system columns, or expressions that contains aggregates. > > > > Does this also apply to tableoid? but tableoid is a constant, so it > > should be fine? > > can following two queries apply to this feature. > > select tableoid, unique1 from tenk1; > > Currently, this is not allowed because tableoid is a system column. > As you say, tableoid is a constant, so we can allow. Should we do this? > > > select 1 as constant, unique1 from tenk1; > > This is allowed, of course. > > > I didn't apply the patch.(will do later, for someone to test, it would > > be a better idea to dump a whole file separately....). > > Thank you! I'm looking forward to your feedback. > (I didn't attach a whole patch separately because I wouldn't like > cfbot to be unhappy...) > > Regards, > Yugo Nagata > > -- > Yugo NAGATA <nagata@sraoss.co.jp> I played around first half of regress patch. these all following queries fails. CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_rename AS SELECT DISTINCT * , 1 as "__ivm_count__" FROM mv_base_a; CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_rename AS SELECT DISTINCT * , 1 as "__ivm_countblablabla" FROM mv_base_a; CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_rename AS SELECT DISTINCT * , 1 as "__ivm_count" FROM mv_base_a; CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_rename AS SELECT DISTINCT * , 1 as "__ivm_count_____" FROM mv_base_a; CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_rename AS SELECT DISTINCT * , 1 as "__ivm_countblabla" FROM mv_base_a; so the hidden column reserved pattern "__ivm_count.*"? that would be a lot.... select * from pg_matviews where matviewname = 'mv_ivm_1'; don't have relisivm option. it's reasonable to make it in view pg_matviews?
On Thu, Jun 29, 2023 at 12:40 AM jian he <jian.universality@gmail.com> wrote: > > On Wed, Jun 28, 2023 at 4:06 PM Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > > > On Wed, 28 Jun 2023 00:01:02 +0800 > > jian he <jian.universality@gmail.com> wrote: > > > > > On Thu, Jun 1, 2023 at 2:47 AM Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > > > > > > > On Thu, 1 Jun 2023 23:59:09 +0900 > > > > Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > > > > > > > > Hello hackers, > > > > > > > > > > Here's a rebased version of the patch-set adding Incremental View > > > > > Maintenance support for PostgreSQL. That was discussed in [1]. > > > > > > > > > [1] https://www.postgresql.org/message-id/flat/20181227215726.4d166b4874f8983a641123f5%40sraoss.co.jp > > > > > > > > --------------------------------------------------------------------------------------- > > > > * Overview > > > > > > > > Incremental View Maintenance (IVM) is a way to make materialized views > > > > up-to-date by computing only incremental changes and applying them on > > > > views. IVM is more efficient than REFRESH MATERIALIZED VIEW when > > > > only small parts of the view are changed. > > > > > > > > ** Feature > > > > > > > > The attached patchset provides a feature that allows materialized views > > > > to be updated automatically and incrementally just after a underlying > > > > table is modified. > > > > > > > > You can create an incementally maintainable materialized view (IMMV) > > > > by using CREATE INCREMENTAL MATERIALIZED VIEW command. > > > > > > > > The followings are supported in view definition queries: > > > > - SELECT ... FROM ... WHERE ..., joins (inner joins, self-joins) > > > > - some built-in aggregate functions (count, sum, avg, min, max) > > > > - GROUP BY clause > > > > - DISTINCT clause > > > > > > > > Views can contain multiple tuples with the same content (duplicate tuples). > > > > > > > > ** Restriction > > > > > > > > The following are not supported in a view definition: > > > > - Outer joins > > > > - Aggregates otehr than above, window functions, HAVING > > > > - Sub-queries, CTEs > > > > - Set operations (UNION, INTERSECT, EXCEPT) > > > > - DISTINCT ON, ORDER BY, LIMIT, OFFSET > > > > > > > > Also, a view definition query cannot contain other views, materialized views, > > > > foreign tables, partitioned tables, partitions, VALUES, non-immutable functions, > > > > system columns, or expressions that contains aggregates. > > > > > > > > --------------------------------------------------------------------------------------- > > > > * Design > > > > > > > > An IMMV is maintained using statement-level AFTER triggers. > > > > When an IMMV is created, triggers are automatically created on all base > > > > tables contained in the view definition query. > > > > > > > > When a table is modified, changes that occurred in the table are extracted > > > > as transition tables in the AFTER triggers. Then, changes that will occur in > > > > the view are calculated by a rewritten view dequery in which the modified table > > > > is replaced with the transition table. > > > > > > > > For example, if the view is defined as "SELECT * FROM R, S", and tuples inserted > > > > into R are stored in a transiton table dR, the tuples that will be inserted into > > > > the view are calculated as the result of "SELECT * FROM dR, S". > > > > > > > > ** Multiple Tables Modification > > > > > > > > Multiple tables can be modified in a statement when using triggers, foreign key > > > > constraint, or modifying CTEs. When multiple tables are modified, we need > > > > the state of tables before the modification. > > > > > > > > For example, when some tuples, dR and dS, are inserted into R and S respectively, > > > > the tuples that will be inserted into the view are calculated by the following > > > > two queries: > > > > > > > > "SELECT * FROM dR, S_pre" > > > > "SELECT * FROM R, dS" > > > > > > > > where S_pre is the table before the modification, R is the current state of > > > > table, that is, after the modification. This pre-update states of table > > > > is calculated by filtering inserted tuples and appending deleted tuples. > > > > The subquery that represents pre-update state is generated in get_prestate_rte(). > > > > Specifically, the insterted tuples are filtered by calling IVM_visible_in_prestate() > > > > in WHERE clause. This function checks the visibility of tuples by using > > > > the snapshot taken before table modification. The deleted tuples are contained > > > > in the old transition table, and this table is appended using UNION ALL. > > > > > > > > Transition tables for each modification are collected in each AFTER trigger > > > > function call. Then, the view maintenance is performed in the last call of > > > > the trigger. > > > > > > > > In the original PostgreSQL, tuplestores of transition tables are freed at the > > > > end of each nested query. However, their lifespan needs to be prolonged to > > > > the end of the out-most query in order to maintain the view in the last AFTER > > > > trigger. For this purpose, SetTransitionTablePreserved is added in trigger.c. > > > > > > > > ** Duplicate Tulpes > > > > > > > > When calculating changes that will occur in the view (= delta tables), > > > > multiplicity of tuples are calculated by using count(*). > > > > > > > > When deleting tuples from the view, tuples to be deleted are identified by > > > > joining the delta table with the view, and tuples are deleted as many as > > > > specified multiplicity by numbered using row_number() function. > > > > This is implemented in apply_old_delta(). > > > > > > > > When inserting tuples into the view, each tuple is duplicated to the > > > > specified multiplicity using generate_series() function. This is implemented > > > > in apply_new_delta(). > > > > > > > > ** DISTINCT clause > > > > > > > > When DISTINCT is used, the view has a hidden column __ivm_count__ that > > > > stores multiplicity for tuples. When tuples are deleted from or inserted into > > > > the view, the values of __ivm_count__ column is decreased or increased as many > > > > as specified multiplicity. Eventually, when the values becomes zero, the > > > > corresponding tuple is deleted from the view. This is implemented in > > > > apply_old_delta_with_count() and apply_new_delta_with_count(). > > > > > > > > ** Aggregates > > > > > > > > Built-in count sum, avg, min, and max are supported. Whether a given > > > > aggregate function can be used or not is checked by using its OID in > > > > check_aggregate_supports_ivm(). > > > > > > > > When creating a materialized view containing aggregates, in addition > > > > to __ivm_count__, more than one hidden columns for each aggregate are > > > > added to the target list. For example, columns for storing sum(x), > > > > count(x) are added if we have avg(x). When the view is maintained, > > > > aggregated values are updated using these hidden columns, also hidden > > > > columns are updated at the same time. > > > > > > > > The maintenance of aggregated view is performed in > > > > apply_old_delta_with_count() and apply_new_delta_with_count(). The SET > > > > clauses for updating columns are generated by append_set_clause_*(). > > > > > > > > If the view has min(x) or max(x) and the minimum or maximal value is > > > > deleted from a table, we need to update the value to the new min/max > > > > recalculated from the tables rather than incremental computation. This > > > > is performed in recalc_and_set_values(). > > > > > > > > --------------------------------------------------------------------------------------- > > > > * Details of the patch-set (v28) > > > > > > > > > The patch-set consists of the following eleven patches. > > > > > > > > In the previous version, the number of patches were nine. > > > > In the latest patch-set, the patches are divided more finely > > > > aiming to make the review easier. > > > > > > > > > - 0001: Add a syntax to create Incrementally Maintainable Materialized Views > > > > > > > > The prposed syntax to create an incrementally maintainable materialized > > > > view (IMMV) is; > > > > > > > > CREATE INCREMENTAL MATERIALIZED VIEW AS SELECT .....; > > > > > > > > However, this syntax is tentative, so any suggestions are welcomed. > > > > > > > > > - 0002: Add relisivm column to pg_class system catalog > > > > > > > > We add a new field in pg_class to indicate a relation is IMMV. > > > > Another alternative is to add a new catalog for managing materialized > > > > views including IMMV, but I am not sure if we want this. > > > > > > > > > - 0003: Allow to prolong life span of transition tables until transaction end > > > > > > > > This patch fixes the trigger system to allow to prolong lifespan of > > > > tuple stores for transition tables until the transaction end. We need > > > > this because multiple transition tables have to be preserved until the > > > > end of the out-most query when multiple tables are modified by nested > > > > triggers. (as explained above in Design - Multiple Tables Modification) > > > > > > > > If we don't want to change the trigger system in such way, the alternative > > > > is to copy the contents of transition tables to other tuplestores, although > > > > it needs more time and memory. > > > > > > > > > - 0004: Add Incremental View Maintenance support to pg_dump > > > > > > > > This patch enables pg_dump to output IMMV using the new syntax. > > > > > > > > > - 0005: Add Incremental View Maintenance support to psql > > > > > > > > This patch implements tab-completion for the new syntax and adds > > > > information of IMMV to \d meta-command results. > > > > > > > > > - 0006: Add Incremental View Maintenance support > > > > > > > > This patch implements the basic IVM feature. > > > > DISTINCT and aggregate are not supported here. > > > > > > > > When an IMMV is created, the view query is checked, and if any > > > > non-supported feature is used, it raises an error. If it is ok, > > > > triggers are created on base tables and an unique index is > > > > created on the view if possible. > > > > > > > > In BEFORE trigger, an entry is created for each IMMV and the number > > > > of trigger firing is counted. Also, the snapshot just before the > > > > table modification is stored. > > > > > > > > In AFTER triggers, each transition tables are preserved. The number > > > > of trigger firing is counted also here, and when the firing number of > > > > BEFORE and AFTER trigger reach the same, it is deemed the final AFTER > > > > trigger call. > > > > > > > > In the final AFTER trigger, the IMMV is maintained. Rewritten view > > > > query is executed to generate delta tables, and deltas are applied > > > > to the view. If multiple tables are modified simultaneously, this > > > > process is iterated for each modified table. Tables before processed > > > > are represented in "pre-update-state", processed tables are > > > > "post-update-state" in the rewritten query. > > > > > > > > > - 0007: Add DISTINCT support for IVM > > > > > > > > This patch adds DISTINCT clause support. > > > > > > > > When an IMMV including DISTINCT is created, a hidden column > > > > "__ivm_count__" is added to the target list. This column has the > > > > number of duplicity of the same tuples. The duplicity is calculated > > > > by adding "count(*)" and GROUP BY to the view query. > > > > > > > > When an IMMV is maintained, the duplicity in __ivm_count__ is updated, > > > > and a tuples whose duplicity becomes zero can be deleted from the view. > > > > This logic is implemented by SQL in apply_old_delta_with_count and > > > > apply_new_delta_with_count. > > > > > > > > Columns starting with "__ivm_" are deemed hidden columns that doesn't > > > > appear when a view is accessed by "SELECT * FROM ....". This is > > > > implemented by fixing parse_relation.c. > > > > > > > > > - 0008: Add aggregates support in IVM > > > > > > > > This patch provides codes for aggregates support, specifically > > > > for builtin count, sum, and avg. > > > > > > > > When an IMMV containing an aggregate is created, it is checked if this > > > > aggregate function is supported, and if it is ok, some hidden columns > > > > are added to the target list. > > > > > > > > When the IMMV is maintained, the aggregated value is updated as well as > > > > related hidden columns. The way of update depends the type of aggregate > > > > functions, and SET clause string is generated for each aggregate. > > > > > > > > > - 0009: Add support for min/max aggregates for IVM > > > > > > > > This patch adds min/max aggregates support. > > > > > > > > This is separated from #0008 because min/max needs more complicated > > > > work than count, sum, and avg. > > > > > > > > If the view has min(x) or max(x) and the minimum or maximal value is > > > > deleted from a table, we need to update the value to the new min/max > > > > recalculated from the tables rather than incremental computation. > > > > This is performed in recalc_and_set_values(). > > > > > > > > TIDs and keys of tuples that need re-calculation are returned as a > > > > result of the query that deleted min/max values from the view using > > > > RETURNING clause. The plan to recalculate and set the new min/max value > > > > are stored and reused. > > > > > > > > > - 0010: regression tests > > > > > > > > This patch provides regression tests for IVM. > > > > > > > > > - 0011: documentation > > > > > > > > This patch provides documantation for IVM. > > > > > > > > --------------------------------------------------------------------------------------- > > > > * Changes from the Previous Version (v27) > > > > > > > > - Allow TRUNCATE on base tables > > > > > > > > When a base table is truncated, the view content will be empty if the > > > > view definition query does not contain an aggregate without a GROUP clause. > > > > Therefore, such views can be truncated. > > > > > > > > Aggregate views without a GROUP clause always have one row. Therefore, > > > > if a base table is truncated, the view will not be empty and will contain > > > > a row with NULL value (or 0 for count()). So, in this case, we refresh the > > > > view instead of truncating it. > > > > > > > > - Fix bugs reported by huyajun [1] > > > > > > > > [1] https://www.postgresql.org/message-id/tencent_FCAF11BCA5003FD16BDDFDDA5D6A19587809%40qq.com > > > > > > > > --------------------------------------------------------------------------------------- > > > > * Discussion > > > > > > > > ** Aggregate support > > > > > > > > There were a few suggestions that general aggregate functions should be > > > > supported [2][3], which may be possible by extending pg_aggregate catalog. > > > > However, we decided to leave supporting general aggregates to the future work [4] > > > > because it would need substantial works and make the patch more complex and > > > > bigger. > > > > > > > > There has been no opposite opinion on this. However, if we need more discussion > > > > on the design of aggregate support, we can omit aggregate support for the first > > > > release of IVM. > > > > > > > > [2] https://www.postgresql.org/message-id/20191128140333.GA25947%40alvherre.pgsql > > > > [3] https://www.postgresql.org/message-id/CAM-w4HOvDrL4ou6m%3D592zUiKGVzTcOpNj-d_cJqzL00fdsS5kg%40mail.gmail.com > > > > [4] https://www.postgresql.org/message-id/20201016193034.9a4c44c79fc1eca7babe093e%40sraoss.co.jp > > > > > > > > ** Hidden columns > > > > > > > > In order to support DISTINCT or aggregates, our implementation uses hidden columns. > > > > > > > > Columns starting with "__ivm_" are hidden columns that doesn't appear when a > > > > view is accessed by "SELECT * FROM ....". For this aim, parse_relation.c is > > > > fixed. There was a proposal to enable hidden columns by adding a new flag to > > > > pg_attribute [5], but this thread is no longer active, so we decided to check > > > > the hidden column by its name [6]. > > > > > > > > [5] https://www.postgresql.org/message-id/flat/CAEepm%3D3ZHh%3Dp0nEEnVbs1Dig_UShPzHUcMNAqvDQUgYgcDo-pA%40mail.gmail.com > > > > [6] https://www.postgresql.org/message-id/20201016193034.9a4c44c79fc1eca7babe093e%40sraoss.co.jp > > > > > > > > ** Concurrent Transactions > > > > > > > > When the view definition has more than one table, we acquire an exclusive > > > > lock before the view maintenance in order to avoid inconsistent results. > > > > This behavior was explained in [7]. The lock was improved to use weaker lock > > > > when the view has only one table based on a suggestion from Konstantin Knizhnik [8]. > > > > However, due to the implementation that uses ctid for identifying target tuples, > > > > we still have to use an exclusive lock for DELETE and UPDATE. > > > > > > > > [7] https://www.postgresql.org/message-id/20200909092752.c91758a1bec3479668e82643%40sraoss.co.jp > > > > [8] https://www.postgresql.org/message-id/5663f5f0-48af-686c-bf3c-62d279567e2a%40postgrespro.ru > > > > > > > > ** Automatic Index Creation > > > > > > > > When a view is created, a unique index is automatically created if > > > > possible, that is, if the view definition query has a GROUP BY or > > > > DISTINCT, or if the view contains all primary key attributes of > > > > its base tables in the target list. It is necessary for efficient > > > > view maintenance. This feature is based on a suggestion from > > > > Konstantin Knizhnik [9]. > > > > > > > > [9] https://www.postgresql.org/message-id/89729da8-9042-7ea0-95af-e415df6da14d%40postgrespro.ru > > > > > > > > > > > > ** Trigger and Transition Tables > > > > > > > > We implemented IVM based on triggers. This is because we want to use > > > > transition tables to extract changes on base tables. Also, there are > > > > other constraint that are using triggers in its implementation, like > > > > foreign references. However, if we can use transition table like feature > > > > without relying triggers, we don't have to insist to use triggers and we > > > > might implement IVM in the executor directly as similar as declarative > > > > partitioning. > > > > > > > > ** Feature to be Supported in the First Release > > > > > > > > The current patch-set supports DISTINCT and aggregates for built-in count, > > > > sum, avg, min and max. Do we need all these feature for the first IVM release? > > > > Supporting DISTINCT and aggregates needs discussion on hidden columns, and > > > > for supporting min/max we need to discuss on re-calculation method. Before > > > > handling such relatively advanced feature, maybe, should we focus to design > > > > and implement of the basic feature of IVM? > > > > > > > > > > > > Any suggestion and discussion are welcomed! > > > > > > > > Regards, > > > > Yugo Nagata > > > > > > > > -- > > > > Yugo NAGATA <nagata@sraoss.co.jp> > > > > > > > > > > > > > > > > > > The followings are supported in view definition queries: > > > > - SELECT ... FROM ... WHERE ..., joins (inner joins, self-joins) > > > > > > > > > > Also, a view definition query cannot contain other views, materialized views, > > > > foreign tables, partitioned tables, partitions, VALUES, non-immutable functions, > > > > system columns, or expressions that contains aggregates. > > > > > > Does this also apply to tableoid? but tableoid is a constant, so it > > > should be fine? > > > can following two queries apply to this feature. > > > select tableoid, unique1 from tenk1; > > > > Currently, this is not allowed because tableoid is a system column. > > As you say, tableoid is a constant, so we can allow. Should we do this? > > > > > select 1 as constant, unique1 from tenk1; > > > > This is allowed, of course. > > > > > I didn't apply the patch.(will do later, for someone to test, it would > > > be a better idea to dump a whole file separately....). > > > > Thank you! I'm looking forward to your feedback. > > (I didn't attach a whole patch separately because I wouldn't like > > cfbot to be unhappy...) > > > > Regards, > > Yugo Nagata > > > > -- > > Yugo NAGATA <nagata@sraoss.co.jp> > > I played around first half of regress patch. > these all following queries fails. > > CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_rename AS > SELECT DISTINCT * , 1 as "__ivm_count__" FROM mv_base_a; > > CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_rename AS > SELECT DISTINCT * , 1 as "__ivm_countblablabla" FROM mv_base_a; > > CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_rename AS > SELECT DISTINCT * , 1 as "__ivm_count" FROM mv_base_a; > > CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_rename AS > SELECT DISTINCT * , 1 as "__ivm_count_____" FROM mv_base_a; > > CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_rename AS > SELECT DISTINCT * , 1 as "__ivm_countblabla" FROM mv_base_a; > > so the hidden column reserved pattern "__ivm_count.*"? that would be a lot.... > > select * from pg_matviews where matviewname = 'mv_ivm_1'; > don't have relisivm option. it's reasonable to make it in view pg_matviews? another trivial: incremental_matview.out (last few lines) last transaction seems to need COMMIT command.
I cannot build the doc. git clean -fdx git am ~/Desktop/tmp/*.patch Applying: Add a syntax to create Incrementally Maintainable Materialized Views Applying: Add relisivm column to pg_class system catalog Applying: Allow to prolong life span of transition tables until transaction end Applying: Add Incremental View Maintenance support to pg_dump Applying: Add Incremental View Maintenance support to psql Applying: Add Incremental View Maintenance support Applying: Add DISTINCT support for IVM Applying: Add aggregates support in IVM Applying: Add support for min/max aggregates for IVM Applying: Add regression tests for Incremental View Maintenance Applying: Add documentations about Incremental View Maintenance .git/rebase-apply/patch:79: trailing whitespace. clause. warning: 1 line adds whitespace errors. Because of this, the {ninja docs} command failed. ERROR message: [6/6] Generating doc/src/sgml/html with a custom command FAILED: doc/src/sgml/html /usr/bin/python3 ../../Desktop/pg_sources/main/postgres/doc/src/sgml/xmltools_dep_wrapper --targetname doc/src/sgml/html --depfile doc/src/sgml/html.d --tool /usr/bin/xsltproc -- -o doc/src/sgml/ --nonet --stringparam pg.version 16beta2 --path doc/src/sgml --path ../../Desktop/pg_sources/main/postgres/doc/src/sgml ../../Desktop/pg_sources/main/postgres/doc/src/sgml/stylesheet.xsl doc/src/sgml/postgres-full.xml ERROR: id attribute missing on <sect2> element under /book[@id = 'postgres']/part[@id = 'server-programming']/chapter[@id = 'rules']/sect1[@id = 'rules-ivm'] error: file doc/src/sgml/postgres-full.xml xsltRunStylesheet : run failed ninja: build stopped: subcommand failed.
On Thu, Jun 29, 2023 at 6:51 PM jian he <jian.universality@gmail.com> wrote: > > I cannot build the doc. > git clean -fdx > git am ~/Desktop/tmp/*.patch > > Applying: Add a syntax to create Incrementally Maintainable Materialized Views > Applying: Add relisivm column to pg_class system catalog > Applying: Allow to prolong life span of transition tables until transaction end > Applying: Add Incremental View Maintenance support to pg_dump > Applying: Add Incremental View Maintenance support to psql > Applying: Add Incremental View Maintenance support > Applying: Add DISTINCT support for IVM > Applying: Add aggregates support in IVM > Applying: Add support for min/max aggregates for IVM > Applying: Add regression tests for Incremental View Maintenance > Applying: Add documentations about Incremental View Maintenance > .git/rebase-apply/patch:79: trailing whitespace. > clause. > warning: 1 line adds whitespace errors. > > Because of this, the {ninja docs} command failed. ERROR message: > > [6/6] Generating doc/src/sgml/html with a custom command > FAILED: doc/src/sgml/html > /usr/bin/python3 > ../../Desktop/pg_sources/main/postgres/doc/src/sgml/xmltools_dep_wrapper > --targetname doc/src/sgml/html --depfile doc/src/sgml/html.d --tool > /usr/bin/xsltproc -- -o doc/src/sgml/ --nonet --stringparam pg.version > 16beta2 --path doc/src/sgml --path > ../../Desktop/pg_sources/main/postgres/doc/src/sgml > ../../Desktop/pg_sources/main/postgres/doc/src/sgml/stylesheet.xsl > doc/src/sgml/postgres-full.xml > ERROR: id attribute missing on <sect2> element under /book[@id = > 'postgres']/part[@id = 'server-programming']/chapter[@id = > 'rules']/sect1[@id = 'rules-ivm'] > error: file doc/src/sgml/postgres-full.xml > xsltRunStylesheet : run failed > ninja: build stopped: subcommand failed. so far what I tried: git am --ignore-whitespace --whitespace=nowarn ~/Desktop/tmp/*.patch git am --whitespace=fix ~/Desktop/tmp/*.patch git am --whitespace=error ~/Desktop/tmp/*.patch I still cannot generate html docs.
Hi there. in v28-0005-Add-Incremental-View-Maintenance-support-to-psql.patch I don't know how to set psql to get the output "Incremental view maintenance: yes"
This is probably not trivial.
In function apply_new_delta_with_count.
appendStringInfo(&querybuf,
"WITH updt AS (" /* update a tuple if this exists in the view */
"UPDATE %s AS mv SET %s = mv.%s OPERATOR(pg_catalog.+) diff.%s "
"%s " /* SET clauses for aggregates */
"FROM %s AS diff "
"WHERE %s " /* tuple matching condition */
"RETURNING %s" /* returning keys of updated tuples */
") INSERT INTO %s (%s)" /* insert a new tuple if this doesn't existw */
"SELECT %s FROM %s AS diff "
"WHERE NOT EXISTS (SELECT 1 FROM updt AS mv WHERE %s);",
"WITH updt AS (" /* update a tuple if this exists in the view */
"UPDATE %s AS mv SET %s = mv.%s OPERATOR(pg_catalog.+) diff.%s "
"%s " /* SET clauses for aggregates */
"FROM %s AS diff "
"WHERE %s " /* tuple matching condition */
"RETURNING %s" /* returning keys of updated tuples */
") INSERT INTO %s (%s)" /* insert a new tuple if this doesn't existw */
"SELECT %s FROM %s AS diff "
"WHERE NOT EXISTS (SELECT 1 FROM updt AS mv WHERE %s);",
---------------------
") INSERT INTO %s (%s)" /* insert a new tuple if this doesn't existw */
"SELECT %s FROM %s AS diff "
"SELECT %s FROM %s AS diff "
the INSERT INTO line, should have one white space in the end?
also "existw" should be "exists"
ok. Now I really found a small bug. this works as intended: BEGIN; CREATE INCREMENTAL MATERIALIZED VIEW test_ivm AS SELECT i, MIN(j) as min_j FROM mv_base_a group by 1; INSERT INTO mv_base_a select 1,-2 where false; rollback; however the following one: BEGIN; CREATE INCREMENTAL MATERIALIZED VIEW test_ivm1 AS SELECT MIN(j) as min_j FROM mv_base_a; INSERT INTO mv_base_a select 1, -2 where false; rollback; will evaluate tuplestore_tuple_count(new_tuplestores) to 1, it will walk through IVM_immediate_maintenance function to apply_delta. but should it be zero?
On Thu, 29 Jun 2023 00:40:45 +0800 jian he <jian.universality@gmail.com> wrote: > On Wed, Jun 28, 2023 at 4:06 PM Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > > > On Wed, 28 Jun 2023 00:01:02 +0800 > > jian he <jian.universality@gmail.com> wrote: > > > > > On Thu, Jun 1, 2023 at 2:47 AM Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > > > > > > > On Thu, 1 Jun 2023 23:59:09 +0900 > > > > Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > > > > > > > > Hello hackers, > > > > > > > > > > Here's a rebased version of the patch-set adding Incremental View > > > > > Maintenance support for PostgreSQL. That was discussed in [1]. > > > > > > > > > [1] https://www.postgresql.org/message-id/flat/20181227215726.4d166b4874f8983a641123f5%40sraoss.co.jp > > > > > > > > --------------------------------------------------------------------------------------- > > > > * Overview > > > > > > > > Incremental View Maintenance (IVM) is a way to make materialized views > > > > up-to-date by computing only incremental changes and applying them on > > > > views. IVM is more efficient than REFRESH MATERIALIZED VIEW when > > > > only small parts of the view are changed. > > > > > > > > ** Feature > > > > > > > > The attached patchset provides a feature that allows materialized views > > > > to be updated automatically and incrementally just after a underlying > > > > table is modified. > > > > > > > > You can create an incementally maintainable materialized view (IMMV) > > > > by using CREATE INCREMENTAL MATERIALIZED VIEW command. > > > > > > > > The followings are supported in view definition queries: > > > > - SELECT ... FROM ... WHERE ..., joins (inner joins, self-joins) > > > > - some built-in aggregate functions (count, sum, avg, min, max) > > > > - GROUP BY clause > > > > - DISTINCT clause > > > > > > > > Views can contain multiple tuples with the same content (duplicate tuples). > > > > > > > > ** Restriction > > > > > > > > The following are not supported in a view definition: > > > > - Outer joins > > > > - Aggregates otehr than above, window functions, HAVING > > > > - Sub-queries, CTEs > > > > - Set operations (UNION, INTERSECT, EXCEPT) > > > > - DISTINCT ON, ORDER BY, LIMIT, OFFSET > > > > > > > > Also, a view definition query cannot contain other views, materialized views, > > > > foreign tables, partitioned tables, partitions, VALUES, non-immutable functions, > > > > system columns, or expressions that contains aggregates. > > > > > > > > --------------------------------------------------------------------------------------- > > > > * Design > > > > > > > > An IMMV is maintained using statement-level AFTER triggers. > > > > When an IMMV is created, triggers are automatically created on all base > > > > tables contained in the view definition query. > > > > > > > > When a table is modified, changes that occurred in the table are extracted > > > > as transition tables in the AFTER triggers. Then, changes that will occur in > > > > the view are calculated by a rewritten view dequery in which the modified table > > > > is replaced with the transition table. > > > > > > > > For example, if the view is defined as "SELECT * FROM R, S", and tuples inserted > > > > into R are stored in a transiton table dR, the tuples that will be inserted into > > > > the view are calculated as the result of "SELECT * FROM dR, S". > > > > > > > > ** Multiple Tables Modification > > > > > > > > Multiple tables can be modified in a statement when using triggers, foreign key > > > > constraint, or modifying CTEs. When multiple tables are modified, we need > > > > the state of tables before the modification. > > > > > > > > For example, when some tuples, dR and dS, are inserted into R and S respectively, > > > > the tuples that will be inserted into the view are calculated by the following > > > > two queries: > > > > > > > > "SELECT * FROM dR, S_pre" > > > > "SELECT * FROM R, dS" > > > > > > > > where S_pre is the table before the modification, R is the current state of > > > > table, that is, after the modification. This pre-update states of table > > > > is calculated by filtering inserted tuples and appending deleted tuples. > > > > The subquery that represents pre-update state is generated in get_prestate_rte(). > > > > Specifically, the insterted tuples are filtered by calling IVM_visible_in_prestate() > > > > in WHERE clause. This function checks the visibility of tuples by using > > > > the snapshot taken before table modification. The deleted tuples are contained > > > > in the old transition table, and this table is appended using UNION ALL. > > > > > > > > Transition tables for each modification are collected in each AFTER trigger > > > > function call. Then, the view maintenance is performed in the last call of > > > > the trigger. > > > > > > > > In the original PostgreSQL, tuplestores of transition tables are freed at the > > > > end of each nested query. However, their lifespan needs to be prolonged to > > > > the end of the out-most query in order to maintain the view in the last AFTER > > > > trigger. For this purpose, SetTransitionTablePreserved is added in trigger.c. > > > > > > > > ** Duplicate Tulpes > > > > > > > > When calculating changes that will occur in the view (= delta tables), > > > > multiplicity of tuples are calculated by using count(*). > > > > > > > > When deleting tuples from the view, tuples to be deleted are identified by > > > > joining the delta table with the view, and tuples are deleted as many as > > > > specified multiplicity by numbered using row_number() function. > > > > This is implemented in apply_old_delta(). > > > > > > > > When inserting tuples into the view, each tuple is duplicated to the > > > > specified multiplicity using generate_series() function. This is implemented > > > > in apply_new_delta(). > > > > > > > > ** DISTINCT clause > > > > > > > > When DISTINCT is used, the view has a hidden column __ivm_count__ that > > > > stores multiplicity for tuples. When tuples are deleted from or inserted into > > > > the view, the values of __ivm_count__ column is decreased or increased as many > > > > as specified multiplicity. Eventually, when the values becomes zero, the > > > > corresponding tuple is deleted from the view. This is implemented in > > > > apply_old_delta_with_count() and apply_new_delta_with_count(). > > > > > > > > ** Aggregates > > > > > > > > Built-in count sum, avg, min, and max are supported. Whether a given > > > > aggregate function can be used or not is checked by using its OID in > > > > check_aggregate_supports_ivm(). > > > > > > > > When creating a materialized view containing aggregates, in addition > > > > to __ivm_count__, more than one hidden columns for each aggregate are > > > > added to the target list. For example, columns for storing sum(x), > > > > count(x) are added if we have avg(x). When the view is maintained, > > > > aggregated values are updated using these hidden columns, also hidden > > > > columns are updated at the same time. > > > > > > > > The maintenance of aggregated view is performed in > > > > apply_old_delta_with_count() and apply_new_delta_with_count(). The SET > > > > clauses for updating columns are generated by append_set_clause_*(). > > > > > > > > If the view has min(x) or max(x) and the minimum or maximal value is > > > > deleted from a table, we need to update the value to the new min/max > > > > recalculated from the tables rather than incremental computation. This > > > > is performed in recalc_and_set_values(). > > > > > > > > --------------------------------------------------------------------------------------- > > > > * Details of the patch-set (v28) > > > > > > > > > The patch-set consists of the following eleven patches. > > > > > > > > In the previous version, the number of patches were nine. > > > > In the latest patch-set, the patches are divided more finely > > > > aiming to make the review easier. > > > > > > > > > - 0001: Add a syntax to create Incrementally Maintainable Materialized Views > > > > > > > > The prposed syntax to create an incrementally maintainable materialized > > > > view (IMMV) is; > > > > > > > > CREATE INCREMENTAL MATERIALIZED VIEW AS SELECT .....; > > > > > > > > However, this syntax is tentative, so any suggestions are welcomed. > > > > > > > > > - 0002: Add relisivm column to pg_class system catalog > > > > > > > > We add a new field in pg_class to indicate a relation is IMMV. > > > > Another alternative is to add a new catalog for managing materialized > > > > views including IMMV, but I am not sure if we want this. > > > > > > > > > - 0003: Allow to prolong life span of transition tables until transaction end > > > > > > > > This patch fixes the trigger system to allow to prolong lifespan of > > > > tuple stores for transition tables until the transaction end. We need > > > > this because multiple transition tables have to be preserved until the > > > > end of the out-most query when multiple tables are modified by nested > > > > triggers. (as explained above in Design - Multiple Tables Modification) > > > > > > > > If we don't want to change the trigger system in such way, the alternative > > > > is to copy the contents of transition tables to other tuplestores, although > > > > it needs more time and memory. > > > > > > > > > - 0004: Add Incremental View Maintenance support to pg_dump > > > > > > > > This patch enables pg_dump to output IMMV using the new syntax. > > > > > > > > > - 0005: Add Incremental View Maintenance support to psql > > > > > > > > This patch implements tab-completion for the new syntax and adds > > > > information of IMMV to \d meta-command results. > > > > > > > > > - 0006: Add Incremental View Maintenance support > > > > > > > > This patch implements the basic IVM feature. > > > > DISTINCT and aggregate are not supported here. > > > > > > > > When an IMMV is created, the view query is checked, and if any > > > > non-supported feature is used, it raises an error. If it is ok, > > > > triggers are created on base tables and an unique index is > > > > created on the view if possible. > > > > > > > > In BEFORE trigger, an entry is created for each IMMV and the number > > > > of trigger firing is counted. Also, the snapshot just before the > > > > table modification is stored. > > > > > > > > In AFTER triggers, each transition tables are preserved. The number > > > > of trigger firing is counted also here, and when the firing number of > > > > BEFORE and AFTER trigger reach the same, it is deemed the final AFTER > > > > trigger call. > > > > > > > > In the final AFTER trigger, the IMMV is maintained. Rewritten view > > > > query is executed to generate delta tables, and deltas are applied > > > > to the view. If multiple tables are modified simultaneously, this > > > > process is iterated for each modified table. Tables before processed > > > > are represented in "pre-update-state", processed tables are > > > > "post-update-state" in the rewritten query. > > > > > > > > > - 0007: Add DISTINCT support for IVM > > > > > > > > This patch adds DISTINCT clause support. > > > > > > > > When an IMMV including DISTINCT is created, a hidden column > > > > "__ivm_count__" is added to the target list. This column has the > > > > number of duplicity of the same tuples. The duplicity is calculated > > > > by adding "count(*)" and GROUP BY to the view query. > > > > > > > > When an IMMV is maintained, the duplicity in __ivm_count__ is updated, > > > > and a tuples whose duplicity becomes zero can be deleted from the view. > > > > This logic is implemented by SQL in apply_old_delta_with_count and > > > > apply_new_delta_with_count. > > > > > > > > Columns starting with "__ivm_" are deemed hidden columns that doesn't > > > > appear when a view is accessed by "SELECT * FROM ....". This is > > > > implemented by fixing parse_relation.c. > > > > > > > > > - 0008: Add aggregates support in IVM > > > > > > > > This patch provides codes for aggregates support, specifically > > > > for builtin count, sum, and avg. > > > > > > > > When an IMMV containing an aggregate is created, it is checked if this > > > > aggregate function is supported, and if it is ok, some hidden columns > > > > are added to the target list. > > > > > > > > When the IMMV is maintained, the aggregated value is updated as well as > > > > related hidden columns. The way of update depends the type of aggregate > > > > functions, and SET clause string is generated for each aggregate. > > > > > > > > > - 0009: Add support for min/max aggregates for IVM > > > > > > > > This patch adds min/max aggregates support. > > > > > > > > This is separated from #0008 because min/max needs more complicated > > > > work than count, sum, and avg. > > > > > > > > If the view has min(x) or max(x) and the minimum or maximal value is > > > > deleted from a table, we need to update the value to the new min/max > > > > recalculated from the tables rather than incremental computation. > > > > This is performed in recalc_and_set_values(). > > > > > > > > TIDs and keys of tuples that need re-calculation are returned as a > > > > result of the query that deleted min/max values from the view using > > > > RETURNING clause. The plan to recalculate and set the new min/max value > > > > are stored and reused. > > > > > > > > > - 0010: regression tests > > > > > > > > This patch provides regression tests for IVM. > > > > > > > > > - 0011: documentation > > > > > > > > This patch provides documantation for IVM. > > > > > > > > --------------------------------------------------------------------------------------- > > > > * Changes from the Previous Version (v27) > > > > > > > > - Allow TRUNCATE on base tables > > > > > > > > When a base table is truncated, the view content will be empty if the > > > > view definition query does not contain an aggregate without a GROUP clause. > > > > Therefore, such views can be truncated. > > > > > > > > Aggregate views without a GROUP clause always have one row. Therefore, > > > > if a base table is truncated, the view will not be empty and will contain > > > > a row with NULL value (or 0 for count()). So, in this case, we refresh the > > > > view instead of truncating it. > > > > > > > > - Fix bugs reported by huyajun [1] > > > > > > > > [1] https://www.postgresql.org/message-id/tencent_FCAF11BCA5003FD16BDDFDDA5D6A19587809%40qq.com > > > > > > > > --------------------------------------------------------------------------------------- > > > > * Discussion > > > > > > > > ** Aggregate support > > > > > > > > There were a few suggestions that general aggregate functions should be > > > > supported [2][3], which may be possible by extending pg_aggregate catalog. > > > > However, we decided to leave supporting general aggregates to the future work [4] > > > > because it would need substantial works and make the patch more complex and > > > > bigger. > > > > > > > > There has been no opposite opinion on this. However, if we need more discussion > > > > on the design of aggregate support, we can omit aggregate support for the first > > > > release of IVM. > > > > > > > > [2] https://www.postgresql.org/message-id/20191128140333.GA25947%40alvherre.pgsql > > > > [3] https://www.postgresql.org/message-id/CAM-w4HOvDrL4ou6m%3D592zUiKGVzTcOpNj-d_cJqzL00fdsS5kg%40mail.gmail.com > > > > [4] https://www.postgresql.org/message-id/20201016193034.9a4c44c79fc1eca7babe093e%40sraoss.co.jp > > > > > > > > ** Hidden columns > > > > > > > > In order to support DISTINCT or aggregates, our implementation uses hidden columns. > > > > > > > > Columns starting with "__ivm_" are hidden columns that doesn't appear when a > > > > view is accessed by "SELECT * FROM ....". For this aim, parse_relation.c is > > > > fixed. There was a proposal to enable hidden columns by adding a new flag to > > > > pg_attribute [5], but this thread is no longer active, so we decided to check > > > > the hidden column by its name [6]. > > > > > > > > [5] https://www.postgresql.org/message-id/flat/CAEepm%3D3ZHh%3Dp0nEEnVbs1Dig_UShPzHUcMNAqvDQUgYgcDo-pA%40mail.gmail.com > > > > [6] https://www.postgresql.org/message-id/20201016193034.9a4c44c79fc1eca7babe093e%40sraoss.co.jp > > > > > > > > ** Concurrent Transactions > > > > > > > > When the view definition has more than one table, we acquire an exclusive > > > > lock before the view maintenance in order to avoid inconsistent results. > > > > This behavior was explained in [7]. The lock was improved to use weaker lock > > > > when the view has only one table based on a suggestion from Konstantin Knizhnik [8]. > > > > However, due to the implementation that uses ctid for identifying target tuples, > > > > we still have to use an exclusive lock for DELETE and UPDATE. > > > > > > > > [7] https://www.postgresql.org/message-id/20200909092752.c91758a1bec3479668e82643%40sraoss.co.jp > > > > [8] https://www.postgresql.org/message-id/5663f5f0-48af-686c-bf3c-62d279567e2a%40postgrespro.ru > > > > > > > > ** Automatic Index Creation > > > > > > > > When a view is created, a unique index is automatically created if > > > > possible, that is, if the view definition query has a GROUP BY or > > > > DISTINCT, or if the view contains all primary key attributes of > > > > its base tables in the target list. It is necessary for efficient > > > > view maintenance. This feature is based on a suggestion from > > > > Konstantin Knizhnik [9]. > > > > > > > > [9] https://www.postgresql.org/message-id/89729da8-9042-7ea0-95af-e415df6da14d%40postgrespro.ru > > > > > > > > > > > > ** Trigger and Transition Tables > > > > > > > > We implemented IVM based on triggers. This is because we want to use > > > > transition tables to extract changes on base tables. Also, there are > > > > other constraint that are using triggers in its implementation, like > > > > foreign references. However, if we can use transition table like feature > > > > without relying triggers, we don't have to insist to use triggers and we > > > > might implement IVM in the executor directly as similar as declarative > > > > partitioning. > > > > > > > > ** Feature to be Supported in the First Release > > > > > > > > The current patch-set supports DISTINCT and aggregates for built-in count, > > > > sum, avg, min and max. Do we need all these feature for the first IVM release? > > > > Supporting DISTINCT and aggregates needs discussion on hidden columns, and > > > > for supporting min/max we need to discuss on re-calculation method. Before > > > > handling such relatively advanced feature, maybe, should we focus to design > > > > and implement of the basic feature of IVM? > > > > > > > > > > > > Any suggestion and discussion are welcomed! > > > > > > > > Regards, > > > > Yugo Nagata > > > > > > > > -- > > > > Yugo NAGATA <nagata@sraoss.co.jp> > > > > > > > > > > > > > > > > > > The followings are supported in view definition queries: > > > > - SELECT ... FROM ... WHERE ..., joins (inner joins, self-joins) > > > > > > > > > > Also, a view definition query cannot contain other views, materialized views, > > > > foreign tables, partitioned tables, partitions, VALUES, non-immutable functions, > > > > system columns, or expressions that contains aggregates. > > > > > > Does this also apply to tableoid? but tableoid is a constant, so it > > > should be fine? > > > can following two queries apply to this feature. > > > select tableoid, unique1 from tenk1; > > > > Currently, this is not allowed because tableoid is a system column. > > As you say, tableoid is a constant, so we can allow. Should we do this? > > > > > select 1 as constant, unique1 from tenk1; > > > > This is allowed, of course. > > > > > I didn't apply the patch.(will do later, for someone to test, it would > > > be a better idea to dump a whole file separately....). > > > > Thank you! I'm looking forward to your feedback. > > (I didn't attach a whole patch separately because I wouldn't like > > cfbot to be unhappy...) > > > > Regards, > > Yugo Nagata > > > > -- > > Yugo NAGATA <nagata@sraoss.co.jp> > > I played around first half of regress patch. I'm so sorry for the late reply. > these all following queries fails. > > CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_rename AS > SELECT DISTINCT * , 1 as "__ivm_count__" FROM mv_base_a; > > CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_rename AS > SELECT DISTINCT * , 1 as "__ivm_countblablabla" FROM mv_base_a; > > CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_rename AS > SELECT DISTINCT * , 1 as "__ivm_count" FROM mv_base_a; > > CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_rename AS > SELECT DISTINCT * , 1 as "__ivm_count_____" FROM mv_base_a; > > CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_rename AS > SELECT DISTINCT * , 1 as "__ivm_countblabla" FROM mv_base_a; > > so the hidden column reserved pattern "__ivm_count.*"? that would be a lot.... Column names which start with "__ivm_" are prohibited because hidden columns using this pattern are used for handling views with aggregate or DISTINCT. Even when neither aggregate or DISINCT is used, such column name is used for handling tuple duplicates in views. So, if we choose not to allow tuple duplicates in the initial version of IVM, we would remove this restriction for now.... > > select * from pg_matviews where matviewname = 'mv_ivm_1'; > don't have relisivm option. it's reasonable to make it in view pg_matviews? Make sense. I'll do it. Regards, Yugo Nagata -- Yugo NAGATA <nagata@sraoss.co.jp>
On Thu, 29 Jun 2023 18:20:32 +0800 jian he <jian.universality@gmail.com> wrote: > On Thu, Jun 29, 2023 at 12:40 AM jian he <jian.universality@gmail.com> wrote: > > > > On Wed, Jun 28, 2023 at 4:06 PM Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > > > > > On Wed, 28 Jun 2023 00:01:02 +0800 > > > jian he <jian.universality@gmail.com> wrote: > > > > > > > On Thu, Jun 1, 2023 at 2:47 AM Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > > > > > > > > > On Thu, 1 Jun 2023 23:59:09 +0900 > > > > > Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > > > > > > > > > > Hello hackers, > > > > > > > > > > > > Here's a rebased version of the patch-set adding Incremental View > > > > > > Maintenance support for PostgreSQL. That was discussed in [1]. > > > > > > > > > > > [1] https://www.postgresql.org/message-id/flat/20181227215726.4d166b4874f8983a641123f5%40sraoss.co.jp > > > > > > > > > > --------------------------------------------------------------------------------------- > > > > > * Overview > > > > > > > > > > Incremental View Maintenance (IVM) is a way to make materialized views > > > > > up-to-date by computing only incremental changes and applying them on > > > > > views. IVM is more efficient than REFRESH MATERIALIZED VIEW when > > > > > only small parts of the view are changed. > > > > > > > > > > ** Feature > > > > > > > > > > The attached patchset provides a feature that allows materialized views > > > > > to be updated automatically and incrementally just after a underlying > > > > > table is modified. > > > > > > > > > > You can create an incementally maintainable materialized view (IMMV) > > > > > by using CREATE INCREMENTAL MATERIALIZED VIEW command. > > > > > > > > > > The followings are supported in view definition queries: > > > > > - SELECT ... FROM ... WHERE ..., joins (inner joins, self-joins) > > > > > - some built-in aggregate functions (count, sum, avg, min, max) > > > > > - GROUP BY clause > > > > > - DISTINCT clause > > > > > > > > > > Views can contain multiple tuples with the same content (duplicate tuples). > > > > > > > > > > ** Restriction > > > > > > > > > > The following are not supported in a view definition: > > > > > - Outer joins > > > > > - Aggregates otehr than above, window functions, HAVING > > > > > - Sub-queries, CTEs > > > > > - Set operations (UNION, INTERSECT, EXCEPT) > > > > > - DISTINCT ON, ORDER BY, LIMIT, OFFSET > > > > > > > > > > Also, a view definition query cannot contain other views, materialized views, > > > > > foreign tables, partitioned tables, partitions, VALUES, non-immutable functions, > > > > > system columns, or expressions that contains aggregates. > > > > > > > > > > --------------------------------------------------------------------------------------- > > > > > * Design > > > > > > > > > > An IMMV is maintained using statement-level AFTER triggers. > > > > > When an IMMV is created, triggers are automatically created on all base > > > > > tables contained in the view definition query. > > > > > > > > > > When a table is modified, changes that occurred in the table are extracted > > > > > as transition tables in the AFTER triggers. Then, changes that will occur in > > > > > the view are calculated by a rewritten view dequery in which the modified table > > > > > is replaced with the transition table. > > > > > > > > > > For example, if the view is defined as "SELECT * FROM R, S", and tuples inserted > > > > > into R are stored in a transiton table dR, the tuples that will be inserted into > > > > > the view are calculated as the result of "SELECT * FROM dR, S". > > > > > > > > > > ** Multiple Tables Modification > > > > > > > > > > Multiple tables can be modified in a statement when using triggers, foreign key > > > > > constraint, or modifying CTEs. When multiple tables are modified, we need > > > > > the state of tables before the modification. > > > > > > > > > > For example, when some tuples, dR and dS, are inserted into R and S respectively, > > > > > the tuples that will be inserted into the view are calculated by the following > > > > > two queries: > > > > > > > > > > "SELECT * FROM dR, S_pre" > > > > > "SELECT * FROM R, dS" > > > > > > > > > > where S_pre is the table before the modification, R is the current state of > > > > > table, that is, after the modification. This pre-update states of table > > > > > is calculated by filtering inserted tuples and appending deleted tuples. > > > > > The subquery that represents pre-update state is generated in get_prestate_rte(). > > > > > Specifically, the insterted tuples are filtered by calling IVM_visible_in_prestate() > > > > > in WHERE clause. This function checks the visibility of tuples by using > > > > > the snapshot taken before table modification. The deleted tuples are contained > > > > > in the old transition table, and this table is appended using UNION ALL. > > > > > > > > > > Transition tables for each modification are collected in each AFTER trigger > > > > > function call. Then, the view maintenance is performed in the last call of > > > > > the trigger. > > > > > > > > > > In the original PostgreSQL, tuplestores of transition tables are freed at the > > > > > end of each nested query. However, their lifespan needs to be prolonged to > > > > > the end of the out-most query in order to maintain the view in the last AFTER > > > > > trigger. For this purpose, SetTransitionTablePreserved is added in trigger.c. > > > > > > > > > > ** Duplicate Tulpes > > > > > > > > > > When calculating changes that will occur in the view (= delta tables), > > > > > multiplicity of tuples are calculated by using count(*). > > > > > > > > > > When deleting tuples from the view, tuples to be deleted are identified by > > > > > joining the delta table with the view, and tuples are deleted as many as > > > > > specified multiplicity by numbered using row_number() function. > > > > > This is implemented in apply_old_delta(). > > > > > > > > > > When inserting tuples into the view, each tuple is duplicated to the > > > > > specified multiplicity using generate_series() function. This is implemented > > > > > in apply_new_delta(). > > > > > > > > > > ** DISTINCT clause > > > > > > > > > > When DISTINCT is used, the view has a hidden column __ivm_count__ that > > > > > stores multiplicity for tuples. When tuples are deleted from or inserted into > > > > > the view, the values of __ivm_count__ column is decreased or increased as many > > > > > as specified multiplicity. Eventually, when the values becomes zero, the > > > > > corresponding tuple is deleted from the view. This is implemented in > > > > > apply_old_delta_with_count() and apply_new_delta_with_count(). > > > > > > > > > > ** Aggregates > > > > > > > > > > Built-in count sum, avg, min, and max are supported. Whether a given > > > > > aggregate function can be used or not is checked by using its OID in > > > > > check_aggregate_supports_ivm(). > > > > > > > > > > When creating a materialized view containing aggregates, in addition > > > > > to __ivm_count__, more than one hidden columns for each aggregate are > > > > > added to the target list. For example, columns for storing sum(x), > > > > > count(x) are added if we have avg(x). When the view is maintained, > > > > > aggregated values are updated using these hidden columns, also hidden > > > > > columns are updated at the same time. > > > > > > > > > > The maintenance of aggregated view is performed in > > > > > apply_old_delta_with_count() and apply_new_delta_with_count(). The SET > > > > > clauses for updating columns are generated by append_set_clause_*(). > > > > > > > > > > If the view has min(x) or max(x) and the minimum or maximal value is > > > > > deleted from a table, we need to update the value to the new min/max > > > > > recalculated from the tables rather than incremental computation. This > > > > > is performed in recalc_and_set_values(). > > > > > > > > > > --------------------------------------------------------------------------------------- > > > > > * Details of the patch-set (v28) > > > > > > > > > > > The patch-set consists of the following eleven patches. > > > > > > > > > > In the previous version, the number of patches were nine. > > > > > In the latest patch-set, the patches are divided more finely > > > > > aiming to make the review easier. > > > > > > > > > > > - 0001: Add a syntax to create Incrementally Maintainable Materialized Views > > > > > > > > > > The prposed syntax to create an incrementally maintainable materialized > > > > > view (IMMV) is; > > > > > > > > > > CREATE INCREMENTAL MATERIALIZED VIEW AS SELECT .....; > > > > > > > > > > However, this syntax is tentative, so any suggestions are welcomed. > > > > > > > > > > > - 0002: Add relisivm column to pg_class system catalog > > > > > > > > > > We add a new field in pg_class to indicate a relation is IMMV. > > > > > Another alternative is to add a new catalog for managing materialized > > > > > views including IMMV, but I am not sure if we want this. > > > > > > > > > > > - 0003: Allow to prolong life span of transition tables until transaction end > > > > > > > > > > This patch fixes the trigger system to allow to prolong lifespan of > > > > > tuple stores for transition tables until the transaction end. We need > > > > > this because multiple transition tables have to be preserved until the > > > > > end of the out-most query when multiple tables are modified by nested > > > > > triggers. (as explained above in Design - Multiple Tables Modification) > > > > > > > > > > If we don't want to change the trigger system in such way, the alternative > > > > > is to copy the contents of transition tables to other tuplestores, although > > > > > it needs more time and memory. > > > > > > > > > > > - 0004: Add Incremental View Maintenance support to pg_dump > > > > > > > > > > This patch enables pg_dump to output IMMV using the new syntax. > > > > > > > > > > > - 0005: Add Incremental View Maintenance support to psql > > > > > > > > > > This patch implements tab-completion for the new syntax and adds > > > > > information of IMMV to \d meta-command results. > > > > > > > > > > > - 0006: Add Incremental View Maintenance support > > > > > > > > > > This patch implements the basic IVM feature. > > > > > DISTINCT and aggregate are not supported here. > > > > > > > > > > When an IMMV is created, the view query is checked, and if any > > > > > non-supported feature is used, it raises an error. If it is ok, > > > > > triggers are created on base tables and an unique index is > > > > > created on the view if possible. > > > > > > > > > > In BEFORE trigger, an entry is created for each IMMV and the number > > > > > of trigger firing is counted. Also, the snapshot just before the > > > > > table modification is stored. > > > > > > > > > > In AFTER triggers, each transition tables are preserved. The number > > > > > of trigger firing is counted also here, and when the firing number of > > > > > BEFORE and AFTER trigger reach the same, it is deemed the final AFTER > > > > > trigger call. > > > > > > > > > > In the final AFTER trigger, the IMMV is maintained. Rewritten view > > > > > query is executed to generate delta tables, and deltas are applied > > > > > to the view. If multiple tables are modified simultaneously, this > > > > > process is iterated for each modified table. Tables before processed > > > > > are represented in "pre-update-state", processed tables are > > > > > "post-update-state" in the rewritten query. > > > > > > > > > > > - 0007: Add DISTINCT support for IVM > > > > > > > > > > This patch adds DISTINCT clause support. > > > > > > > > > > When an IMMV including DISTINCT is created, a hidden column > > > > > "__ivm_count__" is added to the target list. This column has the > > > > > number of duplicity of the same tuples. The duplicity is calculated > > > > > by adding "count(*)" and GROUP BY to the view query. > > > > > > > > > > When an IMMV is maintained, the duplicity in __ivm_count__ is updated, > > > > > and a tuples whose duplicity becomes zero can be deleted from the view. > > > > > This logic is implemented by SQL in apply_old_delta_with_count and > > > > > apply_new_delta_with_count. > > > > > > > > > > Columns starting with "__ivm_" are deemed hidden columns that doesn't > > > > > appear when a view is accessed by "SELECT * FROM ....". This is > > > > > implemented by fixing parse_relation.c. > > > > > > > > > > > - 0008: Add aggregates support in IVM > > > > > > > > > > This patch provides codes for aggregates support, specifically > > > > > for builtin count, sum, and avg. > > > > > > > > > > When an IMMV containing an aggregate is created, it is checked if this > > > > > aggregate function is supported, and if it is ok, some hidden columns > > > > > are added to the target list. > > > > > > > > > > When the IMMV is maintained, the aggregated value is updated as well as > > > > > related hidden columns. The way of update depends the type of aggregate > > > > > functions, and SET clause string is generated for each aggregate. > > > > > > > > > > > - 0009: Add support for min/max aggregates for IVM > > > > > > > > > > This patch adds min/max aggregates support. > > > > > > > > > > This is separated from #0008 because min/max needs more complicated > > > > > work than count, sum, and avg. > > > > > > > > > > If the view has min(x) or max(x) and the minimum or maximal value is > > > > > deleted from a table, we need to update the value to the new min/max > > > > > recalculated from the tables rather than incremental computation. > > > > > This is performed in recalc_and_set_values(). > > > > > > > > > > TIDs and keys of tuples that need re-calculation are returned as a > > > > > result of the query that deleted min/max values from the view using > > > > > RETURNING clause. The plan to recalculate and set the new min/max value > > > > > are stored and reused. > > > > > > > > > > > - 0010: regression tests > > > > > > > > > > This patch provides regression tests for IVM. > > > > > > > > > > > - 0011: documentation > > > > > > > > > > This patch provides documantation for IVM. > > > > > > > > > > --------------------------------------------------------------------------------------- > > > > > * Changes from the Previous Version (v27) > > > > > > > > > > - Allow TRUNCATE on base tables > > > > > > > > > > When a base table is truncated, the view content will be empty if the > > > > > view definition query does not contain an aggregate without a GROUP clause. > > > > > Therefore, such views can be truncated. > > > > > > > > > > Aggregate views without a GROUP clause always have one row. Therefore, > > > > > if a base table is truncated, the view will not be empty and will contain > > > > > a row with NULL value (or 0 for count()). So, in this case, we refresh the > > > > > view instead of truncating it. > > > > > > > > > > - Fix bugs reported by huyajun [1] > > > > > > > > > > [1] https://www.postgresql.org/message-id/tencent_FCAF11BCA5003FD16BDDFDDA5D6A19587809%40qq.com > > > > > > > > > > --------------------------------------------------------------------------------------- > > > > > * Discussion > > > > > > > > > > ** Aggregate support > > > > > > > > > > There were a few suggestions that general aggregate functions should be > > > > > supported [2][3], which may be possible by extending pg_aggregate catalog. > > > > > However, we decided to leave supporting general aggregates to the future work [4] > > > > > because it would need substantial works and make the patch more complex and > > > > > bigger. > > > > > > > > > > There has been no opposite opinion on this. However, if we need more discussion > > > > > on the design of aggregate support, we can omit aggregate support for the first > > > > > release of IVM. > > > > > > > > > > [2] https://www.postgresql.org/message-id/20191128140333.GA25947%40alvherre.pgsql > > > > > [3] https://www.postgresql.org/message-id/CAM-w4HOvDrL4ou6m%3D592zUiKGVzTcOpNj-d_cJqzL00fdsS5kg%40mail.gmail.com > > > > > [4] https://www.postgresql.org/message-id/20201016193034.9a4c44c79fc1eca7babe093e%40sraoss.co.jp > > > > > > > > > > ** Hidden columns > > > > > > > > > > In order to support DISTINCT or aggregates, our implementation uses hidden columns. > > > > > > > > > > Columns starting with "__ivm_" are hidden columns that doesn't appear when a > > > > > view is accessed by "SELECT * FROM ....". For this aim, parse_relation.c is > > > > > fixed. There was a proposal to enable hidden columns by adding a new flag to > > > > > pg_attribute [5], but this thread is no longer active, so we decided to check > > > > > the hidden column by its name [6]. > > > > > > > > > > [5] https://www.postgresql.org/message-id/flat/CAEepm%3D3ZHh%3Dp0nEEnVbs1Dig_UShPzHUcMNAqvDQUgYgcDo-pA%40mail.gmail.com > > > > > [6] https://www.postgresql.org/message-id/20201016193034.9a4c44c79fc1eca7babe093e%40sraoss.co.jp > > > > > > > > > > ** Concurrent Transactions > > > > > > > > > > When the view definition has more than one table, we acquire an exclusive > > > > > lock before the view maintenance in order to avoid inconsistent results. > > > > > This behavior was explained in [7]. The lock was improved to use weaker lock > > > > > when the view has only one table based on a suggestion from Konstantin Knizhnik [8]. > > > > > However, due to the implementation that uses ctid for identifying target tuples, > > > > > we still have to use an exclusive lock for DELETE and UPDATE. > > > > > > > > > > [7] https://www.postgresql.org/message-id/20200909092752.c91758a1bec3479668e82643%40sraoss.co.jp > > > > > [8] https://www.postgresql.org/message-id/5663f5f0-48af-686c-bf3c-62d279567e2a%40postgrespro.ru > > > > > > > > > > ** Automatic Index Creation > > > > > > > > > > When a view is created, a unique index is automatically created if > > > > > possible, that is, if the view definition query has a GROUP BY or > > > > > DISTINCT, or if the view contains all primary key attributes of > > > > > its base tables in the target list. It is necessary for efficient > > > > > view maintenance. This feature is based on a suggestion from > > > > > Konstantin Knizhnik [9]. > > > > > > > > > > [9] https://www.postgresql.org/message-id/89729da8-9042-7ea0-95af-e415df6da14d%40postgrespro.ru > > > > > > > > > > > > > > > ** Trigger and Transition Tables > > > > > > > > > > We implemented IVM based on triggers. This is because we want to use > > > > > transition tables to extract changes on base tables. Also, there are > > > > > other constraint that are using triggers in its implementation, like > > > > > foreign references. However, if we can use transition table like feature > > > > > without relying triggers, we don't have to insist to use triggers and we > > > > > might implement IVM in the executor directly as similar as declarative > > > > > partitioning. > > > > > > > > > > ** Feature to be Supported in the First Release > > > > > > > > > > The current patch-set supports DISTINCT and aggregates for built-in count, > > > > > sum, avg, min and max. Do we need all these feature for the first IVM release? > > > > > Supporting DISTINCT and aggregates needs discussion on hidden columns, and > > > > > for supporting min/max we need to discuss on re-calculation method. Before > > > > > handling such relatively advanced feature, maybe, should we focus to design > > > > > and implement of the basic feature of IVM? > > > > > > > > > > > > > > > Any suggestion and discussion are welcomed! > > > > > > > > > > Regards, > > > > > Yugo Nagata > > > > > > > > > > -- > > > > > Yugo NAGATA <nagata@sraoss.co.jp> > > > > > > > > > > > > > > > > > > > > > > > The followings are supported in view definition queries: > > > > > - SELECT ... FROM ... WHERE ..., joins (inner joins, self-joins) > > > > > > > > > > > > > Also, a view definition query cannot contain other views, materialized views, > > > > > foreign tables, partitioned tables, partitions, VALUES, non-immutable functions, > > > > > system columns, or expressions that contains aggregates. > > > > > > > > Does this also apply to tableoid? but tableoid is a constant, so it > > > > should be fine? > > > > can following two queries apply to this feature. > > > > select tableoid, unique1 from tenk1; > > > > > > Currently, this is not allowed because tableoid is a system column. > > > As you say, tableoid is a constant, so we can allow. Should we do this? > > > > > > > select 1 as constant, unique1 from tenk1; > > > > > > This is allowed, of course. > > > > > > > I didn't apply the patch.(will do later, for someone to test, it would > > > > be a better idea to dump a whole file separately....). > > > > > > Thank you! I'm looking forward to your feedback. > > > (I didn't attach a whole patch separately because I wouldn't like > > > cfbot to be unhappy...) > > > > > > Regards, > > > Yugo Nagata > > > > > > -- > > > Yugo NAGATA <nagata@sraoss.co.jp> > > > > I played around first half of regress patch. > > these all following queries fails. > > > > CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_rename AS > > SELECT DISTINCT * , 1 as "__ivm_count__" FROM mv_base_a; > > > > CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_rename AS > > SELECT DISTINCT * , 1 as "__ivm_countblablabla" FROM mv_base_a; > > > > CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_rename AS > > SELECT DISTINCT * , 1 as "__ivm_count" FROM mv_base_a; > > > > CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_rename AS > > SELECT DISTINCT * , 1 as "__ivm_count_____" FROM mv_base_a; > > > > CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_rename AS > > SELECT DISTINCT * , 1 as "__ivm_countblabla" FROM mv_base_a; > > > > so the hidden column reserved pattern "__ivm_count.*"? that would be a lot.... > > > > select * from pg_matviews where matviewname = 'mv_ivm_1'; > > don't have relisivm option. it's reasonable to make it in view pg_matviews? > > another trivial: > incremental_matview.out (last few lines) last transaction seems to > need COMMIT command. Thank you for pointing out it. There is a unnecessary BEGIN, so I'll remove it. Regards, Yugo Nagata -- Yugo NAGATA <nagata@sraoss.co.jp>
On Thu, 29 Jun 2023 18:51:06 +0800 jian he <jian.universality@gmail.com> wrote: > I cannot build the doc. > git clean -fdx > git am ~/Desktop/tmp/*.patch > > Applying: Add a syntax to create Incrementally Maintainable Materialized Views > Applying: Add relisivm column to pg_class system catalog > Applying: Allow to prolong life span of transition tables until transaction end > Applying: Add Incremental View Maintenance support to pg_dump > Applying: Add Incremental View Maintenance support to psql > Applying: Add Incremental View Maintenance support > Applying: Add DISTINCT support for IVM > Applying: Add aggregates support in IVM > Applying: Add support for min/max aggregates for IVM > Applying: Add regression tests for Incremental View Maintenance > Applying: Add documentations about Incremental View Maintenance > .git/rebase-apply/patch:79: trailing whitespace. > clause. > warning: 1 line adds whitespace errors. > > Because of this, the {ninja docs} command failed. ERROR message: > > [6/6] Generating doc/src/sgml/html with a custom command > FAILED: doc/src/sgml/html > /usr/bin/python3 > ../../Desktop/pg_sources/main/postgres/doc/src/sgml/xmltools_dep_wrapper > --targetname doc/src/sgml/html --depfile doc/src/sgml/html.d --tool > /usr/bin/xsltproc -- -o doc/src/sgml/ --nonet --stringparam pg.version > 16beta2 --path doc/src/sgml --path > ../../Desktop/pg_sources/main/postgres/doc/src/sgml > ../../Desktop/pg_sources/main/postgres/doc/src/sgml/stylesheet.xsl > doc/src/sgml/postgres-full.xml > ERROR: id attribute missing on <sect2> element under /book[@id = > 'postgres']/part[@id = 'server-programming']/chapter[@id = > 'rules']/sect1[@id = 'rules-ivm'] > error: file doc/src/sgml/postgres-full.xml > xsltRunStylesheet : run failed > ninja: build stopped: subcommand failed. Thank your for pointing out this. I'll add ids for all sections to suppress the errors. Regards, Yugo Nagata -- Yugo NAGATA <nagata@sraoss.co.jp>
On Fri, 30 Jun 2023 08:00:00 +0800 jian he <jian.universality@gmail.com> wrote: > Hi there. > in v28-0005-Add-Incremental-View-Maintenance-support-to-psql.patch > I don't know how to set psql to get the output > "Incremental view maintenance: yes" This information will appear when you use "d+" command for an incrementally maintained materialized view. Regards, Yugo Nagata -- Yugo NAGATA <nagata@sraoss.co.jp>
On Sun, 2 Jul 2023 08:25:12 +0800 jian he <jian.universality@gmail.com> wrote: > This is probably not trivial. > In function apply_new_delta_with_count. > > appendStringInfo(&querybuf, > "WITH updt AS (" /* update a tuple if this exists in the view */ > "UPDATE %s AS mv SET %s = mv.%s OPERATOR(pg_catalog.+) diff.%s " > "%s " /* SET clauses for aggregates */ > "FROM %s AS diff " > "WHERE %s " /* tuple matching condition */ > "RETURNING %s" /* returning keys of updated tuples */ > ") INSERT INTO %s (%s)" /* insert a new tuple if this doesn't existw */ > "SELECT %s FROM %s AS diff " > "WHERE NOT EXISTS (SELECT 1 FROM updt AS mv WHERE %s);", > > --------------------- > ") INSERT INTO %s (%s)" /* insert a new tuple if this doesn't existw */ > "SELECT %s FROM %s AS diff " > > the INSERT INTO line, should have one white space in the end? > also "existw" should be "exists" Yes, we should need a space although it works. I'll fix as well as the typo. Thank you. Regards, Yugo Nagata -- Yugo NAGATA <nagata@sraoss.co.jp>
On Sun, 2 Jul 2023 10:38:20 +0800 jian he <jian.universality@gmail.com> wrote: > ok. Now I really found a small bug. > > this works as intended: > BEGIN; > CREATE INCREMENTAL MATERIALIZED VIEW test_ivm AS SELECT i, MIN(j) as > min_j FROM mv_base_a group by 1; > INSERT INTO mv_base_a select 1,-2 where false; > rollback; > > however the following one: > BEGIN; > CREATE INCREMENTAL MATERIALIZED VIEW test_ivm1 AS SELECT MIN(j) as > min_j FROM mv_base_a; > INSERT INTO mv_base_a select 1, -2 where false; > rollback; > > will evaluate > tuplestore_tuple_count(new_tuplestores) to 1, it will walk through > IVM_immediate_maintenance function to apply_delta. > but should it be zero? This is not a bug because an aggregate without GROUP BY always results one row whose value is NULL. The contents of test_imv1 would be always same as " SELECT MIN(j) as min_j FROM mv_base_a;", isn't it? Regards, Yugo Nagata -- Yugo NAGATA <nagata@sraoss.co.jp>
On Mon, 28 Aug 2023 02:49:08 +0900 Yugo NAGATA <nagata@sraoss.co.jp> wrote: > On Sun, 2 Jul 2023 10:38:20 +0800 > jian he <jian.universality@gmail.com> wrote: I attahed the patches v29 updated to comments from jian he. The changes from the previous includes: - errors in documentations is fixed. - remove unnecessary BEGIN from the test - add isimmv column to pg_matviews system view - fix a typo - rebase to the master branch > > > ok. Now I really found a small bug. > > > > this works as intended: > > BEGIN; > > CREATE INCREMENTAL MATERIALIZED VIEW test_ivm AS SELECT i, MIN(j) as > > min_j FROM mv_base_a group by 1; > > INSERT INTO mv_base_a select 1,-2 where false; > > rollback; > > > > however the following one: > > BEGIN; > > CREATE INCREMENTAL MATERIALIZED VIEW test_ivm1 AS SELECT MIN(j) as > > min_j FROM mv_base_a; > > INSERT INTO mv_base_a select 1, -2 where false; > > rollback; > > > > will evaluate > > tuplestore_tuple_count(new_tuplestores) to 1, it will walk through > > IVM_immediate_maintenance function to apply_delta. > > but should it be zero? > > This is not a bug because an aggregate without GROUP BY always > results one row whose value is NULL. > > The contents of test_imv1 would be always same as " SELECT MIN(j) as min_j > FROM mv_base_a;", isn't it? > > > Regards, > Yugo Nagata > > -- > Yugo NAGATA <nagata@sraoss.co.jp> > > -- Yugo NAGATA <nagata@sraoss.co.jp>
Attachment
- v29-0001-Add-a-syntax-to-create-Incrementally-Maintainabl.patch
- v29-0002-Add-relisivm-column-to-pg_class-system-catalog.patch
- v29-0003-Allow-to-prolong-life-span-of-transition-tables-.patch
- v29-0004-Add-Incremental-View-Maintenance-support-to-pg_d.patch
- v29-0005-Add-Incremental-View-Maintenance-support-to-psql.patch
- v29-0006-Add-Incremental-View-Maintenance-support.patch
- v29-0007-Add-DISTINCT-support-for-IVM.patch
- v29-0008-Add-aggregates-support-in-IVM.patch
- v29-0009-Add-support-for-min-max-aggregates-for-IVM.patch
- v29-0010-Add-regression-tests-for-Incremental-View-Mainte.patch
- v29-0011-Add-documentations-about-Incremental-View-Mainte.patch
On Mon, 28 Aug 2023 11:52:52 +0900 Yugo NAGATA <nagata@sraoss.co.jp> wrote: > On Mon, 28 Aug 2023 02:49:08 +0900 > Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > > On Sun, 2 Jul 2023 10:38:20 +0800 > > jian he <jian.universality@gmail.com> wrote: > > I attahed the patches v29 updated to comments from jian he. > The changes from the previous includes: > > - errors in documentations is fixed. > - remove unnecessary BEGIN from the test > - add isimmv column to pg_matviews system view > - fix a typo > - rebase to the master branch I found pg_dump test was broken, so attached the fixed version. Regards, Yugo Nagata > > > > > > ok. Now I really found a small bug. > > > > > > this works as intended: > > > BEGIN; > > > CREATE INCREMENTAL MATERIALIZED VIEW test_ivm AS SELECT i, MIN(j) as > > > min_j FROM mv_base_a group by 1; > > > INSERT INTO mv_base_a select 1,-2 where false; > > > rollback; > > > > > > however the following one: > > > BEGIN; > > > CREATE INCREMENTAL MATERIALIZED VIEW test_ivm1 AS SELECT MIN(j) as > > > min_j FROM mv_base_a; > > > INSERT INTO mv_base_a select 1, -2 where false; > > > rollback; > > > > > > will evaluate > > > tuplestore_tuple_count(new_tuplestores) to 1, it will walk through > > > IVM_immediate_maintenance function to apply_delta. > > > but should it be zero? > > > > This is not a bug because an aggregate without GROUP BY always > > results one row whose value is NULL. > > > > The contents of test_imv1 would be always same as " SELECT MIN(j) as min_j > > FROM mv_base_a;", isn't it? > > > > > > Regards, > > Yugo Nagata > > > > -- > > Yugo NAGATA <nagata@sraoss.co.jp> > > > > > > > -- > Yugo NAGATA <nagata@sraoss.co.jp> -- Yugo NAGATA <nagata@sraoss.co.jp>
Attachment
- v29-0001-Add-a-syntax-to-create-Incrementally-Maintainabl.patch
- v29-0002-Add-relisivm-column-to-pg_class-system-catalog.patch
- v29-0003-Allow-to-prolong-life-span-of-transition-tables-.patch
- v29-0004-Add-Incremental-View-Maintenance-support-to-pg_d.patch
- v29-0005-Add-Incremental-View-Maintenance-support-to-psql.patch
- v29-0006-Add-Incremental-View-Maintenance-support.patch
- v29-0007-Add-DISTINCT-support-for-IVM.patch
- v29-0008-Add-aggregates-support-in-IVM.patch
- v29-0009-Add-support-for-min-max-aggregates-for-IVM.patch
- v29-0010-Add-regression-tests-for-Incremental-View-Mainte.patch
- v29-0011-Add-documentations-about-Incremental-View-Mainte.patch
hi based on v29. based on https://stackoverflow.com/a/4014981/1560347: I added a new function append_update_set_caluse, and deleted functions: {append_set_clause_for_count, append_set_clause_for_sum, append_set_clause_for_avg, append_set_clause_for_minmax} I guess this way is more extensible/generic than yours. replaced the following code with the generic function: append_update_set_caluse. + /* For views with aggregates, we need to build SET clause for updating aggregate + * values. */ + if (query->hasAggs && IsA(tle->expr, Aggref)) + { + Aggref *aggref = (Aggref *) tle->expr; + const char *aggname = get_func_name(aggref->aggfnoid); + + /* + * We can use function names here because it is already checked if these + * can be used in IMMV by its OID at the definition time. + */ + + /* count */ + if (!strcmp(aggname, "count")) + append_set_clause_for_count(resname, aggs_set_old, aggs_set_new, aggs_list_buf); + + /* sum */ + else if (!strcmp(aggname, "sum")) + append_set_clause_for_sum(resname, aggs_set_old, aggs_set_new, aggs_list_buf); + + /* avg */ + else if (!strcmp(aggname, "avg")) + append_set_clause_for_avg(resname, aggs_set_old, aggs_set_new, aggs_list_buf, + format_type_be(aggref->aggtype)); + + else + elog(ERROR, "unsupported aggregate function: %s", aggname); + } ----------------------<<< attached is my refactor. there is some whitespace errors in the patches, you need use git apply --reject --whitespace=fix basedon_v29_matview_c_refactor_update_set_clause.patch Also you patch cannot use git apply, i finally found out bulk apply using gnu patch from https://serverfault.com/questions/102324/apply-multiple-patch-files. previously I just did it manually one by one. I think if you use { for i in $PATCHES/v29*.patch; do patch -p1 < $i; done } GNU patch, it will generate an .orig file for every modified file? -----------------<<<<< src/backend/commands/matview.c 2268: /* For tuple deletion */ maybe "/* For tuple deletion and update*/" is more accurate? -----------------<<<<< currently at here: src/test/regress/sql/incremental_matview.sql 98: -- support SUM(), COUNT() and AVG() aggregate functions 99: BEGIN; 100: CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_agg AS SELECT i, SUM(j), COUNT(i), AVG(j) FROM mv_base_a GROUP BY i; 101: SELECT * FROM mv_ivm_agg ORDER BY 1,2,3,4; 102: INSERT INTO mv_base_a VALUES(2,100); src/backend/commands/matview.c 2858: if (SPI_exec(querybuf.data, 0) != SPI_OK_INSERT) 2859: elog(ERROR, "SPI_exec failed: %s", querybuf.data); then I debug, print out querybuf.data: WITH updt AS (UPDATE public.mv_ivm_agg AS mv SET __ivm_count__ = mv.__ivm_count__ OPERATOR(pg_catalog.+) diff.__ivm_count__ , sum = (CASE WHEN mv.__ivm_count_sum__ OPERATOR(pg_catalog.=) 0 AND diff.__ivm_count_sum__ OPERATOR(pg_catalog.=) 0 THEN NULL WHEN mv.sum IS NULL THEN diff.sum WHEN diff.sum IS NULL THEN mv.sum ELSE (mv.sum OPERATOR(pg_catalog.+) diff.sum) END), __ivm_count_sum__ = (mv.__ivm_count_sum__ OPERATOR(pg_catalog.+) diff.__ivm_count_sum__), count = (mv.count OPERATOR(pg_catalog.+) diff.count), avg = (CASE WHEN mv.__ivm_count_avg__ OPERATOR(pg_catalog.=) 0 AND diff.__ivm_count_avg__ OPERATOR(pg_catalog.=) 0 THEN NULL WHEN mv.__ivm_sum_avg__ IS NULL THEN diff.__ivm_sum_avg__ WHEN diff.__ivm_sum_avg__ IS NULL THEN mv.__ivm_sum_avg__ ELSE (mv.__ivm_sum_avg__ OPERATOR(pg_catalog.+) diff.__ivm_sum_avg__)::numeric END) OPERATOR(pg_catalog./) (mv.__ivm_count_avg__ OPERATOR(pg_catalog.+) diff.__ivm_count_avg__), __ivm_sum_avg__ = (CASE WHEN mv.__ivm_count_avg__ OPERATOR(pg_catalog.=) 0 AND diff.__ivm_count_avg__ OPERATOR(pg_catalog.=) 0 THEN NULL WHEN mv.__ivm_sum_avg__ IS NULL THEN diff.__ivm_sum_avg__ WHEN diff.__ivm_sum_avg__ IS NULL THEN mv.__ivm_sum_avg__ ELSE (mv.__ivm_sum_avg__ OPERATOR(pg_catalog.+) diff.__ivm_sum_avg__) END), __ivm_count_avg__ = (mv.__ivm_count_avg__ OPERATOR(pg_catalog.+) diff.__ivm_count_avg__) FROM new_delta AS diff WHERE (mv.i OPERATOR(pg_catalog.=) diff.i OR (mv.i IS NULL AND diff.i IS NULL)) RETURNING mv.i) INSERT INTO public.mv_ivm_agg (i, sum, count, avg, __ivm_count_sum__, __ivm_count_avg__, __ivm_sum_avg__, __ivm_count__) SELECT i, sum, count, avg, __ivm_count_sum__, __ivm_count_avg__, __ivm_sum_avg__, __ivm_count__ FROM new_delta AS diff WHERE NOT EXISTS (SELECT 1 FROM updt AS mv WHERE (mv.i OPERATOR(pg_catalog.=) diff.i OR (mv.i IS NULL AND diff.i IS NULL))); At this final SPI_exec, we have a update statement with related columns { __ivm_count_sum__, sum, __ivm_count__, count, avg, __ivm_sum_avg__, __ivm_count_avg__}. At this time, my mind stops working, querybuf.data is way too big, but I still feel like there is some logic associated with these columns, maybe we can use it as an assertion to prove that this query (querybuf.len = 1834) is indeed correct. Since the apply delta query is quite complex, I feel like adding some "if debug then print out the final querybuf.data end if" would be a good idea. we add hidden columns somewhere, also to avoid corner cases, so maybe somewhere we should assert total attribute number is sane.
Attachment
> attached is my refactor. there is some whitespace errors in the > patches, you need use > git apply --reject --whitespace=fix > basedon_v29_matview_c_refactor_update_set_clause.patch > > Also you patch cannot use git apply, i finally found out bulk apply I have no problem with applying Yugo's v29 patches using git apply, no white space errors. $ git apply ~/v29* (the patches are saved under my home directory). I suggest you to check your email application whether it correctly saved the patch files for you. FYI, here are results from sha256sum: ffac37cb455788c1105ffc01c6b606de75f53321c2f235f7efa19f3f52d12b9e v29-0001-Add-a-syntax-to-create-Incrementally-Maintainabl.patch f684485e7c9ac1b2990943a3c73fa49a9091a268917547d9e116baef5118cca7 v29-0002-Add-relisivm-column-to-pg_class-system-catalog.patch fcf5bc8ae562ed1c2ab397b499544ddab03ad2c3acb2263d0195a3ec799b131c v29-0003-Allow-to-prolong-life-span-of-transition-tables-.patch a7a13ef8e73c4717166db079d5607f78d21199379de341a0e8175beef5ea1c1a v29-0004-Add-Incremental-View-Maintenance-support-to-pg_d.patch a2aa51d035774867bfab1580ef14143998dc71c1b941bd1a3721dc019bc62649 v29-0005-Add-Incremental-View-Maintenance-support-to-psql.patch fe0225d761a08eb80082f1a2c039b9b8b20626169b03abaf649db9c74fe99194 v29-0006-Add-Incremental-View-Maintenance-support.patch 68b007befedcf92fc83ab8c3347ac047a50816f061c77b69281e12d52944db82 v29-0007-Add-DISTINCT-support-for-IVM.patch 2201241a22095f736a17383fc8b26d48a459ebf1c2f5cf120896cfc0ce5e03e4 v29-0008-Add-aggregates-support-in-IVM.patch 6390117c559bf1585349c5a09b77b784e086ccc22eb530cd364ce78371c66741 v29-0009-Add-support-for-min-max-aggregates-for-IVM.patch 7019a116c64127783bd9c682ddf1ee3792286d0e41c91a33010111e7be2c9459 v29-0010-Add-regression-tests-for-Incremental-View-Mainte.patch 189afdc7da866bd958e2d554ba12adf93d7e6d0acb581290a48d72fcf640e243 v29-0011-Add-documentations-about-Incremental-View-Mainte.patch Best reagards, -- Tatsuo Ishii SRA OSS LLC English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp
On Sat, Sep 2, 2023 at 7:46 PM Tatsuo Ishii <ishii@sraoss.co.jp> wrote: > > > attached is my refactor. there is some whitespace errors in the > > patches, you need use > > git apply --reject --whitespace=fix > > basedon_v29_matview_c_refactor_update_set_clause.patch > > > > Also you patch cannot use git apply, i finally found out bulk apply > > I have no problem with applying Yugo's v29 patches using git apply, no > white space errors. > thanks. I downloaded the patches from the postgres website, then the problem was solved. other ideas based on v29. src/include/utils/rel.h 680: #define RelationIsIVM(relation) ((relation)->rd_rel->relisivm) I guess it would be better to add some comments to address the usage. Since all peer macros all have some comments. pg_class change, I guess we need bump CATALOG_VERSION_NO? small issue. makeIvmAggColumn and calc_delta need to add an empty return statement? style issue. in gram.y, "incremental" upper case? + CREATE OptNoLog incremental MATERIALIZED VIEW create_mv_target AS SelectStmt opt_with_data I don't know how pgident works, do you need to add some keywords to src/tools/pgindent/typedefs.list to make indentation work? in /* If this is not the last AFTER trigger call, immediately exit. */ Assert (entry->before_trig_count >= entry->after_trig_count); if (entry->before_trig_count != entry->after_trig_count) return PointerGetDatum(NULL); before returning NULL, do you also need clean_up_IVM_hash_entry? (I don't know when this case will happen) in /* Replace the modified table with the new delta table and calculate the new view delta*/ replace_rte_with_delta(rte, table, true, queryEnv); refresh_matview_datafill(dest_new, query, queryEnv, tupdesc_new, ""); replace_rte_with_delta does not change the argument: table, argument: queryEnv. refresh_matview_datafill just uses the partial argument of the function calc_delta. So I guess, I am confused by the usage of replace_rte_with_delta. also I think it should return void, since you just modify the input argument. Here refresh_matview_datafill is just persisting new delta content to dest_new?
2024-01 Commitfest. Hi, This patch has a CF status of "Needs Review" [1], but it seems like there was some CFbot test failure last time it was run [2]. Please have a look and post an updated version if necessary. ====== [1] https://commitfest.postgresql.org/46/4337/ [2] https://cirrus-ci.com/task/6607979311529984 Kind Regards, Peter Smith.
On Mon, 22 Jan 2024 13:51:08 +1100 Peter Smith <smithpb2250@gmail.com> wrote: > 2024-01 Commitfest. > > Hi, This patch has a CF status of "Needs Review" [1], but it seems > like there was some CFbot test failure last time it was run [2]. > Please have a look and post an updated version if necessary. Thank you for pointing out it. The CFbot failure is caused by a post [1] not by my patch-set, but regardless of it, I will heck if we need rebase and send the new version if necessary soon. [1] https://www.postgresql.org/message-id/CACJufxEoCCJE1vntJp1SWjen8vBUa3vZLgL%3DswPwar4zim976g%40mail.gmail.com Regards, Yugo Nagata > ====== > [1] https://commitfest.postgresql.org/46/4337/ > [2] https://cirrus-ci.com/task/6607979311529984 > > Kind Regards, > Peter Smith. -- Yugo NAGATA <nagata@sraoss.co.jp>
On Fri, 1 Sep 2023 15:42:17 +0800 jian he <jian.universality@gmail.com> wrote: I apologize for this late reply. > I added a new function append_update_set_caluse, and deleted > functions: {append_set_clause_for_count, append_set_clause_for_sum, > append_set_clause_for_avg, append_set_clause_for_minmax} > > I guess this way is more extensible/generic than yours. Do you mean that consolidating such functions to a general function make easier to support a new aggregate function in future? I'm not convinced completely yet it because your suggestion seems that every functions' logic are just put into a new function, but providing a common interface might make a sense a bit. By the way, when you attach files other than updated patches that can be applied to master branch, using ".patch" or ".diff" as the file extension help to avoid to confuse cfbot (for example, like basedon_v29_matview_c_refactor_update_set_clause.patch.txt). > src/backend/commands/matview.c > 2268: /* For tuple deletion */ > maybe "/* For tuple deletion and update*/" is more accurate? This "deletion" means deletion of tuple from the view rather than DELETE statement, so I think this is ok. > Since the apply delta query is quite complex, I feel like adding some > "if debug then print out the final querybuf.data end if" would be a > good idea. Agreed, it would be helpful for debugging. I think it would be good to add a debug macro that works if DEBUG_IVM is defined rather than adding GUC like debug_print_..., how about it? > we add hidden columns somewhere, also to avoid corner cases, so maybe > somewhere we should assert total attribute number is sane. The number of hidden columns to be added depends on the view definition query, so I wonder the Assert condition would be a bit complex. Could you explain what are you assume about like for example? Regards, Yugo Nagata -- Yugo NAGATA <nagata@sraoss.co.jp>
On Mon, 4 Sep 2023 16:48:02 +0800 jian he <jian.universality@gmail.com> wrote: > other ideas based on v29. > > src/include/utils/rel.h > 680: #define RelationIsIVM(relation) ((relation)->rd_rel->relisivm) > I guess it would be better to add some comments to address the usage. > Since all peer macros all have some comments. OK. I will add comments on this macro. > pg_class change, I guess we need bump CATALOG_VERSION_NO? CATALOG_VERSION_NO is frequently bumped up when new features are committed, so including it in the patch causes frequent needs for rebase during the review of the patch even if no meaningful change is made. Therefore, I wonder we don't have to included it in the patch at this time. > small issue. makeIvmAggColumn and calc_delta need to add an empty > return statement? I'm sorry but I could not understand what you suggested, so could you give me more explanation? > style issue. in gram.y, "incremental" upper case? > + CREATE OptNoLog incremental MATERIALIZED VIEW > create_mv_target AS SelectStmt opt_with_data This "incremental" is defined as INCREMENTAL or empty, as below. incremental: INCREMENTAL { $$ = true; } | /*EMPTY*/ { $$ = false; } > I don't know how pgident works, do you need to add some keywords to > src/tools/pgindent/typedefs.list to make indentation work? I'm not sure typedefs.list should be updated in each patch, because tools/pgindent/README said that the latest typedef file is downloaded from the buildfarm when pgindent is run. > in > /* If this is not the last AFTER trigger call, immediately exit. */ > Assert (entry->before_trig_count >= entry->after_trig_count); > if (entry->before_trig_count != entry->after_trig_count) > return PointerGetDatum(NULL); > > before returning NULL, do you also need clean_up_IVM_hash_entry? (I > don't know when this case will happen) No, clean_up_IVM_hash_entry is not necessary in this case. When multiple tables are updated in a statement, statement-level AFTER triggers collects every information of the tables, and the last AFTER trigger have to perform the actual maintenance of the view. To make sure this, the number that BEFORE and AFTER trigger is fired is counted respectively, and when they match it is regarded the last AFTER trigger call performing the maintenance. Until this, collected information have to keep, so we cannot call clean_up_IVM_hash_entry. > in > /* Replace the modified table with the new delta table and > calculate the new view delta*/ > replace_rte_with_delta(rte, table, true, queryEnv); > refresh_matview_datafill(dest_new, query, queryEnv, tupdesc_new, ""); > > replace_rte_with_delta does not change the argument: table, argument: > queryEnv. refresh_matview_datafill just uses the partial argument of > the function calc_delta. So I guess, I am confused by the usage of > replace_rte_with_delta. also I think it should return void, since you > just modify the input argument. Here refresh_matview_datafill is just > persisting new delta content to dest_new? Yes, refresh_matview_datafill executes the query and the result rows to "dest_new". And, replace_rte_with_delta updates the input argument "rte" and returns the result to it, so it may be better that this returns void, as you suggested. Regards, Yugo Nagata -- Yugo NAGATA <nagata@sraoss.co.jp>
On Tue, 23 Jan 2024 16:23:27 +0900 Yugo NAGATA <nagata@sraoss.co.jp> wrote: > On Mon, 22 Jan 2024 13:51:08 +1100 > Peter Smith <smithpb2250@gmail.com> wrote: > > > 2024-01 Commitfest. > > > > Hi, This patch has a CF status of "Needs Review" [1], but it seems > > like there was some CFbot test failure last time it was run [2]. > > Please have a look and post an updated version if necessary. I attached a rebased patch-set, v30. Regards, Yugo Nagata -- Yugo NAGATA <nagata@sraoss.co.jp>
Attachment
- v30-0001-Add-a-syntax-to-create-Incrementally-Maintainabl.patch
- v30-0002-Add-relisivm-column-to-pg_class-system-catalog.patch
- v30-0003-Allow-to-prolong-life-span-of-transition-tables-.patch
- v30-0004-Add-Incremental-View-Maintenance-support-to-pg_d.patch
- v30-0005-Add-Incremental-View-Maintenance-support-to-psql.patch
- v30-0006-Add-Incremental-View-Maintenance-support.patch
- v30-0007-Add-DISTINCT-support-for-IVM.patch
- v30-0008-Add-aggregates-support-in-IVM.patch
- v30-0009-Add-support-for-min-max-aggregates-for-IVM.patch
- v30-0010-Add-regression-tests-for-Incremental-View-Mainte.patch
- v30-0011-Add-documentations-about-Incremental-View-Mainte.patch
On Mon, 4 Mar 2024 11:58:46 +0900 Yugo NAGATA <nagata@sraoss.co.jp> wrote: > On Tue, 23 Jan 2024 16:23:27 +0900 > Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > > On Mon, 22 Jan 2024 13:51:08 +1100 > > Peter Smith <smithpb2250@gmail.com> wrote: > > > > > 2024-01 Commitfest. > > > > > > Hi, This patch has a CF status of "Needs Review" [1], but it seems > > > like there was some CFbot test failure last time it was run [2]. > > > Please have a look and post an updated version if necessary. > > I attached a rebased patch-set, v30. I attached a rebased patch-set, v31. Also, I added a comment on RelationIsIVM() macro persuggestion from jian he. In addition, I fixed a failure reported from cfbot on FreeBSD build caused by; WARNING: outfuncs/readfuncs failed to produce an equal rewritten parse tree This warning was raised since I missed to modify outfuncs.c for a new field. Regards, Yugo Nagata > > Regards, > Yugo Nagata > > -- > Yugo NAGATA <nagata@sraoss.co.jp> -- Yugo NAGATA <nagata@sraoss.co.jp>
Attachment
- v31-0001-Add-a-syntax-to-create-Incrementally-Maintainabl.patch
- v31-0002-Add-relisivm-column-to-pg_class-system-catalog.patch
- v31-0003-Allow-to-prolong-life-span-of-transition-tables-.patch
- v31-0004-Add-Incremental-View-Maintenance-support-to-pg_d.patch
- v31-0005-Add-Incremental-View-Maintenance-support-to-psql.patch
- v31-0006-Add-Incremental-View-Maintenance-support.patch
- v31-0007-Add-DISTINCT-support-for-IVM.patch
- v31-0008-Add-aggregates-support-in-IVM.patch
- v31-0009-Add-support-for-min-max-aggregates-for-IVM.patch
- v31-0010-Add-regression-tests-for-Incremental-View-Mainte.patch
- v31-0011-Add-documentations-about-Incremental-View-Mainte.patch
On Fri, 29 Mar 2024 23:47:00 +0900 Yugo NAGATA <nagata@sraoss.co.jp> wrote: > On Mon, 4 Mar 2024 11:58:46 +0900 > Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > > On Tue, 23 Jan 2024 16:23:27 +0900 > > Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > > > > On Mon, 22 Jan 2024 13:51:08 +1100 > > > Peter Smith <smithpb2250@gmail.com> wrote: > > > > > > > 2024-01 Commitfest. > > > > > > > > Hi, This patch has a CF status of "Needs Review" [1], but it seems > > > > like there was some CFbot test failure last time it was run [2]. > > > > Please have a look and post an updated version if necessary. > > > > I attached a rebased patch-set, v30. > > I attached a rebased patch-set, v31. > > Also, I added a comment on RelationIsIVM() macro persuggestion from jian he. > In addition, I fixed a failure reported from cfbot on FreeBSD build caused by; > > WARNING: outfuncs/readfuncs failed to produce an equal rewritten parse tree > > This warning was raised since I missed to modify outfuncs.c for a new field. I found cfbot on FreeBSD still reported a failure due to ENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS because the regression test used wrong role names. Attached is a fixed version, v32. Regards, Yugo Nagata > > Regards, > Yugo Nagata > > > > > Regards, > > Yugo Nagata > > > > -- > > Yugo NAGATA <nagata@sraoss.co.jp> > > > -- > Yugo NAGATA <nagata@sraoss.co.jp> -- Yugo NAGATA <nagata@sraoss.co.jp>
Attachment
- v32-0001-Add-a-syntax-to-create-Incrementally-Maintainabl.patch
- v32-0002-Add-relisivm-column-to-pg_class-system-catalog.patch
- v32-0003-Allow-to-prolong-life-span-of-transition-tables-.patch
- v32-0004-Add-Incremental-View-Maintenance-support-to-pg_d.patch
- v32-0005-Add-Incremental-View-Maintenance-support-to-psql.patch
- v32-0006-Add-Incremental-View-Maintenance-support.patch
- v32-0007-Add-DISTINCT-support-for-IVM.patch
- v32-0008-Add-aggregates-support-in-IVM.patch
- v32-0009-Add-support-for-min-max-aggregates-for-IVM.patch
- v32-0010-Add-regression-tests-for-Incremental-View-Mainte.patch
- v32-0011-Add-documentations-about-Incremental-View-Mainte.patch
On Sun, 31 Mar 2024 22:59:31 +0900 Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > > > Also, I added a comment on RelationIsIVM() macro persuggestion from jian he. > > In addition, I fixed a failure reported from cfbot on FreeBSD build caused by; > > > > WARNING: outfuncs/readfuncs failed to produce an equal rewritten parse tree > > > > This warning was raised since I missed to modify outfuncs.c for a new field. > > I found cfbot on FreeBSD still reported a failure due to > ENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS because the regression test used > wrong role names. Attached is a fixed version, v32. Attached is a rebased version, v33. Regards, Yugo Nagata -- Yugo NAGATA <nagata@sraoss.co.jp>
Attachment
- v33-0011-Add-documentations-about-Incremental-View-Mainte.patch
- v33-0010-Add-regression-tests-for-Incremental-View-Mainte.patch
- v33-0009-Add-support-for-min-max-aggregates-for-IVM.patch
- v33-0008-Add-aggregates-support-in-IVM.patch
- v33-0007-Add-DISTINCT-support-for-IVM.patch
- v33-0006-Add-Incremental-View-Maintenance-support.patch
- v33-0005-Add-Incremental-View-Maintenance-support-to-psql.patch
- v33-0004-Add-Incremental-View-Maintenance-support-to-pg_d.patch
- v33-0003-Allow-to-prolong-life-span-of-transition-tables-.patch
- v33-0002-Add-relisivm-column-to-pg_class-system-catalog.patch
- v33-0001-Add-a-syntax-to-create-Incrementally-Maintainabl.patch
On Tue, 2 Jul 2024 17:03:11 +0900 Yugo NAGATA <nagata@sraoss.co.jp> wrote: > On Sun, 31 Mar 2024 22:59:31 +0900 > Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > > > > > Also, I added a comment on RelationIsIVM() macro persuggestion from jian he. > > > In addition, I fixed a failure reported from cfbot on FreeBSD build caused by; > > > > > > WARNING: outfuncs/readfuncs failed to produce an equal rewritten parse tree > > > > > > This warning was raised since I missed to modify outfuncs.c for a new field. > > > > I found cfbot on FreeBSD still reported a failure due to > > ENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS because the regression test used > > wrong role names. Attached is a fixed version, v32. > > Attached is a rebased version, v33. I updated the patch to bump up the version numbers in psql and pg_dump codes from 17 to 18. Regards, Yugo Nagata > > Regards, > Yugo Nagata > > > -- > Yugo NAGATA <nagata@sraoss.co.jp> -- Yugo NAGATA <nagata@sraoss.co.jp>
Attachment
- v34-0002-Add-relisivm-column-to-pg_class-system-catalog.patch
- v34-0001-Add-a-syntax-to-create-Incrementally-Maintainabl.patch
- v34-0011-Add-documentations-about-Incremental-View-Mainte.patch
- v34-0010-Add-regression-tests-for-Incremental-View-Mainte.patch
- v34-0009-Add-support-for-min-max-aggregates-for-IVM.patch
- v34-0008-Add-aggregates-support-in-IVM.patch
- v34-0007-Add-DISTINCT-support-for-IVM.patch
- v34-0006-Add-Incremental-View-Maintenance-support.patch
- v34-0005-Add-Incremental-View-Maintenance-support-to-psql.patch
- v34-0004-Add-Incremental-View-Maintenance-support-to-pg_d.patch
- v34-0003-Allow-to-prolong-life-span-of-transition-tables-.patch
Hi! Cloudberry DB (Greenplum fork) uses IMMV feature for AQUMV (auto query use matview) feature, so i got interested in how it is implemented. On Thu, 11 Jul 2024 at 09:24, Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > I updated the patch to bump up the version numbers in psql and pg_dump codes > from 17 to 18. Few suggestions: 1) `Add-relisivm-column-to-pg_class-system-catalog` commit message should be fixed, there is "isimmv" in the last line. 2) I dont get why `Add-Incremental-View-Maintenance-support.patch` goes after 0005 & 0004. Shoulndt we first implement feature server side, only when client (psql & pg_dump) side? 3) Can we provide regression tests for each function separately? Test for main feature in main patch, test for DISTINCT support in v34-0007-Add-DISTINCT-support-for-IVM.patch etc? This way the patchset will be easier to review, and can be committed separelety. 4) v34-0006-Add-Incremental-View-Maintenance-support.patch no longer applies due to 4b74ebf726d444ba820830cad986a1f92f724649. After resolving issues manually, it does not compile, because 4b74ebf726d444ba820830cad986a1f92f724649 also removes save_userid/save_sec_context fields from ExecCreateTableAs. > if (RelationIsIVM(matviewRel) && stmt->skipData) Now this function accepts skipData param. 5) For DISTINCT support patch uses hidden __ivm* columns. Is this design discussed anywhere? I wonder if this is a necessity (only solution) or if there are alternatives. 6) What are the caveats of supporting some simple cases for aggregation funcs like in example? ``` regress=# CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_2 AS SELECT sum(j) + sum(i) from mv_base_a; ERROR: expression containing an aggregate in it is not supported on incrementally maintainable materialized view ``` I can see some difficulties with division CREATE IMMV .... AS SELECT 1/sum(i) from mv_base_a; (sum(i) == 0 case), but adding & multiplication should be ok, aren't they? Overall, patchset looks mature, however it is far from being committable due to lack of testing/feedback/discussion. There is only one way to fix this... Test and discuss it! [1] https://github.com/cloudberrydb/cloudberrydb
On Sat, 27 Jul 2024 at 13:26, Kirill Reshke <reshkekirill@gmail.com> wrote: > > Hi! > Cloudberry DB (Greenplum fork) uses IMMV feature for AQUMV (auto query > use matview) feature, so i got interested in how it is implemented. > > On Thu, 11 Jul 2024 at 09:24, Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > > > I updated the patch to bump up the version numbers in psql and pg_dump codes > > from 17 to 18. > > Few suggestions: > > 1) `Add-relisivm-column-to-pg_class-system-catalog` commit message > should be fixed, there is "isimmv" in the last line. > 2) I dont get why `Add-Incremental-View-Maintenance-support.patch` > goes after 0005 & 0004. Shoulndt we first implement feature server > side, only when client (psql & pg_dump) side? > 3) Can we provide regression tests for each function separately? Test > for main feature in main patch, test for DISTINCT support in > v34-0007-Add-DISTINCT-support-for-IVM.patch etc? This way the patchset > will be easier to review, and can be committed separelety. > 4) v34-0006-Add-Incremental-View-Maintenance-support.patch no longer > applies due to 4b74ebf726d444ba820830cad986a1f92f724649. After > resolving issues manually, it does not compile, because > 4b74ebf726d444ba820830cad986a1f92f724649 also removes > save_userid/save_sec_context fields from ExecCreateTableAs. > > > if (RelationIsIVM(matviewRel) && stmt->skipData) > Now this function accepts skipData param. > > 5) For DISTINCT support patch uses hidden __ivm* columns. Is this > design discussed anywhere? I wonder if this is a necessity (only > solution) or if there are alternatives. > 6) > What are the caveats of supporting some simple cases for aggregation > funcs like in example? > ``` > regress=# CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_2 AS SELECT > sum(j) + sum(i) from mv_base_a; > ERROR: expression containing an aggregate in it is not supported on > incrementally maintainable materialized view > ``` > I can see some difficulties with division CREATE IMMV .... AS SELECT > 1/sum(i) from mv_base_a; (sum(i) == 0 case), but adding & > multiplication should be ok, aren't they? > > > Overall, patchset looks mature, however it is far from being > committable due to lack of testing/feedback/discussion. There is only > one way to fix this... Test and discuss it! > > > [1] https://github.com/cloudberrydb/cloudberrydb Hi! Small update: I tried to run a regression test and all IMMV-related tests failed on my vm. Maybe I'm doing something wrong, I will try to investigate. Another suggestion: support for \d and \d+ commands in psql. With v34 patchset applied, psql does not show anything IMMV-related in \d mode. ``` reshke=# \d m1 Materialized view "public.m1" Column | Type | Collation | Nullable | Default --------+---------+-----------+----------+--------- i | integer | | | Distributed by: (i) reshke=# \d+ m1 Materialized view "public.m1" Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description --------+---------+-----------+----------+---------+---------+-------------+--------------+------------- i | integer | | | | plain | | | View definition: SELECT t1.i FROM t1; Distributed by: (i) Access method: heap ``` Output should be 'Incrementally materialized view "public.m1"' IMO.
Hi, On Tue, 30 Jul 2024 03:32:19 +0500 Kirill Reshke <reshkekirill@gmail.com> wrote: > On Sat, 27 Jul 2024 at 13:26, Kirill Reshke <reshkekirill@gmail.com> wrote: > > > > Hi! > > Cloudberry DB (Greenplum fork) uses IMMV feature for AQUMV (auto query > > use matview) feature, so i got interested in how it is implemented. Thank you so much for a lot of comments! I will respond to the comments soon. > > > > On Thu, 11 Jul 2024 at 09:24, Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > > > > > I updated the patch to bump up the version numbers in psql and pg_dump codes > > > from 17 to 18. > > > > Few suggestions: > > > > 1) `Add-relisivm-column-to-pg_class-system-catalog` commit message > > should be fixed, there is "isimmv" in the last line. > > 2) I dont get why `Add-Incremental-View-Maintenance-support.patch` > > goes after 0005 & 0004. Shoulndt we first implement feature server > > side, only when client (psql & pg_dump) side? > > 3) Can we provide regression tests for each function separately? Test > > for main feature in main patch, test for DISTINCT support in > > v34-0007-Add-DISTINCT-support-for-IVM.patch etc? This way the patchset > > will be easier to review, and can be committed separelety. > > 4) v34-0006-Add-Incremental-View-Maintenance-support.patch no longer > > applies due to 4b74ebf726d444ba820830cad986a1f92f724649. After > > resolving issues manually, it does not compile, because > > 4b74ebf726d444ba820830cad986a1f92f724649 also removes > > save_userid/save_sec_context fields from ExecCreateTableAs. > > > > > if (RelationIsIVM(matviewRel) && stmt->skipData) > > Now this function accepts skipData param. > > > > 5) For DISTINCT support patch uses hidden __ivm* columns. Is this > > design discussed anywhere? I wonder if this is a necessity (only > > solution) or if there are alternatives. > > 6) > > What are the caveats of supporting some simple cases for aggregation > > funcs like in example? > > ``` > > regress=# CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_2 AS SELECT > > sum(j) + sum(i) from mv_base_a; > > ERROR: expression containing an aggregate in it is not supported on > > incrementally maintainable materialized view > > ``` > > I can see some difficulties with division CREATE IMMV .... AS SELECT > > 1/sum(i) from mv_base_a; (sum(i) == 0 case), but adding & > > multiplication should be ok, aren't they? > > > > > > Overall, patchset looks mature, however it is far from being > > committable due to lack of testing/feedback/discussion. There is only > > one way to fix this... Test and discuss it! > > > > > > [1] https://github.com/cloudberrydb/cloudberrydb > > Hi! Small update: I tried to run a regression test and all > IMMV-related tests failed on my vm. Maybe I'm doing something wrong, I > will try to investigate. > > Another suggestion: support for \d and \d+ commands in psql. With v34 > patchset applied, psql does not show anything IMMV-related in \d mode. > > ``` > reshke=# \d m1 > Materialized view "public.m1" > Column | Type | Collation | Nullable | Default > --------+---------+-----------+----------+--------- > i | integer | | | > Distributed by: (i) > > > reshke=# \d+ m1 > Materialized view "public.m1" > Column | Type | Collation | Nullable | Default | Storage | > Compression | Stats target | Description > --------+---------+-----------+----------+---------+---------+-------------+--------------+------------- > i | integer | | | | plain | > | | > View definition: > SELECT t1.i > FROM t1; > Distributed by: (i) > Access method: heap > > ``` > > Output should be 'Incrementally materialized view "public.m1"' IMO. -- Yugo NAGATA <nagata@sraoss.co.jp>
On Tue, 30 Jul 2024 at 03:32, Kirill Reshke <reshkekirill@gmail.com> wrote: > > On Sat, 27 Jul 2024 at 13:26, Kirill Reshke <reshkekirill@gmail.com> wrote: > > > > Hi! > > Cloudberry DB (Greenplum fork) uses IMMV feature for AQUMV (auto query > > use matview) feature, so i got interested in how it is implemented. > > > > On Thu, 11 Jul 2024 at 09:24, Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > > > > > I updated the patch to bump up the version numbers in psql and pg_dump codes > > > from 17 to 18. > > > > Few suggestions: > > > > 1) `Add-relisivm-column-to-pg_class-system-catalog` commit message > > should be fixed, there is "isimmv" in the last line. > > 2) I dont get why `Add-Incremental-View-Maintenance-support.patch` > > goes after 0005 & 0004. Shoulndt we first implement feature server > > side, only when client (psql & pg_dump) side? > > 3) Can we provide regression tests for each function separately? Test > > for main feature in main patch, test for DISTINCT support in > > v34-0007-Add-DISTINCT-support-for-IVM.patch etc? This way the patchset > > will be easier to review, and can be committed separelety. > > 4) v34-0006-Add-Incremental-View-Maintenance-support.patch no longer > > applies due to 4b74ebf726d444ba820830cad986a1f92f724649. After > > resolving issues manually, it does not compile, because > > 4b74ebf726d444ba820830cad986a1f92f724649 also removes > > save_userid/save_sec_context fields from ExecCreateTableAs. > > > > > if (RelationIsIVM(matviewRel) && stmt->skipData) > > Now this function accepts skipData param. > > > > 5) For DISTINCT support patch uses hidden __ivm* columns. Is this > > design discussed anywhere? I wonder if this is a necessity (only > > solution) or if there are alternatives. > > 6) > > What are the caveats of supporting some simple cases for aggregation > > funcs like in example? > > ``` > > regress=# CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_2 AS SELECT > > sum(j) + sum(i) from mv_base_a; > > ERROR: expression containing an aggregate in it is not supported on > > incrementally maintainable materialized view > > ``` > > I can see some difficulties with division CREATE IMMV .... AS SELECT > > 1/sum(i) from mv_base_a; (sum(i) == 0 case), but adding & > > multiplication should be ok, aren't they? > > > > > > Overall, patchset looks mature, however it is far from being > > committable due to lack of testing/feedback/discussion. There is only > > one way to fix this... Test and discuss it! > > > > > > [1] https://github.com/cloudberrydb/cloudberrydb > > Hi! Small update: I tried to run a regression test and all > IMMV-related tests failed on my vm. Maybe I'm doing something wrong, I > will try to investigate. > > Another suggestion: support for \d and \d+ commands in psql. With v34 > patchset applied, psql does not show anything IMMV-related in \d mode. > > ``` > reshke=# \d m1 > Materialized view "public.m1" > Column | Type | Collation | Nullable | Default > --------+---------+-----------+----------+--------- > i | integer | | | > Distributed by: (i) > > > reshke=# \d+ m1 > Materialized view "public.m1" > Column | Type | Collation | Nullable | Default | Storage | > Compression | Stats target | Description > --------+---------+-----------+----------+---------+---------+-------------+--------------+------------- > i | integer | | | | plain | > | | > View definition: > SELECT t1.i > FROM t1; > Distributed by: (i) > Access method: heap > > ``` > > Output should be 'Incrementally materialized view "public.m1"' IMO. And one more thing, noticed today while playing with patchset: I believe non-terminal incremental should be OptIncremental Im talking about this: ``` incremental: INCREMENTAL { $$ = true; } | /*EMPTY*/ { $$ = false; } ; ```
On Thu, 11 Jul 2024 at 09:24, Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > On Tue, 2 Jul 2024 17:03:11 +0900 > Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > > On Sun, 31 Mar 2024 22:59:31 +0900 > > Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > > > > > > > Also, I added a comment on RelationIsIVM() macro persuggestion from jian he. > > > > In addition, I fixed a failure reported from cfbot on FreeBSD build caused by; > > > > > > > > WARNING: outfuncs/readfuncs failed to produce an equal rewritten parse tree > > > > > > > > This warning was raised since I missed to modify outfuncs.c for a new field. > > > > > > I found cfbot on FreeBSD still reported a failure due to > > > ENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS because the regression test used > > > wrong role names. Attached is a fixed version, v32. > > > > Attached is a rebased version, v33. > > I updated the patch to bump up the version numbers in psql and pg_dump codes > from 17 to 18. > > Regards, > Yugo Nagata > > > > > Regards, > > Yugo Nagata > > > > > > -- > > Yugo NAGATA <nagata@sraoss.co.jp> > > > -- > Yugo NAGATA <nagata@sraoss.co.jp> Small updates with something o found recent days: ``` db2=# create incremental materialized view v2 as select * from v1; ERROR: VIEW or MATERIALIZED VIEW is not supported on incrementally maintainable materialized view ``` Error messaging is not true, create view v2 as select * from v1; works fine. ``` db2=# create incremental materialized view vv2 as select i,j2, i / j2 from t1 join t2 on true; db2=# insert into t2 values(1,0); ERROR: division by zero ``` It is very strange to receive `division by zero` while inserting into relation, isn't it? Can we add some hints/CONTEXT here? Regular triggers do it: ``` db2=# insert into ttt values(100000,0); ERROR: division by zero CONTEXT: PL/pgSQL function f1() line 3 at IF ``` -- Best regards, Kirill Reshke
I am really sorry for splitting my review comments into multiple emails. I'll try to do a better review in a future, all-in-one. On Thu, 11 Jul 2024 at 09:24, Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > On Tue, 2 Jul 2024 17:03:11 +0900 > Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > > On Sun, 31 Mar 2024 22:59:31 +0900 > > Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > > > > > > > Also, I added a comment on RelationIsIVM() macro persuggestion from jian he. > > > > In addition, I fixed a failure reported from cfbot on FreeBSD build caused by; > > > > > > > > WARNING: outfuncs/readfuncs failed to produce an equal rewritten parse tree > > > > > > > > This warning was raised since I missed to modify outfuncs.c for a new field. > > > > > > I found cfbot on FreeBSD still reported a failure due to > > > ENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS because the regression test used > > > wrong role names. Attached is a fixed version, v32. > > > > Attached is a rebased version, v33. > > I updated the patch to bump up the version numbers in psql and pg_dump codes > from 17 to 18. > > Regards, > Yugo Nagata > > > > > Regards, > > Yugo Nagata > > > > > > -- > > Yugo NAGATA <nagata@sraoss.co.jp> > > > -- > Yugo NAGATA <nagata@sraoss.co.jp> 1) Provided patches do not set process title correctly: ``` reshke 2602433 18.7 0.1 203012 39760 ? Rs 20:41 1:58 postgres: reshke ivm [local] CREATE MATERIALIZED VIEW ``` 2) We allow to REFRESH IMMV. Why? IMMV should be always up to date. Well, I can see that this utility command may be useful in case of corruption of some base relation/view itself, so there will be a need to rebuild the whole from scratch. But we already have VACUUM FULL for this, aren't we? 3) Triggers created for IMMV are not listed via \dS [tablename] 4) apply_old_delta_with_count executes non-trivial SQL statements for IMMV. It would be really helpful to see this in EXPLAIN ANALYZE. 5) > + "DELETE FROM %s WHERE ctid IN (" > + "SELECT tid FROM (SELECT pg_catalog.row_number() over (partition by %s) AS \"__ivm_row_number__\"," > + "mv.ctid AS tid," > + "diff.\"__ivm_count__\"" > + "FROM %s AS mv, %s AS diff " > + "WHERE %s) v " > + "WHERE v.\"__ivm_row_number__\" OPERATOR(pg_catalog.<=) v.\"__ivm_count__\")", > + matviewname, > + keysbuf.data, > + matviewname, deltaname_old, > + match_cond); `SELECT pg_catalog.row_number()` is too generic to my taste. Maybe pg_catalog.immv_row_number() / pg_catalog.get_immv_row_number() ? 6) > +static void > +apply_new_delta(const char *matviewname, const char *deltaname_new, > + StringInfo target_list) > +{ > + StringInfoData querybuf; >+ > + /* Search for matching tuples from the view and update or delete if found. */ Is this comment correct? we only insert tuples here? 7) During patch development, one should pick OIDs from range 8000-9999 > +# IVM > +{ oid => '786', descr => 'ivm trigger (before)', > + proname => 'IVM_immediate_before', provolatile => 'v', prorettype => 'trigger', > + proargtypes => '', prosrc => 'IVM_immediate_before' }, > +{ oid => '787', descr => 'ivm trigger (after)', > + proname => 'IVM_immediate_maintenance', provolatile => 'v', prorettype => 'trigger', > + proargtypes => '', prosrc => 'IVM_immediate_maintenance' }, > +{ oid => '788', descr => 'ivm filetring ', > + proname => 'ivm_visible_in_prestate', provolatile => 's', prorettype => 'bool', > + proargtypes => 'oid tid oid', prosrc => 'ivm_visible_in_prestate' }, > ] -- Best regards, Kirill Reshke
On Wed, 31 May 2023 at 20:14, Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > Hello hackers, > > Here's a rebased version of the patch-set adding Incremental View > Maintenance support for PostgreSQL. That was discussed in [1]. > > The patch-set consists of the following eleven patches. > > - 0001: Add a syntax to create Incrementally Maintainable Materialized Views > - 0002: Add relisivm column to pg_class system catalog > - 0003: Allow to prolong life span of transition tables until transaction end > - 0004: Add Incremental View Maintenance support to pg_dum > - 0005: Add Incremental View Maintenance support to psql > - 0006: Add Incremental View Maintenance support > - 0007: Add DISTINCT support for IVM > - 0008: Add aggregates support in IVM > - 0009: Add support for min/max aggregates for IVM > - 0010: regression tests > - 0011: documentation > > [1] https://www.postgresql.org/message-id/flat/20181227215726.4d166b4874f8983a641123f5%40sraoss.co.jp > > > Regards, > Yugo Nagata > > -- > Yugo NAGATA <nagata@sraoss.co.jp> Actually, this new MV delta-table calculation can be used to make faster REFRESH MATERIALIZED VIEW even for non-IMMV. Specifically, we can use our cost-based Optimizer to decide which way is cheaper: regular query execution, or delta-table approach (if it is applicable). Is it worth another thread? -- Best regards, Kirill Reshke
On Tue, 30 Jul 2024 at 10:24, Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > Hi, > > On Tue, 30 Jul 2024 03:32:19 +0500 > Kirill Reshke <reshkekirill@gmail.com> wrote: > > > On Sat, 27 Jul 2024 at 13:26, Kirill Reshke <reshkekirill@gmail.com> wrote: > > > > > > Hi! > > > Cloudberry DB (Greenplum fork) uses IMMV feature for AQUMV (auto query > > > use matview) feature, so i got interested in how it is implemented. > > Thank you so much for a lot of comments! > I will respond to the comments soon. > > > > > > > On Thu, 11 Jul 2024 at 09:24, Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > > > > > > > I updated the patch to bump up the version numbers in psql and pg_dump codes > > > > from 17 to 18. > > > > > > Few suggestions: > > > > > > 1) `Add-relisivm-column-to-pg_class-system-catalog` commit message > > > should be fixed, there is "isimmv" in the last line. > > > 2) I dont get why `Add-Incremental-View-Maintenance-support.patch` > > > goes after 0005 & 0004. Shoulndt we first implement feature server > > > side, only when client (psql & pg_dump) side? > > > 3) Can we provide regression tests for each function separately? Test > > > for main feature in main patch, test for DISTINCT support in > > > v34-0007-Add-DISTINCT-support-for-IVM.patch etc? This way the patchset > > > will be easier to review, and can be committed separelety. > > > 4) v34-0006-Add-Incremental-View-Maintenance-support.patch no longer > > > applies due to 4b74ebf726d444ba820830cad986a1f92f724649. After > > > resolving issues manually, it does not compile, because > > > 4b74ebf726d444ba820830cad986a1f92f724649 also removes > > > save_userid/save_sec_context fields from ExecCreateTableAs. > > > > > > > if (RelationIsIVM(matviewRel) && stmt->skipData) > > > Now this function accepts skipData param. > > > > > > 5) For DISTINCT support patch uses hidden __ivm* columns. Is this > > > design discussed anywhere? I wonder if this is a necessity (only > > > solution) or if there are alternatives. > > > 6) > > > What are the caveats of supporting some simple cases for aggregation > > > funcs like in example? > > > ``` > > > regress=# CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_2 AS SELECT > > > sum(j) + sum(i) from mv_base_a; > > > ERROR: expression containing an aggregate in it is not supported on > > > incrementally maintainable materialized view > > > ``` > > > I can see some difficulties with division CREATE IMMV .... AS SELECT > > > 1/sum(i) from mv_base_a; (sum(i) == 0 case), but adding & > > > multiplication should be ok, aren't they? > > > > > > > > > Overall, patchset looks mature, however it is far from being > > > committable due to lack of testing/feedback/discussion. There is only > > > one way to fix this... Test and discuss it! > > > > > > > > > [1] https://github.com/cloudberrydb/cloudberrydb > > > > Hi! Small update: I tried to run a regression test and all > > IMMV-related tests failed on my vm. Maybe I'm doing something wrong, I > > will try to investigate. > > > > Another suggestion: support for \d and \d+ commands in psql. With v34 > > patchset applied, psql does not show anything IMMV-related in \d mode. > > > > ``` > > reshke=# \d m1 > > Materialized view "public.m1" > > Column | Type | Collation | Nullable | Default > > --------+---------+-----------+----------+--------- > > i | integer | | | > > Distributed by: (i) > > > > > > reshke=# \d+ m1 > > Materialized view "public.m1" > > Column | Type | Collation | Nullable | Default | Storage | > > Compression | Stats target | Description > > --------+---------+-----------+----------+---------+---------+-------------+--------------+------------- > > i | integer | | | | plain | > > | | > > View definition: > > SELECT t1.i > > FROM t1; > > Distributed by: (i) > > Access method: heap > > > > ``` > > > > Output should be 'Incrementally materialized view "public.m1"' IMO. > > > -- > Yugo NAGATA <nagata@sraoss.co.jp> So, I spent another 2 weeks on this patch. I have read the whole 'Incremental View Maintenance' thread (from 2018), this thread, some related threads. Have studied some papers on this topic. I got a better understanding of the theory this work is backed up with. However, I still can add my 2c. == Major suggestions. 1) At first glance, working with this IVM/IMMV infrastructure feels really unintuitive about what servers actually do for query execution. I do think It will be much better for user experience to add more EXPLAIN about IVM work done inside IVM triggers. This way it is much clearer which part is working slow, so which index should be created, etc. 2) The kernel code for IVM lacks possibility to be extended for further IVM optimizations. The one example is foreign key optimization described here[1]. I'm not saying we should implement this within this patchset, but we surely should pave the way for this. I don't have any good suggestions for how to do this though. 3) I don't really think SQL design is good. CREATE [INCREMENTAL] M.V. is too ad-hoc. I would prefer CREATE M.V. with (maintain_incr=true). (reloption name is just an example). This way we can change regular M.V. to IVM and vice versa via ALTER M.V. SET *reloptions* - a type of syntax that is already present in PostgreSQL core. == Other thoughts In OLAP databases (see [2]), IVM opens the door for 'view exploitation' feature. That is, use IVM (which is always up-to-date) for query execution. But current IVM implementation is not compatible with Cloudberry Append-optimized Table Access Method. The problem is the 'table_tuple_fetch_row_version' call, which is used by ivm_visible_in_prestate to check tuple visibility within a snapshot. I am trying to solve this somehow. My current idea is the following: multiple base table modification via single statement along with tuple deletion from base tables are features. We can error-out these cases (at M.V. creation time) all for some TAMs, and support only insert & truncate. However, I don't know how to check if TAM supports 'tuple_fetch_row_version' other than calling it and receiving ERROR[3]. == Minor nitpicks and typos. reshke=# insert into tt select * from generate_series(1, 1000090); ^CCancel request sent ERROR: canceling statement due to user request CONTEXT: SQL statement "INSERT INTO public.mv1 (i, j) SELECT i, j FROM (SELECT diff.*, pg_catalog.generate_series(1, diff."__ivm_count__") AS __ivm_generate_series__ FROM new_delta AS diff) AS v" Time: 18883.883 ms (00:18.884) This is very surprising, isn't it? We can set HINT here, to indicate where this query comes from. 2) deleted/deleted -> updated/deleted + /* + * XXX: When using DELETE or UPDATE, we must use exclusive lock for now + * because apply_old_delta(_with_count) uses ctid to identify the tuple + * to be deleted/deleted, but doesn't work in concurrent situations. 3) Typo in rewrite_query_for_postupdate_state: /* Retore the original RTE */ 4) in apply_delta function has exactly one usage, so the 'use_count' param is redundant, because we already pass the 'query' param, and 'use_count' is calculated from the 'query'. 5) in calc_delta: > ListCell *lc = list_nth_cell(query->rtable, rte_index - 1); > RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc); Should we add Assert(list_lenght(lc) == 1) here? Can there be multiple items in this list? 6) In get_prestate_rte: > appendStringInfo(&str, > "SELECT t.* FROM %s t" > " WHERE pg_catalog.ivm_visible_in_prestate(t.tableoid, t.ctid ,%d::pg_catalog.oid)", > relname, matviewid); Identitation issue. This will not be fixed via pg_ident run, because this is str contant, so better so fix it by-hand. 7) in apply_new_delta_with_count: > appendStringInfo(&querybuf, > "WITH updt AS (" /* update a tuple if this exists in the view */ > "UPDATE %s AS mv SET %s = mv.%s OPERATOR(pg_catalog.+) diff.%s " SET % OPERATOR(pg_catalog.=) mv.%s ? same for append_set_clause_for_count, append_set_clause_for_sum, append_set_clause_for_minmax > /* avg = (mv.sum - t.sum)::aggtype / (mv.count - t.count) */ > appendStringInfo(buf_old, > ", %s = %s OPERATOR(pg_catalog./) %s", should be /* avg OPERATOR(pg_catalog.=) (mv.sum - t.sum)::aggtype / (mv.count - t.count) */ appendStringInfo(buf_old, ", %s = %s OPERATOR(pg_catalog./) %s", [1] https://assets.amazon.science/a2/57/a00ebcfc446a9d0bf827bb51c15a/foreign-keys-open-the-door-for-faster-incremental-view-maintenance.pdf [2] https://github.com/cloudberrydb/cloudberrydb [3] https://github.com/cloudberrydb/cloudberrydb/blob/b9aec75154d5bbecce7ce3a33e8bb2272ff61511/src/backend/access/appendonly/appendonlyam_handler.c#L828 -- Best regards, Kirill Reshke
On Tue, 20 Aug 2024 at 02:14, Kirill Reshke <reshkekirill@gmail.com> wrote: > == Other thoughts > > In OLAP databases (see [2]), IVM opens the door for 'view > exploitation' feature. That is, use IVM (which is always up-to-date) > for query execution. But current IVM implementation is not compatible > with Cloudberry Append-optimized Table Access Method. The problem is > the 'table_tuple_fetch_row_version' call, which is used by > ivm_visible_in_prestate to check tuple visibility within a snapshot. I > am trying to solve this somehow. My current idea is the following: > multiple base table modification via single statement along with tuple > deletion from base tables are features. We can error-out these cases > (at M.V. creation time) all for some TAMs, and support only insert & > truncate. However, I don't know how to check if TAM supports > 'tuple_fetch_row_version' other than calling it and receiving > ERROR[3]. > I reread this and I find this a little bit unclear. What I'm proposing here is specifying the type of operations IVM supports on creation time. So, one can run CREATE IVM immv1 WITH (support_deletion = true/false, support_multiple_relation_change = true/false). Then, in the query execution time, we just ERROR if the query leads to deletion from IVM and support_deletion if false. -- Best regards, Kirill Reshke
On Tue, 20 Aug 2024 at 02:14, Kirill Reshke <reshkekirill@gmail.com> wrote: > > > == Major suggestions. > > 1) At first glance, working with this IVM/IMMV infrastructure feels > really unintuitive about what servers actually do for query execution. > I do think It will be much better for user experience to add more > EXPLAIN about IVM work done inside IVM triggers. This way it is much > clearer which part is working slow, so which index should be created, > etc. > > 2) The kernel code for IVM lacks possibility to be extended for > further IVM optimizations. The one example is foreign key optimization > described here[1]. I'm not saying we should implement this within this > patchset, but we surely should pave the way for this. I don't have any > good suggestions for how to do this though. > > 3) I don't really think SQL design is good. CREATE [INCREMENTAL] M.V. > is too ad-hoc. I would prefer CREATE M.V. with (maintain_incr=true). > (reloption name is just an example). > This way we can change regular M.V. to IVM and vice versa via ALTER > M.V. SET *reloptions* - a type of syntax that is already present in > PostgreSQL core. > One little follow-up here. Why do we do prepstate visibility the way it is done? Can we instead export the snapshot in BEFORE trigger, save it somewhere and use it after? -- Best regards, Kirill Reshke