Re: Implementing Incremental View Maintenance - Mailing list pgsql-hackers
From | Yugo NAGATA |
---|---|
Subject | Re: Implementing Incremental View Maintenance |
Date | |
Msg-id | 20200219091135.79df6f6c63218fbff4b32d5c@sraoss.co.jp Whole thread Raw |
In response to | Re: Implementing Incremental View Maintenance (nuko yokohama <nuko.yokohama@gmail.com>) |
List | pgsql-hackers |
On Tue, 18 Feb 2020 22:03:47 +0900 nuko yokohama <nuko.yokohama@gmail.com> wrote: > Hi. > > SELECT statements with a TABLESAMPLE clause should be rejected. > > Currently, CREATE INCREMENTAL MATERIALIZED VIEW allows SELECT statements > with the TABLESAMPLE clause. > However, the result of this SELECT statement is undefined and should be > rejected when specified in CREATE INCREMENTAL MATERIALIZED VIEW. > (similar to handling non-immutable functions) Thanks! We totally agree with you. We are now working on improvement of query checks at creating IMMV. TABLESAMPLE will also be checked in this. Regards, Yugo Nagata > Regard. > > 2020年2月8日(土) 11:15 nuko yokohama <nuko.yokohama@gmail.com>: > > > Hi. > > > > UNION query problem.(server crash) > > > > When creating an INCREMENTAL MATERIALIZED VIEW, > > the server process crashes if you specify a query with a UNION. > > > > (commit id = 23151be7be8d8f8f9c35c2d0e4e5353aedf2b31e) > > > > execute log. > > > > ``` > > [ec2-user@ip-10-0-1-10 ivm]$ psql testdb -e -f union_query_crash.sql > > DROP TABLE IF EXISTS table_x CASCADE; > > psql:union_query_crash.sql:6: NOTICE: drop cascades to view xy_union_v > > DROP TABLE > > DROP TABLE IF EXISTS table_y CASCADE; > > DROP TABLE > > CREATE TABLE table_x (id int, data numeric); > > CREATE TABLE > > CREATE TABLE table_y (id int, data numeric); > > CREATE TABLE > > INSERT INTO table_x VALUES (generate_series(1, 3), random()::numeric); > > INSERT 0 3 > > INSERT INTO table_y VALUES (generate_series(1, 3), random()::numeric); > > INSERT 0 3 > > SELECT * FROM table_x; > > id | data > > ----+-------------------- > > 1 | 0.950724735058774 > > 2 | 0.0222670808201144 > > 3 | 0.391258547114841 > > (3 rows) > > > > SELECT * FROM table_y; > > id | data > > ----+-------------------- > > 1 | 0.991717347778337 > > 2 | 0.0528458947672874 > > 3 | 0.965044982911163 > > (3 rows) > > > > CREATE VIEW xy_union_v AS > > SELECT 'table_x' AS name, * FROM table_x > > UNION > > SELECT 'table_y' AS name, * FROM table_y > > ; > > CREATE VIEW > > TABLE xy_union_v; > > name | id | data > > ---------+----+-------------------- > > table_y | 2 | 0.0528458947672874 > > table_x | 2 | 0.0222670808201144 > > table_y | 3 | 0.965044982911163 > > table_x | 1 | 0.950724735058774 > > table_x | 3 | 0.391258547114841 > > table_y | 1 | 0.991717347778337 > > (6 rows) > > > > CREATE INCREMENTAL MATERIALIZED VIEW xy_imv AS > > SELECT 'table_x' AS name, * FROM table_x > > UNION > > SELECT 'table_y' AS name, * FROM table_y > > ; > > psql:union_query_crash.sql:28: server closed the connection unexpectedly > > This probably means the server terminated abnormally > > before or while processing the request. > > psql:union_query_crash.sql:28: fatal: connection to server was lost > > ``` > > UNION query problem.(server crash) > > > > When creating an INCREMENTAL MATERIALIZED VIEW, > > the server process crashes if you specify a query with a UNION. > > > > (commit id = 23151be7be8d8f8f9c35c2d0e4e5353aedf2b31e) > > > > execute log. > > > > ``` > > [ec2-user@ip-10-0-1-10 ivm]$ psql testdb -e -f union_query_crash.sql > > DROP TABLE IF EXISTS table_x CASCADE; > > psql:union_query_crash.sql:6: NOTICE: drop cascades to view xy_union_v > > DROP TABLE > > DROP TABLE IF EXISTS table_y CASCADE; > > DROP TABLE > > CREATE TABLE table_x (id int, data numeric); > > CREATE TABLE > > CREATE TABLE table_y (id int, data numeric); > > CREATE TABLE > > INSERT INTO table_x VALUES (generate_series(1, 3), random()::numeric); > > INSERT 0 3 > > INSERT INTO table_y VALUES (generate_series(1, 3), random()::numeric); > > INSERT 0 3 > > SELECT * FROM table_x; > > id | data > > ----+-------------------- > > 1 | 0.950724735058774 > > 2 | 0.0222670808201144 > > 3 | 0.391258547114841 > > (3 rows) > > > > SELECT * FROM table_y; > > id | data > > ----+-------------------- > > 1 | 0.991717347778337 > > 2 | 0.0528458947672874 > > 3 | 0.965044982911163 > > (3 rows) > > > > CREATE VIEW xy_union_v AS > > SELECT 'table_x' AS name, * FROM table_x > > UNION > > SELECT 'table_y' AS name, * FROM table_y > > ; > > CREATE VIEW > > TABLE xy_union_v; > > name | id | data > > ---------+----+-------------------- > > table_y | 2 | 0.0528458947672874 > > table_x | 2 | 0.0222670808201144 > > table_y | 3 | 0.965044982911163 > > table_x | 1 | 0.950724735058774 > > table_x | 3 | 0.391258547114841 > > table_y | 1 | 0.991717347778337 > > (6 rows) > > > > CREATE INCREMENTAL MATERIALIZED VIEW xy_imv AS > > SELECT 'table_x' AS name, * FROM table_x > > UNION > > SELECT 'table_y' AS name, * FROM table_y > > ; > > psql:union_query_crash.sql:28: server closed the connection unexpectedly > > This probably means the server terminated abnormally > > before or while processing the request. > > psql:union_query_crash.sql:28: fatal: connection to server was lost > > ``` > > > > 2018年12月27日(木) 21:57 Yugo Nagata <nagata@sraoss.co.jp>: > > > >> Hi, > >> > >> I would like to implement Incremental View Maintenance (IVM) on > >> PostgreSQL. > >> IVM is a technique to maintain materialized views which computes and > >> applies > >> only the incremental changes to the materialized views rather than > >> recomputate the contents as the current REFRESH command does. > >> > >> I had a presentation on our PoC implementation of IVM at PGConf.eu 2018 > >> [1]. > >> Our implementation uses row OIDs to compute deltas for materialized > >> views. > >> The basic idea is that if we have information about which rows in base > >> tables > >> are contributing to generate a certain row in a matview then we can > >> identify > >> the affected rows when a base table is updated. This is based on an idea > >> of > >> Dr. Masunaga [2] who is a member of our group and inspired from ID-based > >> approach[3]. > >> > >> In our implementation, the mapping of the row OIDs of the materialized > >> view > >> and the base tables are stored in "OID map". When a base relation is > >> modified, > >> AFTER trigger is executed and the delta is recorded in delta tables using > >> the transition table feature. The accual udpate of the matview is triggerd > >> by REFRESH command with INCREMENTALLY option. > >> > >> However, we realize problems of our implementation. First, WITH OIDS will > >> be removed since PG12, so OIDs are no longer available. Besides this, it > >> would > >> be hard to implement this since it needs many changes of executor nodes to > >> collect base tables's OIDs during execuing a query. Also, the cost of > >> maintaining > >> OID map would be high. > >> > >> For these reasons, we started to think to implement IVM without relying > >> on OIDs > >> and made a bit more surveys. > >> > >> We also looked at Kevin Grittner's discussion [4] on incremental matview > >> maintenance. In this discussion, Kevin proposed to use counting > >> algorithm [5] > >> to handle projection views (using DISTNICT) properly. This algorithm need > >> an > >> additional system column, count_t, in materialized views and delta tables > >> of > >> base tables. > >> > >> However, the discussion about IVM is now stoped, so we would like to > >> restart and > >> progress this. > >> > >> > >> Through our PoC inplementation and surveys, I think we need to think at > >> least > >> the followings for implementing IVM. > >> > >> 1. How to extract changes on base tables > >> > >> I think there would be at least two approaches for it. > >> > >> - Using transition table in AFTER triggers > >> - Extracting changes from WAL using logical decoding > >> > >> In our PoC implementation, we used AFTER trigger and transition tables, > >> but using > >> logical decoding might be better from the point of performance of base > >> table > >> modification. > >> > >> If we can represent a change of UPDATE on a base table as query-like > >> rather than > >> OLD and NEW, it may be possible to update the materialized view directly > >> instead > >> of performing delete & insert. > >> > >> > >> 2. How to compute the delta to be applied to materialized views > >> > >> Essentially, IVM is based on relational algebra. Theorically, changes on > >> base > >> tables are represented as deltas on this, like "R <- R + dR", and the > >> delta on > >> the materialized view is computed using base table deltas based on "change > >> propagation equations". For implementation, we have to derive the > >> equation from > >> the view definition query (Query tree, or Plan tree?) and describe this > >> as SQL > >> query to compulte delta to be applied to the materialized view. > >> > >> There could be several operations for view definition: selection, > >> projection, > >> join, aggregation, union, difference, intersection, etc. If we can > >> prepare a > >> module for each operation, it makes IVM extensable, so we can start a > >> simple > >> view definition, and then support more complex views. > >> > >> > >> 3. How to identify rows to be modifed in materialized views > >> > >> When applying the delta to the materialized view, we have to identify > >> which row > >> in the matview is corresponding to a row in the delta. A naive method is > >> matching > >> by using all columns in a tuple, but clearly this is unefficient. If > >> thematerialized > >> view has unique index, we can use this. Maybe, we have to force > >> materialized views > >> to have all primary key colums in their base tables. In our PoC > >> implementation, we > >> used OID to identify rows, but this will be no longer available as said > >> above. > >> > >> > >> 4. When to maintain materialized views > >> > >> There are two candidates of the timing of maintenance, immediate (eager) > >> or deferred. > >> > >> In eager maintenance, the materialized view is updated in the same > >> transaction > >> where the base table is updated. In deferred maintenance, this is done > >> after the > >> transaction is commited, for example, when view is accessed, as a > >> response to user > >> request, etc. > >> > >> In the previous discussion[4], it is planned to start from "eager" > >> approach. In our PoC > >> implementaion, we used the other aproach, that is, using REFRESH command > >> to perform IVM. > >> I am not sure which is better as a start point, but I begin to think that > >> the eager > >> approach may be more simple since we don't have to maintain base table > >> changes in other > >> past transactions. > >> > >> In the eager maintenance approache, we have to consider a race condition > >> where two > >> different transactions change base tables simultaneously as discussed in > >> [4]. > >> > >> > >> [1] > >> https://www.postgresql.eu/events/pgconfeu2018/schedule/session/2195-implementing-incremental-view-maintenance-on-postgresql/ > >> [2] > >> https://ipsj.ixsq.nii.ac.jp/ej/index.php?active_action=repository_view_main_item_detail&page_id=13&block_id=8&item_id=191254&item_no=1 > >> (Japanese only) > >> [3] https://dl.acm.org/citation.cfm?id=2750546 > >> [4] > >> https://www.postgresql.org/message-id/flat/1368561126.64093.YahooMailNeo%40web162904.mail.bf1.yahoo.com > >> [5] https://dl.acm.org/citation.cfm?id=170066 > >> > >> Regards, > >> -- > >> Yugo Nagata <nagata@sraoss.co.jp> > >> > >> -- Yugo NAGATA <nagata@sraoss.co.jp>
pgsql-hackers by date: