Re: record identical operator - Mailing list pgsql-hackers
From | Kevin Grittner |
---|---|
Subject | Re: record identical operator |
Date | |
Msg-id | 1379183152.69958.YahooMailNeo@web162901.mail.bf1.yahoo.com Whole thread Raw |
In response to | Re: record identical operator (Andres Freund <andres@2ndquadrant.com>) |
Responses |
Re: record identical operator
|
List | pgsql-hackers |
Andres Freund <andres@2ndquadrant.com> wrote: > what I am talking about is that > e.g.: SELECT (ARRAY[1,2,3,NULL])[1:3] = ARRAY[1,2,3]; > obviously should be true. The patch does not change the behavior of the = operator for any type under any circumstances. > But both arrays don't have the same binary representation since > the former has a null bitmap, the latter not. So, if you had a > composite type like (int4[]) and would compare that without > invoking operators you'd return something false in some cases > because of the null bitmaps. Not for the = operator. The new "identical" operator would find them to not be identical, though. Since the new operator is only for the record type, I need to wrap the values in your example: test=# SELECT row ((ARRAY[1,2,3,NULL])[1:3])::record test-# = row (ARRAY[1,2,3])::record; ?column? ---------- t (1 row) test=# SELECT row ((ARRAY[1,2,3,NULL])[1:3])::record test-# === row (ARRAY[1,2,3])::record; ?column? ---------- f (1 row) Or, to borrow from the citext example using arrays: test=# CREATE TABLE array_table ( test(# id serial primary key, test(# nums int4[] test(# ); CREATE TABLE test=# INSERT INTO array_table (nums) test-# VALUES ((ARRAY[1,2,3,NULL])[1:3]), (ARRAY[1,2,3]), test-# (ARRAY[1,2,3]), (NULL), (NULL); INSERT 0 5 test=# CREATE MATERIALIZED VIEW array_matview AS test-# SELECT * FROM array_table; SELECT 5 test=# CREATE UNIQUE INDEX array_matview_id test-# ON array_matview (id); CREATE INDEX test=# select * from array_matview; id | nums ----+--------- 1 | {1,2,3} 2 | {1,2,3} 3 | {1,2,3} 4 | 5 | (5 rows) Note that the on-disk representation of the row where id = 1 differs from the on-disk representation where id in (2,3), both in the table and the matview. test=# SELECT * test-# FROM array_matview m test-# FULL JOIN array_table t ON (t.id = m.id AND t === m) test-# WHERE t.id IS NULL OR m.id IS NULL; id | nums | id | nums ----+------+----+------ (0 rows) ... so the query looking for work for the RMVC statement finds nothing to do. test=# UPDATE array_table SET nums = (ARRAY[1,2,3,NULL])[1:3] test-# WHERE id between 1 and 2; UPDATE 2 Now we have added an unnecessary bitmap to the on-disk storage of the value where id = 2. test=# SELECT * test-# FROM array_matview m test-# FULL JOIN array_table t ON (t.id = m.id AND t === m) test-# WHERE t.id IS NULL OR m.id IS NULL; id | nums | id | nums ----+---------+----+--------- | | 2 | {1,2,3} 2 | {1,2,3} | | (2 rows) ... and the query sees that they differ. test=# REFRESH MATERIALIZED VIEW CONCURRENTLY array_matview; REFRESH MATERIALIZED VIEW test=# SELECT * test-# FROM array_matview m test-# FULL JOIN array_table t ON (t.id = m.id AND t === m) test-# WHERE t.id IS NULL OR m.id IS NULL; id | nums | id | nums ----+------+----+------ (0 rows) The REFRESH causes them to match again, and later REFRESH runs won't see a need to do any work there unless the on-disk representation changes again. As far as I can see, we have four choices: (1) Never update values that are "equal", even if they appear different to the users, as was demonstrated with the citext example. (2) Require every data type which can be used in a matview to implement some new operator or function for "identical". Perhaps that could be mitigated to only implementat it if equal values can have user-visible differences. (3) Embed special cases into record identical tests for types known to allow multiple on-disk representations which have no user-visible differences. (4) Base the need to update a matview column on whether its on-disk representation is identical to what a new run of the defining query would generate. If this causes performance problems for use of a given type in a matview, one possible solution would be to modify that particular type to use a canonical format when storing a value into a record. For example, storing an array which has a bitmap of null values even though there are no nulls in the array could strip the bitmap as it is stored to the record. Currently we are using (4). I only included (3) for completeness; even just typing it as a hypothetical made me want to take a shower. (1) seems pretty horrid, too. (2) isn't evil, exactly, but any types which allowed user-visible differences in equal values would not exhibit correct behavior in matviews until and unless an "identical" operator was added. It might also perform noticeably worse than (4). Option (4) has the advantage of showing correct logical behavior in all types immediately, and restricting any performance fixes to the types with the inconsistent storage formats. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: