Re: record identical operator - Mailing list pgsql-hackers
From | Kevin Grittner |
---|---|
Subject | Re: record identical operator |
Date | |
Msg-id | 1379966158.9393.YahooMailNeo@web162906.mail.bf1.yahoo.com |
In response to | Re: record identical operator (Stephen Frost <sfrost@snowman.net>) |
Responses |
Re: record identical operator
Re: record identical operator |
List | pgsql-hackers |
Stephen Frost <sfrost@snowman.net> wrote:

> I'm trying to explain that using that methodology is what landed
> us in this situation to begin with.

I'm trying to figure out what situation you think we're in.
Seriously, if you could apply the patch and show one example that
demonstrates what you see to be a problem, that would be great.

>> I think it is fairly obvious that REFRESH should REgenerate a FRESH
>> copy of the data, versus incremental maintenance -- which attempts
>> to keep the matview up-to-date without regenerating the full set of
>> data.
>
> Having 'REFRESH' regenerate a fresh copy of the data makes sense to me,
> and is what we have now, no? The only issue there is that it takes out
> a big lock, which I appreciate that you're trying to get rid of.
>
>> Whenever there is logical replication (and materialized
>> views are, conceptually, one form of that -- within the database) I
>> feel it is important to be able to correct any possible "drift".
>> With matviews, I see the way to do that as the REFRESH command, and
>> I feel that it is important to be able to do that in a way that can
>> run concurrently with readers of the matview -- without blocking
>> them or being blocked by them.
>
> Of course.
>
>> Discussion of incremental maintenance really belongs on a different
>> thread.
>
> I'm really getting tired of everyone saying "this is the only way to do
> it" (or perhaps "well, this is already committed, therefore it must be
> what we're gonna do")

What I'm saying is that REFRESH and incremental maintenance are two
different things, and conflating them just confuses everything.

> when a) we're already planning to rip this out and change it, or
> so I thought,

The entire change to matview-specific code is to use a different
operator in two places. Outside of that, it consists of adding the
12th non-default opclass to core.

> and b) we're trying to make promises we can't keep with this
> approach.

I don't see any such.
If you do, please describe them; or better yet, give an example.

>> Since I have gone to the trouble to read a lot of papers
>> on the topic, and select one that I think is a good basis for our
>> implementation, I hope everyone will frame discussion in terms of
>> either:
>> - how best to implement the techniques from that paper, or
>> - why some other paper presents a better technique.
>
> My recollection from the hackers meeting is that I'm trying to simply
> paraphrase what you had said was in the paper wrt keeping track of what
> rows are changed underneath and using that as a basis to implement the
> changes necessary in the view. Does the paper you're referring to
> describe rerunning the whole query and then trying to figure out what's
> been changed..? That's really what I'm having trouble understanding
> why anyone would want to implement. I'll try and find time to hunt down
> the threads and papers on it, but I really could have sworn this was
> gone over at the hacker meeting- and it made a lot of sense to me, then.

The only thing the paper says on the topic is that any incremental
maintenance scheme is a heuristic. There will always be cases when
it would be faster and less resource-intensive to regenerate the
data from the defining query. There is at least an implication
that a good implementation will try to identify when it is in such
a situation, and ignore the whole incremental maintenance approach
in favor of what we are doing with REFRESH. The example they give
is if there is an unqualified DELETE of every row in a table which
is part of an inner join generating the result, that it would
almost be faster to generate the (empty) result set than to run
their algorithm to determine that all the rows need to be deleted.

One reason for having a REFRESH that re-runs the query like this is
that it *is* a recommended "escape hatch" when a mass operation
makes the incremental calculations too expensive.
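
As a rough illustration of that escape hatch, here is a minimal
Python sketch. It is not taken from the paper or from the patch;
the function name, the row-count inputs, and the 0.5 threshold are
all hypothetical, and a real implementation would presumably use
planner cost estimates rather than a simple ratio.

```python
def choose_maintenance_strategy(changed_rows: int,
                                base_rows: int,
                                threshold: float = 0.5) -> str:
    """Pick between incremental maintenance and a full regeneration.

    Hypothetical heuristic: when the fraction of changed rows in a
    base table reaches `threshold`, assume that re-running the
    defining query is cheaper than computing the deltas.
    """
    if base_rows <= 0:
        # Nothing to compare against; regenerating is trivially cheap.
        return "refresh"
    if changed_rows / base_rows >= threshold:
        # Mass operation (e.g. an unqualified DELETE): take the escape hatch.
        return "refresh"
    return "incremental"
```

With these made-up inputs, deleting every row of a 1000-row table
picks "refresh", while touching 3 of the 1000 rows picks
"incremental".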

>> I really didn't expect to have to burn so much time
>> and energy arguing over whether a REFRESH should leave the matview
>> accurately containing the results of the matview's query.
>
> I appreciate you bringing me up to speed on where things actually are
> here- again, sorry for not realizing the direction that this was going
> in earlier; it really didn't even occur to me that it would have gone
> down this road. I, also, didn't expect to spend so much time on this.
>
>>>> We can argue about how it should be named
>
> Really, I'm back to trying to figure out why we want to go down this
> road at all.
>
>>>> and whether it should be documented
>>
>> I thought we had a consensus to document both the existing record
>> comparison operators and these new ones, and I'm fine with that.
>
> If it gets added, it certainly should be documented,

That seems to be the consensus. In fact, I would have submitted
that with this patch if there had been any documentation for the
default record comparison operators. It seemed like that might
have been omitted on purpose, and it seemed weird to add
documentation for a non-default operator for records when (a) we
didn't document the default operator for records and (b) we don't
document many of the other non-default operators already in core.

> and heavily caveated.

I'm not sure what caveats would be needed. It seems to me that a
clear description of what it does would suffice. Like all the
other non-default opclasses in core, it will be non-default because
it is less frequently useful.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company