Re: GSoC - proposal - Materialized Views in PostgreSQL - Mailing list pgsql-hackers
From | Pavel |
---|---|
Subject | Re: GSoC - proposal - Materialized Views in PostgreSQL |
Date | |
Msg-id | 4BCDEB98.8010608@seznam.cz Whole thread Raw |
In response to | Re: GSoC - proposal - Materialized Views in PostgreSQL (Greg Smith <greg@2ndquadrant.com>) |
Responses |
Re: GSoC - proposal - Materialized Views in PostgreSQL
|
List | pgsql-hackers |
Greg Smith wrote: > pavelbaros wrote: >> I am also waiting for approval for my repository named >> "materialized_view" on git.postgresql.org, so I could publish >> completed parts. > > Presuming that you're going to wander there and get assigned what > looks like an official repo name for this project is a > bit...optimistic. I would recommend that you publish to something > like github instead (you can fork http://github.com/postgres/postgres > ), and if the work looks good enough that it gets picked up by the > community maybe you migrate it onto the main site eventually. > git.postgresql.org is really not setup to be general hosting space for > everyone who has a PostgreSQL related project; almost every repo on > there belongs to someone who has already been a steady project > contributor for a number of years. Yes, you're true, I'm kind of newbe in this kind of project and specially in PostgreSQL. But I think it is best way to get into PostgreSQL. When I chose my bachelor thesis I did not know I could participate GSoC or try to make it commitable. Anyway I will make repo on github, so everybody could look at it, as soon as posible. <http://github.com/pbaros/postgres> > > (Switching to boilerplate mode for a paragraph...) You have picked a > PostgreSQL feature that is dramatically more difficult than it appears > to be, and I wouldn't expect you'll actually finish even a fraction of > your goals in a summer of work. You're at least in plentiful > company--most students do the same. As a rule, if you see a feature > on our TODO list that looks really useful and fun to work on, it's > only still there because people have tried multiple times to build it > completely but not managed to do so because it's harder than it > appears. This is certainly the case with materialized views. > > You've outlined a reasonable way to build a prototype that does a > limited implementation here. The issue is what it will take to extend > that into being production quality for the real-world uses of > materialized views. How useful your prototype is depends on how well > it implements a subset of that in a way that will get used by the > final design. > > The main hidden complexity in this particular project relates to > handling view refreshes. The non-obvious problem is that when the > view updates, you need something like a SQL MERGE to really handle > that in a robust way that doesn't conflict with concurrent access to > queries against the materialized view. And work on MERGE support is > itself blocked behind the fact that PostgreSQL doesn't have a good way > to lock access to a key value that doesn't exist yet--what other > databases call key range locking. See the notes for "Add SQL-standard > MERGE/REPLACE/UPSERT command" at http://wiki.postgresql.org/wiki/Todo > for more information. > > You can work around that to build a prototype by grabbing a full table > lock on the materialized view when updating it, but that's not a > production quality solution. Solving that little detail is actually > more work than the entire project you've outlined. Your suggested > implementation--"In function CloseIntoRel executor swap relfilenode's > of temp table and original table and finally delete temp table"--is > where the full table lock is going to end up at. The exact use cases > that need materialized views cannot handle a CLUSTER-style table > recreation each time that needs an exclusive lock to switchover, so > that whole part of your design is going to be a prototype that doesn't > work at all like what needs to get built to make this feature > committable. It's also not a reasonable assumption that you have > enough disk space to hold a second copy of the MV in a production system. For now I know it is not commitable in actual state, but for my thesis it is enough and I know it will not be commitable with this design at all. In case of GSoC it will depends on the time I will be able to spend on it, if I will consider some other design. > > Once there's a good way to merge updates, how to efficiently generate > them against the sort of large data sets that need materalized > views--so you just write out the updates rather than a whole new > copy--is itself a large project with a significant quantity of > academic research to absorb before starting. Dan Colish at Portland > State has been playing around with prototypes for the specific problem > of finding a good algorithm for view refreshing that is compatible > with PostgreSQL's execution model. He's already recognized the table > lock issue here and for the moment is ignoring that part. I don't > have a good feel yet for how long the targeted update code will take > to mature, but based on what I do know I suspect that little detail is > also a larger effort than the entire scope you're envisioning. > There's a reason why the MIT Press compendium "Materialized Views: > Techniques, Implementations, and Applications" is over 600 pages > long--I hope you've already started digging through that material. I would like to start to dig through that, but I'm in a hurry now. I already have made a small research on MV as part of my thesis. I also plan to continue study PostgreSQL and Materialized Views more into the depth, preferably as my master thesis. But I realize MV feature commitable to PostgreSQL is not project for one person, of course. > > Now, with all that said, that doesn't mean there's not a useful > project for you buried in this mess. The first two steps in your plan: > > 1) create materialized view > 2) change rewriter > > Include building a prototype grammer, doing an initial executor > implementation, and getting some sort of rewriter working. That is > potentially good groundwork to lay here. I would suggest that you > completely drop your step 3: > > 3) create command that takes snapshot (refresh MV) > > Because you cannot built that in a way that will be useful (and by > that I mean committable quality) until there's a better way to handle > updates than writing a whole new table and grabbing a full relation > lock to switch to it. To do a good job just on the first two steps > should take at least a whole summer anyway--there's a whole stack of > background research needed I haven't seen anyone do yet, and that > isn't on your plan yet. There is a precedent for taking this > approach. After getting stalled trying to add the entirety of easy > partitioning to PostgreSQL, the current scope has been scaled back to > just trying to get the syntax and on-disk structure right, then finish > off the implementation. See > http://wiki.postgresql.org/wiki/Table_partitioning to get an idea how > that's been broken into those two major chunks. > Anyway thanks for all of your advices and help. best regards, Pavel Baros
pgsql-hackers by date: