Re: Google SoC--Idea Request - Mailing list pgsql-hackers
From | Nikolay Samokhvalov |
---|---|
Subject | Re: Google SoC--Idea Request |
Date | |
Msg-id | e431ff4c0605020134x303014a7r5cea1ed1951f09f6@mail.gmail.com |
In response to | Google SoC--Idea Request ("Jonah H. Harris" <jonah.harris@gmail.com>) |
Responses | Re: Google SoC--Idea Request |
List | pgsql-hackers |
Proposal: XMLType for PostgreSQL.

*** Minimum: ***

To have special type support for storing XML data and working with it. This means the following:

- ability to define any column of a table as being of XMLType; internally, all data is stored as VARCHAR;
- automatic validation of documents against an XML schema, if one was specified in the column definition or in the XML documents themselves (DTD, XSD, or at least one of them) /* contrib/xml2 has such a feature, but it uses libxml, which means a DOM interface. Maybe it's better to use some SAX parser to solve this task */;
- XPath indexes for queries with path expressions in the WHERE clause /* I suppose this kind of index would be the most frequently used. I propose using a good labeling scheme and GiST and/or GIN here */;
- some subset of SQL/XML. Actually, part 14 of SQL:200n (SQL/XML) has more than 400 pages now and contains some established constructs that are used in other DBMSes. There is a patch already written by Pavel Stehule: http://www.pgsql.ru/db/mw/msg.html?mid=2096818. (BTW, what is its status? It was held for 8.2, so what is the result?) I tested it several months ago, and the basic SQL/XML functions worked fine. It changes the grammar, but there is no other way... So, using this patch as a part of this project means that this project cannot be a contrib module, unfortunately. Nevertheless, the current draft of the SQL/XML standard seems to be mature, so compared with the existing implementation it would be a nice 'landmark';
- XML domains support: ability to define a domain based on XMLType and an XML schema definition (e.g., an external DTD file or something similar). I'd consider the XML schema definition as a restriction of the entire XMLType (similar to restrictions for plain types, which are defined as a CHECK constraint in the domain definition)

*** Maximum: ***

- all things from the 'minimum' list :-)
- rich index system:
  * structure index (labeling scheme; prefix schemes seem to be best for this, and I suppose GiST would help here).
    Actually, it would be full shredding, like the primary index for XML in MS SQL Server, but I'm aware of better labeling algorithms than simple prefix labeling (as in SQL Server). Surely, GiST/GIN support would be a great foundation for these
  * flexible support of path indexes, value indexes and so on (something like secondary XML indexes in SQL Server...), as a continuation of the work on path indexes from the 'minimum' list;
- full-text search abilities (tsearch2 / GiST);
- different encoding issues (autoconversion to the column's encoding, etc.);
- ability to choose the storage type: VARCHAR or 'native' (trees, as in native XML DBMSes and DB2 Viper [if their articles don't lie ;-)]) mode. Actually, this is a very large task (almost as large as creating a DBMS from scratch) and I understand clearly that I won't solve it using only my own abilities. But the work on the 'minimum' list (especially if it becomes a part of SoC) would be a good starting point and may attract other developers who would help to implement it. Maybe at the initial stage it's worth integrating with some other DBMS and working with it using two-phase commit (surely, this is not a solution to all problems, as it means two different execution plans, etc.);
- XQuery and its integration with SQL (according to the SQL/XML standard). In other words, an implementation of the XQuery Data Model. This would be a great target point (version 1.0 of the entire project);
- XML views / updatable XML views (actually, it's a crazy idea, but it's my dream ;-) )

As a part of SoC I would concentrate on tasks from the 'minimum' list. It would be a good starting point.
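
To make the labeling idea above a bit more concrete, here is a minimal sketch in Python of simple prefix (Dewey-style) labeling with a path index built on top of it. It is only an illustration under my own assumptions: the names label_tree, is_ancestor and build_path_index are hypothetical, the index is a plain in-memory dict, and nothing here is taken from PostgreSQL, contrib/xml2 or the papers listed in this message. A real implementation would live in the backend and rely on GiST/GIN.

```python
import xml.etree.ElementTree as ET


def label_tree(root):
    """Shred a document into (label, path, text) rows.

    The root gets label (1,); the k-th child of a node labeled L gets
    L + (k,). This is plain prefix labeling, not ORDPATH or XISS.
    """
    rows = []

    def walk(elem, label, parent_path):
        path = parent_path + "/" + elem.tag
        rows.append((label, path, (elem.text or "").strip()))
        for position, child in enumerate(elem, start=1):
            walk(child, label + (position,), path)

    walk(root, (1,), "")
    return rows


def is_ancestor(a, b):
    """True if the node labeled a is a proper ancestor of the node labeled b."""
    return len(a) < len(b) and b[:len(a)] == a


def build_path_index(rows):
    """Map each root-to-node path to the labels of the nodes reached by it."""
    index = {}
    for label, path, _text in rows:
        index.setdefault(path, []).append(label)
    return index


# Hypothetical sample document, for illustration only.
doc = ET.fromstring(
    "<lib><book><title>A</title></book><book><title>B</title></book></lib>"
)
rows = label_tree(doc)
path_index = build_path_index(rows)
```

With such labels, an ancestor/descendant test is just a prefix comparison, and a path expression like /lib/book/title becomes a single lookup in path_index instead of a walk over the document. The weak spot of this simple scheme is insertion between siblings, which forces relabeling; that is the problem the more advanced labeling schemes try to avoid.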

Some articles:

- Fresh draft of SQL:200n: http://www.wiscorp.com/sql_2003_standard.zip
- Other SQL/XML papers: http://www.wiscorp.com/SQLStandards.html#xsqlstandards
- XISS system (Li, Moon - advanced interval indexes): http://www.cs.arizona.edu/xiss/
- MASS (prefix indexes): http://davis.wpi.edu/dsrg/vamana/WebPages/Publication.html
- Staircase joins (accelerating XPath evaluation): http://www.inf.uni-konstanz.de/dbis/publications/download/injection.pdf
- Oleg's TODO list: http://www.sai.msu.su/~megera/oddmuse/index.cgi/todo
- XML in DB2 Viper: http://www.vldb2005.org/program/paper/thu/p1164-nicola.pdf
- XQuery in SQL Server: http://www.vldb2005.org/program/paper/thu/p1175-pal.pdf
- Labeling scheme in SQL Server (ORDPATHs): http://portal.acm.org/ft_gateway.cfm?id=1007686&type=pdf&coll=GUIDE&dl=GUIDE&CFID=74920272&CFTOKEN=73736781

One more comment: I'm a PhD student at MIPT, Russia. I plan to write an overview of the XMLType implementations in the latest versions of three major commercial DBMSes (ORA, MS, DB2), comparing them to the standard and to each other. The first article of this comparison is planned for the end of May. This work will help to understand where the major commercial DBMS vendors are going and why they go there :-) Moreover, I intend to create a technique for testing XMLType support in (O)RDBMSes.

In spite of the fact that SoC assumes all work is done by only one person, I expect some support/help from the following people:

- Dr. Sergey Kuznetsov (my scientific mentor)
- Oleg Bartunov and Teodor Sigaev (as major developers of PostgreSQL and of GiST and GIN, they definitely can help me to be successful);
- Ivan Zolotukhin (together we plan to create the overview mentioned above)
- PostgreSQL community (actually, as I've already mentioned, I intend to use code by Pavel Stehule, and I'm pretty sure that I'll need a lot of other help from the community)

On 4/15/06, Jonah H. Harris <jonah.harris@gmail.com> wrote:
> Hey everyone,
>
> I know we started a discussion a month or so ago regarding ideas for
> SoC projects. However, after reading through the thread, I didn't see
> us nail down any actual items.
>
> As such, we need to quickly put together a list of oh, 15-20 midlevel
> project ideas. I'm sure we can pull some off the TODO list, but we
> should also look at project ideas for porting some of the most used
> third-party OSS software to PostgreSQL too (portals, CMS systems,
> accounting systems, etc.).
>
> All ideas welcome!
>
> --
> Jonah H. Harris, Database Internals Architect
> EnterpriseDB Corporation
> 732.331.1324
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster
>

--
Best regards,
Nikolay