Can I use PostgreSQL to develop a self-organizing database system? - Mailing list pgsql-hackers
From | Rada Chirkova |
---|---|
Subject | Can I use PostgreSQL to develop a self-organizing database system? |
Date | |
Msg-id | CMEOKOMDENAFPGPJBBJDKEEHCKAA.chirkova@csc.ncsu.edu Whole thread Raw |
Responses |
Re: Can I use PostgreSQL to develop a self-organizing
|
List | pgsql-hackers |
Hi, I have asked my question on pgsql-general, and Tom Lane suggested I post here too. I would really appreciate your opinion. At NC State University, my students and I are working on a project called "self-organizing databases," please see description below. I would like to use an open-source database system for implementation and would really appreciate your opinion on whether PostgreSQL is suitable for the project. In general, I am very impressed by the quality of PostgreSQL code and documentation, as well as by the support of the developer community. For the project, I need a cost-based query optimizer with exhaustive join enumeration and use of statistics on stored relations; PostgreSQL has that. I also need the ability to process SQL queries with aggregation, extensive indexing capabilities, view mechanisms, and possibly integrity constraints; it seems that PostgreSQL has all that. We will modify the query optimizer to incorporate rewriting queries using views, and we will create view-generating and view-manipulating modules. Please let me know if you have comments. Sincerely, Rada Chirkova ====================== Self-Organizing Databases The goal of this project is to develop new effective methods to improve the performance of sets of frequent and important queries on large relational databases at all times, which could improve the efficiency of user interactions with data-management systems. Solving the problem will have the most effect in query optimization, data warehousing, and information integration, which are important research topics with direct practical applications. The project focuses on the methodology of evaluating queries using views; views are relations that are defined by auxiliary queries and can be used to rewrite and answer user queries. One way to improve query performance is precompute and store (i.e., materialize) views. To truly optimize query performance, it is critical to materialize the "right" views. The current focus of the project is on demonstrating that, by designing and materializing views, it is possible to ensure optimal or near-optimal performance of frequent and important queries, for common and important query types. We consider this problem in the broader context of designing self-organizing databases: A self-organizing database periodically determines, without human intervention, a representative set of frequent and important queries on the data, and incrementally designs and precomputes the optimal (or near-optimal) views for that representative query workload. As the representative query workload and the stored data change over time, self-organizing databases adapt to the changes by changing the set of materialized views that are used to improve the query-answering performance in the database. For building self-organizing databases, we consider an end-to-end solution that is, we consider all aspects of handling and using views, including: · designing and materializing views and indexes to improve query performance; · exploring the effects of materialized views on the process of query optimization; · adapting view design to the changing query workload, including the process of retiring views that are no longer useful; · developing methods for auomatically updating existing materialized views over time, to reflect the changes in the stored data; · developing methods to collect database statistics to reliably estimate the sizes of the views the system considers for materialization; · analyzing the use of system resources and allocating an appropriate amount of resources to view management in the system.
pgsql-hackers by date: