No heap lookups on index - Mailing list pgsql-hackers
From | David Scott |
---|---|
Subject | No heap lookups on index |
Date | |
Msg-id | 43CEA194.8020105@apptechsys.com Whole thread Raw |
Responses |
Re: No heap lookups on index
Re: No heap lookups on index Re: No heap lookups on index Re: No heap lookups on index |
List | pgsql-hackers |
Allow me a brief introduction. I work in a company who contracts intelligence analysis software to the government. We are currently developing a product which is using PostgreSQL at it's core. Due to the licensing of the product and the integration with perl this is our first choice in database solutions. We are, however, currently stuck. We are storing millions of rows and require very high query performance. We have spent the last several months tweaking, list lurking and researching all the various tweaks and performance enhancements and have come to the conclusion that our biggest slowdown is validating the index rows which match our selection criteria against the heap values. In general cases there is a very small amount required for this, but in our extreme use cases we are finding this to slow our queries by an unacceptable amount of time. We would like to resolve this issue. In that endeavor we have done some feasibility analysis (either to write a patch ourselves or attempt to commission an expert to do so), starting with the archives for this list. We found several posts discussing the issue and it seems that the complexity of storing the tuple visibility information inside of the index rows is prohibitive for simple indexes. I have used SQL Server in the past and have noticed that bookmark lookups are avoided because they force the query executor to actually fetch the data page off of disk, rather then return the values that exist in the index. I have verified times against the PostgreSQL installation and SQL Server to verify that the SQL Server queries come back at roughly the same speed when avoiding bookmark lookups as Postgres queries accessing clustered tables using the index the table is clustered on. Since I am sure everyone is tired of the intro by now, I'll get to the questions: Do commercial databases implement MVCC in a way that allows an efficient implementation of index lookups that can avoid heap lookups? Is there any way to modify PostgreSQL to allowindex lookups without heap validation that doesn't involve re-writing the MVCC implementation of keeping dead rows on the live table? Is the additional overhead of keeping full tuple visibility information inside of the index so odious to the Postgres community as to prevent a patch with this solution from being applied back to the head? Maybe as an optional use feature? We would prefer this solution for our needs over the bitmap of heap pages listed in the TODO list because we want to ensure optimal query times, regardless of the state of the cache and because we are concerned with performance in the face of concurrent updates on the page level. Thanks for any thoughts on this, I know this is a perennial topic, but we are seriously considering contributing either code or money to the solution of this problem. David Scott Applied Technical Systems, Inc.
pgsql-hackers by date: