Possibility of Index-Only access in PostgreSQL? - Mailing list pgsql-sql
From | Gunther Schadow |
---|---|
Subject | Possibility of Index-Only access in PostgreSQL? |
Date | |
Msg-id | 3D19FFCF.8030800@aurora.regenstrief.org Whole thread Raw |
Responses |
Re: Possibility of Index-Only access in PostgreSQL?
|
List | pgsql-sql |
Hello, we have a simple query here that uses a table such as Foo(id*, bar, baz) were id is a primary key and then we have an additional index on Foo(bar, baz). Now, we have a simple query such as this: SELECT bar, baz FROM Foo WHERE bar='HIT'; The Foo table has about 25 million rows and the above query selects 35000 rows. It takes 10 minutes, that's the problem. Yes, it's doing an index scan according to EXPLAIN, and yes VACUUM ANALYZE is done, the machine is decent (dual 1 GHz i*86 processor, 2 GB RAM doing software RAID 0-1 under Linux.) The shared memory buffer is 512 MB and there is no significant swapping activity. The index and the table are on the same spindle, yes, but that's nothing I can change right now. I found that this above query runs in about 8 second if it's a rerun. Presumably our large cache then contains most of the index if not also the data rows. Apparently I am stuck in an I/O bottleneck. So, what can I do? Again I can't touch the hardware for now. Besides, we are running this same query on Oracle (on an already aging Alpha mini) and it consistently runs in 17 seconds. May be there is also caching involved, but there is something Oracle does better about it. Two things come into mind: - Oracle is content with data read from an index if that is all that's needed. Indeed, I only need bar and baz and thoseare in the index. The benefits of not having to seek the scattered rows from the data table saves 35000 back and forthhead movements / seeks. - Even if we had to go to the data rows, could it be that if a good big chunk of the index would be read ahead and bufferedin one piece and then iterated over to retrieve the data rows without making the r/w head jump back and forth toand from the index would be good. Is PostgreSQL doing this index read-ahead? Is there a tuning variable I failed to set to make it read-ahead more? I presume that PostgreSQL does good reading-ahead of the index, right? But I heard it could not do index-only reads. Now, since I have several of these problems (each time with different indexes) and our Oracle side of the project uses index-only reads a lot, I would be very eager to have index-only reads in PostgreSQL. Naively it doesn't sound like a hard problem to implement in PostgreSQL, does it? All we would need to do is get the data off the index that we already read, and not do the row lookup. Probably the optimizer has to make different plans here, so the main work will probably have to be in the optimizer code. And in addition it might happen that some lower level constraint exists in the executor who need to go to the rows for some reason. In the worst case the index does not contain all data but only just enough of a portion that distinguishes rows. That would then require to rewrite all index code to have complete data in indexes. But I'm only naively conjecturing. So, please advise, what are the real barriers to this kind of optimization? Is there anything I could do to help the progress on this issue. (I'm generally capable of hacking certain features into existing code if there isn't some subtle logic involved that I do not understand, because it isn't documented.) I appreciate your help, thank you, -Gunther -- Gunther Schadow, M.D., Ph.D. gschadow@regenstrief.org Medical Information Scientist Regenstrief Institute for Health Care Adjunct Assistant Professor Indiana University School of Medicine tel:1(317)630-7960 http://aurora.regenstrief.org