Re: New form of index "persistent reference" - Mailing list pgsql-hackers
From | pgsql@mohawksoft.com |
---|---|
Subject | Re: New form of index "persistent reference" |
Date | |
Msg-id | 16834.24.91.171.78.1108059139.squirrel@mail.mohawksoft.com Whole thread Raw |
In response to | Re: New form of index "persistent reference" ("Bort, Paul" <pbort@tmwsystems.com>) |
List | pgsql-hackers |
> If that ID is the only thing you use to access that data, why not just > store > it in a flat file with fixed-length records? seek() (or your language's > equivalent) is usually fast. As a matter of policy, I would never manage data outside of the database. > > If you need to drive that from within PostgreSQL, you would need an > untrusted language to read the file, but you could also generate it from a > table using a trigger. Very ugly. > > Or maybe use a serial column, an index on that column, and cluster the > table > on that index. It's more than one lookup, but not much with a Btree index. > (Not sure if this is better than just using a serial and an index. > http://www.postgresql.org/docs/8.0/interactive/sql-cluster.html says it > isn't, if I read it correctly.) Clustering is OK, but it doesn't handle updates and additions until you recluster the data. If a static reference is all that is needed, then merely using CTID would suffice. I was thinking a little overhead for a reference table would allow it to hook into PostgreSQL and keep it up to date. > > Then anytime there is a batch of updates to the table, re-cluster it. Yea, like I said, there are easier ways of doing that with fairly static data. > >> -----Original Message----- >> From: pgsql@mohawksoft.com [mailto:pgsql@mohawksoft.com] >> Sent: Thursday, February 10, 2005 11:22 AM >> To: pgsql-hackers@postgresql.org >> Subject: [HACKERS] New form of index "persistent reference" >> >> >> For about 5 years now, I have been using a text search engine >> that I wrote >> and maintain. >> >> In the beginning, I hacked up function mechanisms to return >> multiple value >> sets and columns. Then PostgreSQL aded "setof" and it is was >> cool. Then it >> was able to return a set of rows, which was even better. >> >> Lately, I have been thinking that a cool form of index would >> be some sort >> of "persistent reference" index. Like the old ISAM days of >> yore, a fixed >> number could point you right to the row that you want. I'm >> not sure if the >> "persistent reference" is a specific auto numbering column type or >> separate index structure or both. >> >> I asked the question how do you get a record without going through an >> index, the answer was CTID, which unfortunately changes when >> the row is >> updated. >> >> Now, what I want to brainstorm is some sort of "persistent reference" >> where the value is not algorithmically stored, maybe just an >> offset into a >> table. The number of operations should be about 1 per lookup. >> >> Imagine a dynamically growing array that has one slot per >> row. Every row >> is considered unique. Rows which are updated, their CTID is >> updated in the >> reference. (with vacuum?) >> >> Imagine something like this: >> >> create table foobar(id reference, name varchar, value varchar); >> >> select * from foobar where id = 100; >> >> The reference type has an implicit index that is basically a >> lookup table. >> On unique references where the reference value is fairly >> arbitrary, this >> would be a HUGE gain for direct lookups. There is no need for >> the NlogN of >> a tree. >> >> On the surface level, this would be a huge win for websites that use >> semi-fixed tables of data. >> >> >> >> ---------------------------(end of >> broadcast)--------------------------- >> TIP 1: subscribe and unsubscribe commands go to >> majordomo@postgresql.org >> >
pgsql-hackers by date: