Re: arrays and indexes - Mailing list pgsql-performance
From | Ross J. Reedstrom |
---|---|
Subject | Re: arrays and indexes |
Date | |
Msg-id | 20040726184703.GA27820@cnx.rice.edu Whole thread Raw |
In response to | Re: arrays and indexes (Greg Stark <gsstark@mit.edu>) |
Responses |
Re: arrays and indexes
Re: arrays and indexes |
List | pgsql-performance |
On Mon, Jul 26, 2004 at 02:27:20AM -0400, Greg Stark wrote: > > "Ross J. Reedstrom" <reedstrm@rice.edu> writes: > > > In the new schema, the same thing is: > > > > SELECT * from content where 42 = ANY (authors); > > > > Works fine, but for the life of me I can't find nor figure out how to > > build an index that will be used to speed this along. Any ideas? > > Well that's basically the problem with denormalized data like this. > > Have you resolved what you're going to do if two sessions try to add a user to > the same group at the same time? Or how you'll go about removing a user from > all his groups in one shot? We've got plenty of interlocks in the middleware to handle the first (mainly because this is an authoring system where everyone has to agree to participate, and acknowledge the open license on the materials) Second, they _can't_ be removed: we're effectively a write only archive. Even if we weren't it would be a rare event and could go slowly (loop over groups in the middleware, probably) > > Basically, if you denormalize in this fashion it becomes hard to use the > groups as anything but single monolithic objects. Whereas normalized data can > be queried and updated from other points of view like in the case you name > above. These groups _really are_ ideal for Joe Conway's work on arrays: we need ordered vectors, so we'd be sorting all the time, otherwise. They're static, and they're read only. The one thing they're not is fixed, known size (Sorry Merlin). They work fine for the query as shown: the only issue is performance. > Postgres does have a way to do what you ask, though. It involves GiST > indexes and the operators from the contrib/intarray directory from the > Postgres source. Well, yes, that's how it used to be done. I figured the new array support should be able to handle it without the addon, however. > However I warn you in advance that this is fairly esoteric stuff and > will take some time to get used to. And at least in my case I found > the indexes didn't actually help much for my data sets, probably > because they just weren't big enough to benefit. I know that they should help in this case: we've got lots of content. Any particular author or maintainter will be in a small fraction of those. i.e.: it's ideal for an index. And the current joined case uses an index, when it's available. I'll take a look at the GiST/contrib work, anyway. Thanks - Ross -- Ross Reedstrom, Ph.D. reedstrm@rice.edu Research Scientist phone: 713-348-6166 The Connexions Project http://cnx.rice.edu fax: 713-348-3665 Rice University MS-375, Houston, TX 77005 GPG Key fingerprint = F023 82C8 9B0E 2CC6 0D8E F888 D3AE 810E 88F0 BEDE
pgsql-performance by date: