Home > mailing lists

Re: arrays and indexes - Mailing list pgsql-performance

From	Ross J. Reedstrom
Subject	Re: arrays and indexes
Date	July 26, 2004 15:57:19
Msg-id	20040726184703.GA27820@cnx.rice.edu Whole thread Raw
In response to	Re: arrays and indexes (Greg Stark <gsstark@mit.edu>)
Responses	Re: arrays and indexes Re: arrays and indexes
List	pgsql-performance

Tree view

On Mon, Jul 26, 2004 at 02:27:20AM -0400, Greg Stark wrote:
>
> "Ross J. Reedstrom" <reedstrm@rice.edu> writes:
>
> > In the new schema, the same thing is:
> >
> > SELECT * from content where 42 = ANY (authors);
> >
> > Works fine, but for the life of me I can't find nor figure out how to
> > build an index that will be used to speed this along. Any ideas?
>
> Well that's basically the problem with denormalized data like this.
>
> Have you resolved what you're going to do if two sessions try to add a user to
> the same group at the same time? Or how you'll go about removing a user from
> all his groups in one shot?

We've got plenty of interlocks in the middleware to handle the first
(mainly because this is an authoring system where everyone has to agree
to participate, and acknowledge the open license on the materials)

Second, they _can't_ be removed: we're effectively a write only archive.
Even if we weren't it would be a rare event and could go slowly (loop
over groups in the middleware, probably)

>
> Basically, if you denormalize in this fashion it becomes hard to use the
> groups as anything but single monolithic objects. Whereas normalized data can
> be queried and updated from other points of view like in the case you name
> above.

These groups _really are_ ideal for Joe Conway's work on arrays: we need
ordered vectors, so we'd be sorting all the time, otherwise. They're
static, and they're read only. The one thing they're not is fixed, known
size (Sorry Merlin). They work fine for the query as shown: the only
issue is performance.

> Postgres does have a way to do what you ask, though. It involves GiST
> indexes and the operators from the contrib/intarray directory from the
> Postgres source.

Well, yes, that's how it used to be done. I figured the new array
support should be able to handle it without the addon, however.

> However I warn you in advance that this is fairly esoteric stuff and
> will take some time to get used to. And at least in my case I found
> the indexes didn't actually help much for my data sets, probably
> because they just weren't big enough to benefit.

I know that they should help in  this case: we've got lots of content.
Any particular author or maintainter will be in a small fraction of
those. i.e.: it's ideal for an index. And the current joined case uses
an index, when it's available. I'll take a look at the GiST/contrib work,
anyway.

Thanks -

Ross
--
Ross Reedstrom, Ph.D.                                 reedstrm@rice.edu
Research Scientist                                  phone: 713-348-6166
The Connexions Project      http://cnx.rice.edu       fax: 713-348-3665
Rice University MS-375, Houston, TX 77005
GPG Key fingerprint = F023 82C8 9B0E 2CC6 0D8E  F888 D3AE 810E 88F0 BEDE

pgsql-performance by date:

From: "Harmon S. Nine"
Date: 26 July 2004, 15:37:57
Subject: Re: Timestamp-based indexing

From: Pierre-Frédéric Caillaud
Date: 26 July 2004, 17:35:07
Subject: Re: arrays and indexes

Re: arrays and indexes - Mailing list pgsql-performance

Previous

Next