Home > mailing lists

Re: [GENERAL] arrays - Mailing list pgsql-sql

From	Josh Berkus
Subject	Re: [GENERAL] arrays
Date	September 30, 2002 11:54:06
Msg-id	web-1734564@davinci.ethosmedia.com Whole thread Raw
In response to	Re: [GENERAL] arrays (Mike Sosteric <mikes@athabascau.ca>)
Responses	Re: [GENERAL] arrays Re: [GENERAL] arrays Re: [GENERAL] arrays
List	pgsql-sql

Tree view

Mike,

> We are currently developing a database to host some complicated, XMl
> layered data. We have chosen postgres because of its ability to store
> multidimensional arrays. We feel that using these will allow us to
> simplify the database structure considerably by storing some data in
> multidimensional arrays. 

Hmmm ... I'm curious; what kind of data do you feel could be
*simplified* by multi-dimensional arrays?   

> However, we currently have some dissenters who believe that using the
> multidimensional arrays will make queries slower and unneccesarily
> complicated. 

They're correct, especially about the latter.

> 1) are SQL queries slower when extracting data from multidimensional
> arrays

Yes, but this is fixable; see the Intarray package in /contrib.

> 2) are table joins more difficult or unneccesarily complicated

Yes.

> 3) can you do selects on only a portion of a multidimensional array.

Yes.

> That
> is, if you were storing multilanguage titles in a two dimensional
> array, 
> 
> [en], "english title"
> [fr], "french title"
> 
> could you select where title[0] = 'en'

Yes.

> I know these may sound like terribily stupid questions. but we need
> some
> quick guidance before proceeding with a schema that relies on these
> advanced data features of postgres

The problem you will be facing is that Arrays are one of the
fundamentally *Non-Relational* features that Postgresql supports for a
limited set of specialized purposes (mostly buffer tables, procedures,
and porting from MySQL).   As such, incorporating arrays into any kind
of complex schema will drive you to drink ... and is 95% likely more
easily done through tables and sub-tables, in any case.  

Let's take your example of "title", and say we wanted to use it in a
join:

SELECT movie.name, movie.show_date, movie.title_lang, title.translation
FROM movies JOIN title_langs ON (movie.title_lang[1] = title_langs.lang OR movie.title_lang[2] =
title_langs.lang OR movie.title_lang[3] = title_langs.lang ... )

... as you can see, the join is extremely painful.   Let alone
constructing a query like "Select all movies with titles only in
English and French and one other language."  (try it, really)

Then there's the not insignificant annoyance of getting data into and
out of multi-dimensional arrays,  which must constantly be parsed into
text strings.  And the fact that you will have to keep track, in your
middleware code, of what the ordinal numbers of arrays mean, since
array elements are fundamentally ordered.   (BTW, Postgres arrays begin
at 1, not 0)

Now, I know at least one person who is using arrays to store scientific
data.  However, that data arrives in his lab in the form of matrices,
and is not used for joins or query criteria beyond a simple "where"
clause.

As such, I'd reccommend one of two approaches for you:

1) Post some of your schema ideas here, and let us show you how they
are better done relationally.   The relational data model has 30 years
of thought behind it -- it can solve a lot of problems.

2) Shift over to an XML database or a full-blown OODB (like Cache').

Good luck.

-Josh Berkus

pgsql-sql by date:

From: Tom Lane
Date: 30 September 2002, 10:42:21
Subject: Re: [GENERAL] arrays

From: "Dan Langille"
Date: 30 September 2002, 12:00:41
Subject: Re: [GENERAL] arrays

Re: [GENERAL] arrays - Mailing list pgsql-sql

Previous

Next