Re: [GENERAL] arrays - Mailing list pgsql-sql
From | Josh Berkus |
---|---|
Subject | Re: [GENERAL] arrays |
Date | |
Msg-id | web-1734564@davinci.ethosmedia.com Whole thread Raw |
In response to | Re: [GENERAL] arrays (Mike Sosteric <mikes@athabascau.ca>) |
Responses |
Re: [GENERAL] arrays
Re: [GENERAL] arrays Re: [GENERAL] arrays |
List | pgsql-sql |
Mike, > We are currently developing a database to host some complicated, XMl > layered data. We have chosen postgres because of its ability to store > multidimensional arrays. We feel that using these will allow us to > simplify the database structure considerably by storing some data in > multidimensional arrays. Hmmm ... I'm curious; what kind of data do you feel could be *simplified* by multi-dimensional arrays? > However, we currently have some dissenters who believe that using the > multidimensional arrays will make queries slower and unneccesarily > complicated. They're correct, especially about the latter. > 1) are SQL queries slower when extracting data from multidimensional > arrays Yes, but this is fixable; see the Intarray package in /contrib. > 2) are table joins more difficult or unneccesarily complicated Yes. > 3) can you do selects on only a portion of a multidimensional array. Yes. > That > is, if you were storing multilanguage titles in a two dimensional > array, > > [en], "english title" > [fr], "french title" > > could you select where title[0] = 'en' Yes. > I know these may sound like terribily stupid questions. but we need > some > quick guidance before proceeding with a schema that relies on these > advanced data features of postgres The problem you will be facing is that Arrays are one of the fundamentally *Non-Relational* features that Postgresql supports for a limited set of specialized purposes (mostly buffer tables, procedures, and porting from MySQL). As such, incorporating arrays into any kind of complex schema will drive you to drink ... and is 95% likely more easily done through tables and sub-tables, in any case. Let's take your example of "title", and say we wanted to use it in a join: SELECT movie.name, movie.show_date, movie.title_lang, title.translation FROM movies JOIN title_langs ON (movie.title_lang[1] = title_langs.lang OR movie.title_lang[2] = title_langs.lang OR movie.title_lang[3] = title_langs.lang ... ) ... as you can see, the join is extremely painful. Let alone constructing a query like "Select all movies with titles only in English and French and one other language." (try it, really) Then there's the not insignificant annoyance of getting data into and out of multi-dimensional arrays, which must constantly be parsed into text strings. And the fact that you will have to keep track, in your middleware code, of what the ordinal numbers of arrays mean, since array elements are fundamentally ordered. (BTW, Postgres arrays begin at 1, not 0) Now, I know at least one person who is using arrays to store scientific data. However, that data arrives in his lab in the form of matrices, and is not used for joins or query criteria beyond a simple "where" clause. As such, I'd reccommend one of two approaches for you: 1) Post some of your schema ideas here, and let us show you how they are better done relationally. The relational data model has 30 years of thought behind it -- it can solve a lot of problems. 2) Shift over to an XML database or a full-blown OODB (like Cache'). Good luck. -Josh Berkus