Thread: select from pipe-delimited field
Hello all,
I am trying to do a SELECT operation with a WHERE condition. However, the column with which I am trying to do the comparison has several values which are pipe-delimited. I want to return a match(true) if the WHERE condition matches any of the bar-delimited values.
For example, SELECT id WHERE synonyms = 'word';
Here synonyms is a pipe-delimited field and I want to match "word" with any of the values of the synonyms fields.
Thanks in advance.
Neel
I am trying to do a SELECT operation with a WHERE condition. However, the column with which I am trying to do the comparison has several values which are pipe-delimited. I want to return a match(true) if the WHERE condition matches any of the bar-delimited values.
For example, SELECT id WHERE synonyms = 'word';
Here synonyms is a pipe-delimited field and I want to match "word" with any of the values of the synonyms fields.
Thanks in advance.
Neel
On Mon, Aug 23, 2010 at 05:44:09PM -0500, san man wrote: > Hello all, > > I am trying to do a SELECT operation with a WHERE condition. However, the > column with which I am trying to do the comparison has several values which > are pipe-delimited. I want to return a match(true) if the WHERE condition > matches any of the bar-delimited values. You'll want to normalize this table into two or more tables, at some point. > For example, SELECT id WHERE synonyms = 'word'; > > Here synonyms is a pipe-delimited field and I want to match "word" with any > of the values of the synonyms fields. Try the LIKE function. http://www.postgresql.org/docs/current/static/functions-matching.html#FUNCTIONS-LIKE Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On Tue, Aug 24, 2010 at 12:44 AM, san man <neelakash21@gmail.com> wrote:
Hello all,
I am trying to do a SELECT operation with a WHERE condition. However, the column with which I am trying to do the comparison has several values which are pipe-delimited. I want to return a match(true) if the WHERE condition matches any of the bar-delimited values.
For example, SELECT id WHERE synonyms = 'word';
Here synonyms is a pipe-delimited field and I want to match "word" with any of the values of the synonyms fields.
SELECT id FROM t WHERE 'word' = ANY(regexp_split_to_array(synonyms, '[|]'));
With postgres it's possible to make an index on regexp_split_to_array(synonyms, '[|]')) for some extra speed. But its better to store the synonyms as an array in the first place.
Thanks in advance.
Neel
Thanks for the replies.
David, I would have normalized it to 2 or more tables, but the number of bar-delimited are not fixed and as new data are added the maximum number of these values may change. Also, the problem with like I think is that matching is not strict and thus might give spurious hits.
Arjen's solution(haven't tried yet) looks better to me.
-Neel
David, I would have normalized it to 2 or more tables, but the number of bar-delimited are not fixed and as new data are added the maximum number of these values may change. Also, the problem with like I think is that matching is not strict and thus might give spurious hits.
Arjen's solution(haven't tried yet) looks better to me.
-Neel
On Mon, Aug 23, 2010 at 5:59 PM, Arjen Nienhuis <a.g.nienhuis@gmail.com> wrote:
On Tue, Aug 24, 2010 at 12:44 AM, san man <neelakash21@gmail.com> wrote:Hello all,
I am trying to do a SELECT operation with a WHERE condition. However, the column with which I am trying to do the comparison has several values which are pipe-delimited. I want to return a match(true) if the WHERE condition matches any of the bar-delimited values.
For example, SELECT id WHERE synonyms = 'word';
Here synonyms is a pipe-delimited field and I want to match "word" with any of the values of the synonyms fields.SELECT id FROM t WHERE 'word' = ANY(regexp_split_to_array(synonyms, '[|]'));With postgres it's possible to make an index on regexp_split_to_array(synonyms, '[|]')) for some extra speed. But its better to store the synonyms as an array in the first place.Thanks in advance.
Neel
On 08/24/2010 07:07 AM, san man wrote: > Thanks for the replies. > David, I would have normalized it to 2 or more tables, but the number of > bar-delimited are not fixed and as new data are added the maximum number > of these values may change. Also, the problem with like I think is that > matching is not strict and thus might give spurious hits. You are trying to simulate arrays using your own custom setup. If you switch from using pipe-delimeted text to an array, you can use the PostgreSQL array operators to do what you want. You even have (limited) indexing options. As for normalizing the data out to another table: Sometimes performance concerns render that undesirable. Arrays can be very useful for fairly small amounts of data that's tightly associated with a given record, especially things like search keys. There's a reason that tsvector is implemented how it is, rather than as a breakout table full of keyword associations. It has to be fast, and indexable. It sounds like the OP's problem has the same requirements. -- Craig Ringer
On Mon, Aug 23, 2010 at 8:21 PM, Craig Ringer <craig@postnewspapers.com.au> wrote: > On 08/24/2010 07:07 AM, san man wrote: >> >> Thanks for the replies. >> David, I would have normalized it to 2 or more tables, but the number of >> bar-delimited are not fixed and as new data are added the maximum number >> of these values may change. Also, the problem with like I think is that >> matching is not strict and thus might give spurious hits. > > You are trying to simulate arrays using your own custom setup. > > If you switch from using pipe-delimeted text to an array, you can use the > PostgreSQL array operators to do what you want. You even have (limited) > indexing options. > > As for normalizing the data out to another table: Sometimes performance > concerns render that undesirable. Arrays can be very useful for fairly small > amounts of data that's tightly associated with a given record, especially > things like search keys. > > There's a reason that tsvector is implemented how it is, rather than as a > breakout table full of keyword associations. It has to be fast, and > indexable. It sounds like the OP's problem has the same requirements. I think if I was going to normalize it I would normalize into a many to many lookup table with the lookup table being in the middlle for an it from each table. It's no full text search, but it would probably be faster than one big second table with lots of redundant info in it.
On Mon, Aug 23, 2010 at 06:07:08PM -0500, san man wrote: > Thanks for the replies. > David, I would have normalized it to 2 or more tables, but the number of > bar-delimited are not fixed and as new data are added the maximum number of > these values may change. Also, the problem with like I think is that > matching is not strict and thus might give spurious hits. The fact that it's not fixed is irrelevant. just add table: create table synonyms ( word text, synonym text, primary key (synonymi, word) ); and you're fine. you can have as many synonyms per word as you want. depesz