Re: tsearch patch and namespace pollution - Mailing list pgsql-hackers
From | Bruce Momjian |
---|---|
Subject | Re: tsearch patch and namespace pollution |
Date | |
Msg-id | 200708170218.l7H2Io715461@momjian.us Whole thread Raw |
In response to | tsearch patch and namespace pollution (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: tsearch patch and namespace pollution
|
List | pgsql-hackers |
Tom Lane wrote: > I find the following additions to pg_proc in the current tsearch2 patch: It seems a lot of these are useless and just bloat. I will mark a few: > proc | prorettype > ------------------------------------------+------------ > pg_ts_parser_is_visible(oid) | boolean > pg_ts_dict_is_visible(oid) | boolean > pg_ts_template_is_visible(oid) | boolean > pg_ts_config_is_visible(oid) | boolean Why would anyone look these up via OID rather than name? > tsvectorin(cstring) | tsvector > tsvectorout(tsvector) | cstring > tsvectorsend(tsvector) | bytea > tsqueryin(cstring) | tsquery > tsqueryout(tsquery) | cstring > tsquerysend(tsquery) | bytea > gtsvectorin(cstring) | gtsvector > gtsvectorout(gtsvector) | cstring > tsvector_lt(tsvector,tsvector) | boolean > tsvector_le(tsvector,tsvector) | boolean > tsvector_eq(tsvector,tsvector) | boolean > tsvector_ne(tsvector,tsvector) | boolean > tsvector_ge(tsvector,tsvector) | boolean > tsvector_gt(tsvector,tsvector) | boolean > tsvector_cmp(tsvector,tsvector) | integer > length(tsvector) | integer > strip(tsvector) | tsvector > setweight(tsvector,"char") | tsvector > tsvector_concat(tsvector,tsvector) | tsvector > vq_exec(tsvector,tsquery) | boolean > qv_exec(tsquery,tsvector) | boolean > tt_exec(text,text) | boolean > ct_exec(character varying,text) | boolean > tq_exec(text,tsquery) | boolean > cq_exec(character varying,tsquery) | boolean > tsquery_lt(tsquery,tsquery) | boolean > tsquery_le(tsquery,tsquery) | boolean > tsquery_eq(tsquery,tsquery) | boolean > tsquery_ne(tsquery,tsquery) | boolean > tsquery_ge(tsquery,tsquery) | boolean > tsquery_gt(tsquery,tsquery) | boolean > tsquery_cmp(tsquery,tsquery) | integer > tsquery_and(tsquery,tsquery) | tsquery > tsquery_or(tsquery,tsquery) | tsquery > tsquery_not(tsquery) | tsquery > tsq_mcontains(tsquery,tsquery) | boolean > tsq_mcontained(tsquery,tsquery) | boolean > numnode(tsquery) | integer > querytree(tsquery) | text > rewrite(tsquery,tsquery,tsquery) | tsquery > rewrite(tsquery,text) | tsquery > rewrite_accum(tsquery,tsquery[]) | tsquery > rewrite_finish(tsquery) | tsquery > rewrite(tsquery[]) | tsquery > stat(text) | record > stat(text,text) | record > rank(real[],tsvector,tsquery,integer) | real > rank(real[],tsvector,tsquery) | real > rank(tsvector,tsquery,integer) | real > rank(tsvector,tsquery) | real > rank_cd(real[],tsvector,tsquery,integer) | real > rank_cd(real[],tsvector,tsquery) | real > rank_cd(tsvector,tsquery,integer) | real > rank_cd(tsvector,tsquery) | real Do we realy need this many ranking functions? > token_type(oid) | record Again, why by OID? > token_type(text) | record > parse(oid,text) | record > parse(text,text) | record > lexize(oid,text) | text[] > lexize(text,text) | text[] > headline(oid,text,tsquery,text) | text > headline(oid,text,tsquery) | text > headline(text,text,tsquery,text) | text > headline(text,text,tsquery) | text > headline(text,tsquery,text) | text > headline(text,tsquery) | text > to_tsvector(oid,text) | tsvector > to_tsvector(text,text) | tsvector > to_tsquery(oid,text) | tsquery Why OID again for the configuration? I just don't see the use case and it is bloat and causes confusion. > to_tsquery(text,text) | tsquery > plainto_tsquery(oid,text) | tsquery > plainto_tsquery(text,text) | tsquery Again, OID. I asked Oleg about this and he said: > Bruce, just remove oid argument specification from documentation. so I think we can go ahead and remove cases where the configuration name or object is specified by oid. I have already removed them from the documentation and I though the patch had them removed too, but I guess not. Admittedly this API has been in flux. > to_tsvector(text) | tsvector > to_tsquery(text) | tsquery > plainto_tsquery(text) | tsquery > tsvector_update_trigger() | trigger > get_ts_config_oid(text) | oid > get_current_ts_config() | oid > (82 rows) > > (This list omits functions with INTERNAL arguments, as those are of > no particular concern to users.) Also @@ accepts TEXT @@ TEXT, at least according to the documentation. Is it clear to anyone which is tsquery and which tsvector? Is this something we want to support? > While most of these are probably OK, I'm disturbed by the prospect > that we are commandeering names as generic as "parse" or "stat" > with argument types as generic as "text". I think we need to put > a "ts_" prefix on some of these. Specifically, I find these names > totally unacceptable without a ts_ prefix: > > stat(text) | record > stat(text,text) | record > > token_type(oid) | record > token_type(text) | record > > parse(oid,text) | record > parse(text,text) | record > > lexize(oid,text) | text[] > lexize(text,text) | text[] > > These guys might be all right given that some of their arguments are > tsvector or tsquery, but it's not completely convincing --- think about > the case where an argument is given as an undecorated literal string. > It's also not all that clear that they are related to text searching. > I'm for putting a ts_ prefix on them too: > > rank(real[],tsvector,tsquery,integer) | real > rank(real[],tsvector,tsquery) | real > rank(tsvector,tsquery,integer) | real > rank(tsvector,tsquery) | real > rank_cd(real[],tsvector,tsquery,integer) | real > rank_cd(real[],tsvector,tsquery) | real > rank_cd(tsvector,tsquery,integer) | real > rank_cd(tsvector,tsquery) | real > > rewrite(tsquery,tsquery,tsquery) | tsquery > rewrite(tsquery,text) | tsquery > rewrite_accum(tsquery,tsquery[]) | tsquery > rewrite_finish(tsquery) | tsquery > rewrite(tsquery[]) | tsquery > > headline(oid,text,tsquery,text) | text > headline(oid,text,tsquery) | text > headline(text,text,tsquery,text) | text > headline(text,text,tsquery) | text > headline(text,tsquery,text) | text > headline(text,tsquery) | text > > These guys are just plain badly named, as it's completely unobvious that > they have anything to do with tsearch (or what they do at all, actually). > Furthermore the "varchar" variants seem entirely redundant with the > "text" ones: > > vq_exec(tsvector,tsquery) | boolean > qv_exec(tsquery,tsvector) | boolean > tt_exec(text,text) | boolean > ct_exec(character varying,text) | boolean > tq_exec(text,tsquery) | boolean > cq_exec(character varying,tsquery) | boolean > > Comments, suggestions? I would be happy if all text search functions began with 'ts', 'ts_', or 'to_ts', and if we don't clean this up now, it is going to be harder in the future. I think users can expect some migration for text search in 8.3 as a benefit of getting into core and be dump-able. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
pgsql-hackers by date: