Re: tsearch_core patch: permissions and security issues - Mailing list pgsql-hackers
From | Oleg Bartunov |
---|---|
Subject | Re: tsearch_core patch: permissions and security issues |
Date | |
Msg-id | Pine.LNX.4.64.0706150823290.1881@sn.sai.msu.ru Whole thread Raw |
In response to | Re: tsearch_core patch: permissions and security issues (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: tsearch_core patch: permissions and security issues
|
List | pgsql-hackers |
On Thu, 14 Jun 2007, Tom Lane wrote: > Oleg Bartunov <oleg@sai.msu.su> writes: >> You're correct. But we can't defend users from all possible errors. >> Other side, that we need somehow to help user to identify what fts >> configuration was used to produce tsvector. For example, comment on >> tsvector column would be useful, but we don't know how to do this >> automatically. > > Yeah, I was wondering about that too. The only way we could relax the > superuser, you-better-know-what-you're-doing restriction on changing > configurations would be if we had a way to identify which tsvector > columns needed to be updated. Right now that's pretty hard to find out > because the references to configurations are buried in the bodies of > trigger functions. That whole trigger-function business is not the > nicest part of tsearch2, either ... it'd be better if we could automate > tsvector maintenance more. yes, trigger function is a complex stuff, our tsearch() trigger is an example of automated stuff. It could be written very easy on plpgsql, for example. =# create function my_update() returns trigger as $$ BEGIN NEW.fts= setweight( to_tsvector('english',NEW.t1),'A') || ' ' || setweight( to_tsvector('english',NEW.t2),'B');RETURN NEW; END; $$ language plpgsql; > > One thing I was thinking about is that rather than storing a physical > tsvector column, people might index a "virtual" column using functional > indexes: > > create index ... (to_tsvector('english', big_text_col)) > > which could be queried > > select ... where to_tsvector('english', big_text_col) @@ tsquery this is already possible for gin index create index gin_text_idx on test using gin ( ( coalesce(to_tsvector(title),'') || coalesce(to_tsvector(body),'') ) ); apod=# select title from test where (coalesce(to_tsvector(title),'') || coalesce(to_tsvector(body),'') ) @@ to_tsquery('supernovae') order by sdate desc limit 10; > > Assuming that the index is lossy, the index condition would have to be > rechecked, so to_tsvector() would have to be recomputed, but only at the > rows identified as candidate matches by the index. The I/O savings from > eliminating the heap's tsvector column might counterbalance the extra > CPU for recomputing tsvectors. Or not, but in any case this is > attractive because it doesn't need any handmade maintenance support like > a trigger --- the regular index maintenance code does it all. I'm afraid it wouldn't work for all cases. We already have headline() function which had to reparse document to produce text snippet and it's very slow and eats most select time. ALso, trigger stuff is a normal machinery for databases. > > It strikes me that we could play the same kind of game we played to make > nextval() references to sequences be recognized as dependencies on > sequences. Invent a "regconfig" OID type that's just like regclass > except it handles OIDs of ts_config entries instead of pg_class entries, > and make the first argument of to_tsvector be one of those: > > create index ... (to_tsvector('english'::regconfig, big_text_col)) > > Now dependency.c can be taught to recognize the regconfig Const as > depending on the referenced ts_config entry, and voila we have a > pg_depend entry showing that the index depends on the configuration. > What we actually do about it is another question, but this at least > gets the knowledge into the system. > interesting. And \di could display all configuration stuff for text search indexes ? > [ thinks some more... ] If we revived the GENERATED AS patch, > you could imagine computing tsvector columns via "GENERATED AS > to_tsvector('english'::regconfig, big_text_col)" instead of a > trigger, and then again you've got the dependency exposed where > the system can see it. I don't wanna try to do that for 8.3, > but it might be a good path to pursue in future, instead of assuming > that triggers will be the way forevermore. > > Thoughts? No way with standard. GENERATED AS says that "all columns references in an expression associated with a generated column must be to columns of the base table containing that generated column." tsvector could be result of rather complex select involving several tables. Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83
pgsql-hackers by date: