Re: tsearch refactorings - Mailing list pgsql-patches
From | Teodor Sigaev |
---|---|
Subject | Re: tsearch refactorings |
Date | |
Msg-id | 46DED16A.9000505@sigaev.ru |
In response to | tsearch refactorings ("Heikki Linnakangas" <heikki@enterprisedb.com>) |
Responses | Re: tsearch refactorings |
List | pgsql-patches |
Heikki, I see some strange changes in your patch that are not related to tsearch at all:
contrib/pageinspect/pageinspect.sql.in
contrib/pageinspect/rawpage.c

> The usage of the QueryItem struct was very confusing. It was used for
> both operators and operands. For operators, "val" was a single character
> casted to a int4, marking the operator type. For operands, val was the
> CRC-32 of the value. Other fields were used only either for operands or
> for operators. The biggest change in the patch is that I broke the
> QueryItem struct into QueryOperator and QueryOperand. Type was really
...
> - Removed ParseQueryNode struct used internally by makepol and friends.
> push*-functions now construct QueryItems directly.

Unused bytes in QueryItem need to be set to zero; that is a common requirement for types in pgsql. After allocating space for the tsquery in parse_tsquery you copy just sizeof(QueryOperator) bytes and leave sizeof(QueryItem) - sizeof(QueryOperator) bytes untouched. QueryOperand is the biggest component of the QueryItem union. I haven't checked other places.

> that? And parse_query always produces trees that are in prefix notation,
> so the left-field is really redundant, but using tsqueryrecv, you could
> inject queries that are not in prefix notation; is there anything in the
> code that depends on that?

It's used by TS_execute as an optimization. With pure postfix notation you have to go through every node. For example:

    FALSE FALSE & FALSE &

You have to go to the end of the query to produce the correct result. In fact, TSQuery is a prefix notation with a pointer to the other operand or, in other words, just a flat view of a tree where the right operand of an operation is always placed directly after the operation. That notation allows evaluating only one of the operands when that is possible:

    & FALSE & FALSE FALSE
    1 2     3 4     5      --Nodes

After evaluating the second node you can return FALSE for the whole expression and skip nodes 3-5. For the query

    & TRUE & FALSE & FALSE

nodes 1, 2, 3 and 4 need to be evaluated.
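The short-circuit evaluation described above can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the actual TS_execute: the `Node` type, its fields, `eval_node` and `nodes_visited` are hypothetical stand-ins, and the operand order follows the description in this message (the adjacent operand first), which the real code may not share. `query_a` models the first example; `query_b` is a well-formed five-node variant in which node 2 is TRUE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for QueryItem; not the real structs. */
typedef enum { NODE_VAL, NODE_AND, NODE_OR } NodeKind;

typedef struct
{
    NodeKind kind;
    bool     value; /* NODE_VAL only: outcome of the (expensive) operand check */
    int      left;  /* operators only: offset from this node to its other operand */
} Node;

/* "& FALSE & FALSE FALSE", nodes 1..5 as numbered in the message */
const Node query_a[] = {
    {NODE_AND, false, 2},   /* 1: other operand is node 3 */
    {NODE_VAL, false, 0},   /* 2 */
    {NODE_AND, false, 2},   /* 3: other operand is node 5 */
    {NODE_VAL, false, 0},   /* 4 */
    {NODE_VAL, false, 0},   /* 5 */
};

/* "& TRUE & FALSE FALSE": same shape, but node 2 is TRUE */
const Node query_b[] = {
    {NODE_AND, false, 2},
    {NODE_VAL, true, 0},
    {NODE_AND, false, 2},
    {NODE_VAL, false, 0},
    {NODE_VAL, false, 0},
};

/* Evaluate the subtree rooted at node, counting every node touched. */
bool eval_node(const Node *node, int *visited)
{
    bool adjacent;

    assert(node != NULL && visited != NULL);
    (*visited)++;
    if (node->kind == NODE_VAL)
        return node->value;

    /* One operand always sits directly after its operator. */
    adjacent = eval_node(node + 1, visited);
    if (node->kind == NODE_AND && !adjacent)
        return false;               /* skip the other operand entirely */
    if (node->kind == NODE_OR && adjacent)
        return true;

    /* The stored offset lets us jump straight to the other operand. */
    return eval_node(node + node->left, visited);
}

/* Evaluate a whole query; returns how many nodes were visited. */
int nodes_visited(const Node *query, bool *result)
{
    int visited = 0;

    *result = eval_node(query, &visited);
    return visited;
}
```

With these arrays, `nodes_visited(query_a, &r)` touches 2 nodes and `nodes_visited(query_b, &r)` touches 4, and both evaluate to false. Without the `left` offset the evaluator would have to walk the skipped subtree just to find where the other operand starts.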
In most cases checking a QI_VAL node is much more expensive than checking a QI_OPR node.

>
> - There's many internal intermediate representations of a query:
> TSQuery, a QTNode-tree, NODE-tree (in tsquery_cleanup.c), prefix
> notation stack of QueryItems (in parser), infix-tree. Could we remove
> some of these?

I have no strong objections. QTNode and NODE are tree-like structures, but TSQuery is a postfix notation for storage in plain memory. NODE is used only to clean up stop-word placeholders, so it's a binary tree, while QTNode represents an n-ary tree (with any number of children).

Thank you for your interest in tsearch. After rechecking the problem pointed out above I'll commit your patch.

--
Teodor Sigaev
E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
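Returning to the zeroing requirement raised earlier in this message, the following is a minimal sketch of the idea, assuming simplified stand-in types. `SketchItem`, `SketchOperator`, `SketchOperand`, `SKETCH_OPR` and `store_operator` are hypothetical names, and the field layout is an assumption rather than the patch's actual definition. The point is only that the whole union slot is cleared before the smaller operator member is filled in, so the tail bytes and padding never carry leftover memory contents.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_OPR 2    /* hypothetical type tag for operators */

/* Hypothetical layouts; the operand is the larger union member. */
typedef struct
{
    int8_t   type;
    int8_t   oper;
    uint32_t left;
} SketchOperator;

typedef struct
{
    int8_t   type;
    uint8_t  weight;
    int32_t  valcrc;
    uint32_t length;
    uint32_t distance;
} SketchOperand;

typedef union
{
    int8_t         type;
    SketchOperator op;
    SketchOperand  val;
} SketchItem;

/* Fill one slot with an operator, leaving no uninitialized bytes behind. */
void store_operator(SketchItem *dst, int8_t oper, uint32_t left)
{
    assert(dst != NULL);

    /* Clear the full union, not just sizeof(SketchOperator) bytes. */
    memset(dst, 0, sizeof(SketchItem));

    dst->op.type = SKETCH_OPR;
    dst->op.oper = oper;
    dst->op.left = left;
}
```

Two slots filled this way from differently initialized memory should compare equal under `memcmp` over `sizeof(SketchItem)` on common compilers, which is the property a byte-wise comparison of stored values would rely on.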