Re: [GENERAL] workaround for lack of REPLACE() function - Mailing list pgsql-hackers
| From | Joe Conway |
|---|---|
| Subject | Re: [GENERAL] workaround for lack of REPLACE() function |
| Date | |
| Msg-id | 3D549ED8.40102@joeconway.com Whole thread Raw |
| In response to | Re: [GENERAL] workaround for lack of REPLACE() function (Thomas Lockhart <lockhart@fourpalms.org>) |
| Responses |
Re: [GENERAL] workaround for lack of REPLACE() function
|
| List | pgsql-hackers |
Joe Conway wrote:
> more work than I had time for when I wrote the current replace(). But as
> I said, if there is support for getting this into the backend, I'll add
> it to my todo list:
>
> - Create new backend function replace()
> - Either create new backend functions, or merge into existing functions:
> to_hex() and extract_tok()
>
I'm just starting to take a look at this again. While studying the
current text_substr() function I found two behaviors which conflict with
specific SQL92/SQL99 requirements, and one bug. First the spec
compliance -- SQL92 section 6.7/SQL99 section 6.18 say:
If <character substring function> is specified, then:
a) Let C be the value of the <character value expression>, let LC be the length of C, and let S be the value of the
<startposition>.
b) If <string length> is specified, then let L be the value of <string length> and let E be S+L. Otherwise, let E be
thelarger of LC + 1 and S.
c) If either C, S, or L is the null value, then the result of the <character substring function> is the null value.
d) If E is less than S, then an exception condition is raised: data exception-substring error.
e) Case: i) If S is greater than LC or if E is less than 1, then the result of the <character substring
function>is a zero-length string. ii) Otherwise, 1) Let SI be the larger of S and 1. Let El be the smaller of E
and LC+l. Let Ll be El-Sl. 2) The result of the <character substring function> is a character
stringcontaining the Ll characters of C starting at character number Sl in the same order that the characters
appearin C.
The only way for d) to be true is when L < 0. Instead of an error, we do:
test=# select substr('hello',2,-1); substr
-------- ello
(1 row)
The other spec issue is wrt para e)i). If E (=S+L) < 1, we should return
a zero-length string. Currently I get:
test=# select substr('hello',-4,3); substr
-------- hello
(1 row)
Neither behavior is documented (unless it's somewhere other than:
http://developer.postgresql.org/docs/postgres/functions-string.html ).
The bug is this one:
test=# create DATABASE testmb with encoding = 'EUC_JP';
CREATE DATABASE
test=# \c testmb
You are now connected to database testmb.
testmb=# select substr('hello',6,2); substr
-------- ~
(1 row)
testmb=# \c test
You are now connected to database test.
test=# select substr('hello',6,2); substr
--------
(1 row)
The multibyte database behavior is the bug. The SQL_ASCII behavior is
correct (zero-length string):
test=# select substr('hello',6,2) is null; ?column?
---------- f
(1 row)
Any objection if I rework this function to meet SQL92 and fix the bug?
Or is the SQL92 part not desirable because it breaks backward
compatability?
In any case, can the #ifdef MULTIBYTE's be removed now in favor of a
test for encoding max length?
Joe
pgsql-hackers by date: