Re: Latest on CITEXT 2.0 - Mailing list pgsql-hackers
From | David E. Wheeler |
---|---|
Subject | Re: Latest on CITEXT 2.0 |
Date | |
Msg-id | 25892F1D-AB99-4CBB-AA04-A5A4267ED196@kineticode.com Whole thread Raw |
In response to | Re: Latest on CITEXT 2.0 (Martijn van Oosterhout <kleptog@svana.org>) |
Responses |
Re: Latest on CITEXT 2.0
Re: Latest on CITEXT 2.0 |
List | pgsql-hackers |
Thanks a million for your answers, Martijn. I just have some more stupid questions, if you could bear with me. On Jun 26, 2008, at 03:28, Martijn van Oosterhout wrote: > When creating an index, your comparison functions are going ot be > called O(N log N) times. If they leak into a context that isn't > regularly freed you may have a problem. I'd suggest loking at how the > text comparisons do it. PG_FREE_IF_COPY() is probably a good idea > because the incoming tuples may be detoasted. Okay, I see that text_cmp in varlena doesn't use PG_FREE_IF_COPY(), and neither do text_smaller nor text_larger (which just dispatch to text_cmp anyway). The operator functions *do* use PG_FREE_IF_COPY(). So I'm guessing it's these functions you're talking about. However, my implementation just looks like this: Datum citext_ne (PG_FUNCTION_ARGS) { // Fast path for different-length inputs. Okay for canonical equivalence? if (VARSIZE(PG_GETARG_TEXT_P(0)) != VARSIZE(PG_GETARG_TEXT_P(1))) PG_RETURN_BOOL( 1 ); PG_RETURN_BOOL(citextcmp( PG_ARGS ) != 0 ); } I don't *thinkI any variables are copied there. citextcmp() is just this: int citextcmp (PG_FUNCTION_ARGS) { // XXX These are all just references to existing structures, right? text * left = PG_GETARG_TEXT_P(0); text * right = PG_GETARG_TEXT_P(1); return varstr_cmp( cilower( left ), VARSIZE_ANY_EXHDR(left), cilower( right ), VARSIZE_ANY_EXHDR(right) ); } Again, no copying. cilower() does copy: int index, len; char * result; index = 0; len = VARSIZE(arg) - VARHDRSZ; result = (char *) palloc( strlen( str ) + 1 ); for (index = 0; index <= len; index++) { result[index] = tolower((unsigned char) str[index] ); } // XXXI don't need to pfree result if I'm returning it, right? return result; But the copied value is returned. Hrm…it should probably be pfreed somewhere, yes? So I'm wondering if I should change citextcmp to pfree values? Something like this: text * left = PG_GETARG_TEXT_P(0); text * right = PG_GETARG_TEXT_P(1); char * lcstr = cilower( left ); char* rcstr = cilower( right ); int result = varstr_cmp( cilower( left ), VARSIZE_ANY_EXHDR(left), cilower( right ), VARSIZE_ANY_EXHDR(right) ); pfree( lcstr ); pfree( rcstr ); return result; This is the only function that calls cilower(). And I *think* it's the only place where values are copied or memory is allocated needing to be freed. Does that sound right to you? On a side note, I've implemented this pretty differently from how the text functions are implemented in varlena.c, just to try to keep things succinct. But I'm wondering now if I shouldn't switch back to the style used by varlena.c, if only to keep the style the same, and thus perhaps to increase the chances that citext would be a welcome contrib addition. Thoughts? Many thanks again. You're a great help to this C n00b. Best, David
pgsql-hackers by date: