Re: Review: Patch: Allow substring/replace() to get/set bit values - Mailing list pgsql-hackers
From | Kevin Grittner |
---|---|
Subject | Re: Review: Patch: Allow substring/replace() to get/set bit values |
Date | |
Msg-id | 4B54996F020000250002E6FE@gw.wicourts.gov Whole thread Raw |
Responses |
Re: Review: Patch: Allow substring/replace() to get/set bit values
|
List | pgsql-hackers |
Leonardo F wrote: > Done (I think). Added a couple of simple tests for bit overlay. > > I didn't include the catversion.h changes: obviously the > CATALOG_VERSION_NO has to be changed. [Following the "Reviewing A Patch" wiki, more or less] The patch is in context diff format and applies cleanly. It builds and passes regression tests. There are documentation changes and regression tests, although there is a must-fix problem in the docs and I would prefer to see better coverage of corner cases in the regression tests. In the documentation, the get_bit and set_bit methods are added to a list where we state "The following SQL-standard functions work on bit strings as well as character strings"; however they are not SQL-standard functions and are implemented on binary strings, not character strings. The tests don't include any examples where the bit string lines up with a byte boundary or is operating on the first or last bit of a byte, and the overlay function is not tested with values which change the length of the bit string. (All of these cases seem to work in the current version, but seem like the sort of thing which could get broken in revision.) It seems to *me* like the patch does something useful; also, it was on the TODO list, the author felt it was worth the effort, the overlay support for bit strings seems to comply with the spirit of the standard by extending a standard feature to a legacy data type, the get_bit and set_bit functions extend custom PostgreSQL behavior we have on some other data types to an additional type where it seems useful, and there is an incidental fix for a documentation gap. So I think we want it. We don't already have it. As mentioned above, I believe it follows the SQL spec where appropriate and PostgreSQL community-agreed behavior where appropriate. I don't see where pg_dump support is needed. I see no dangers. There is a peculiar omission from the patch -- it adds supportfor the overlay function to bit strings but not binary strings. Sincethe standard drops the bit string as a standard type in the 2003standard, in favor of boolean type and binary string types, it seemsodd to omit binary string support (which is defined in the standard)but include bit string support (which isn't). What the patch adds seemsuseful, and I don't think the omission should be considered fatal, butit would be nice to see us also cover binary strings if we can. The features worked as advertised in all the tests I could think up. There were no assertion failures or crashes. Simple tests performed reasonably well -- a SELECT adding the get_bit and set_bit around a bit literal typically ran in under 0.05 ms longer than a SELECT of the literal by itself. SELECT of an overlay function typically ran about 0.25 ms longer than the literal by itself. Performance of the overlay function could probably be improved with a C implementation, rather than it being declared as the concatenation of the results of two substring functions with a third value. Unless someone actually has unacceptable performance with this implementation, I'm inclined to think that optimization isn't worth the effort. This patch adds new functions rather than aiming to improve peformance of anything; extensive performance tests were not done. No slowdowns of other things seems likely and none was observed. I didn't see any portability issues; the techniques used seemed consistent with the related code, so it should be as portable as what already is in use. Well, there was one thing which is beyond my ken in this regard: I'm not sure that the cases from bits8 to (unsigned char *) are needed or appropriate, although on my machine they don't seem to hurt. Perhaps someone with a better grasp of such issues can comment. No new warning issues were generated in the build. I saw no problem interdependencies. In the extremely nit-picky department: I'm not sure the leading whitespace in pg_proc.h is as it should be. I'd prefer to see the comments for get_bit and set_bit mention that the location is specified left-to-right in a zero-based fashion consistent with the other get_bit and set_bit functions, but inconsistent with the standard substring, position, overlay functions. Personally, I find it unfortunate that our extensions introduced this inconsistency, but let's keep it consistent within the similarly named functions. This patch uses all-lowercase function names for the internal functions, which is consistent with the rest of the varbit.c file, but inconsistent with the similar functions in varlena.c. I'm not sure which it is more important that they match. There is bound to be some surprise factor in the mismatch either way. I'm not sure which way would be least astonishing. It seems odd to me that you specify the bit to set as a 32 bit integer, and that that is what get_bit returns, but again, this is consistent with what we do for the bytea functions, so it's probably the right thing to match that. Using a ERRCODE_ARRAY_SUBSCRIPT_ERROR for a bit position which is out-of-bounds seems somewhat bizarre to me -- I'd probably have chosen ERRCODE_INVALID_PARAMETER_VALUE in a green field -- but again, it matches the bytea implementation. This last point is probably more a matter of C coding style than anything, and I'm way to rusty in C to want to be the final arbiter of something like this, but it seems to me that oldByte and newByte local variables are just cluttering things up rather than clarifying: int oldByte, newByte; [...] oldByte = ((unsigned char *) r)[byteNo]; if (newBit == 0) newByte = oldByte & (~(1 << bitNo)); else newByte = oldByte | (1 << bitNo); ((unsigned char *) r)[byteNo] = newByte; vs if (newBit == 0) ((unsigned char *) r)[byteNo] &= (~(1 << bitNo)); else ((unsigned char *) r)[byteNo] |=(1 << bitNo); That's everything I could come up with. I'm putting this back to "Waiting on Author" for at least a documentation fix. Other issues mentioned above may warrant some discussion or action, too. -Kevin
pgsql-hackers by date: