[PATCH] Exorcise "zero-dimensional" arrays (Was: Re: Should array_length() Return NULL) - Mailing list pgsql-hackers
From | Brendan Jurd |
---|---|
Subject | [PATCH] Exorcise "zero-dimensional" arrays (Was: Re: Should array_length() Return NULL) |
Date | |
Msg-id | CADxJZo0keVhSRzUnot2Y6g46tsP7f-eV28iEmBd3AtLjU-YTMA@mail.gmail.com Whole thread Raw |
Responses |
Re: [PATCH] Exorcise "zero-dimensional" arrays (Was: Re:
Should array_length() Return NULL)
Re: [PATCH] Exorcise "zero-dimensional" arrays (Was: Re: Should array_length() Return NULL) Re: [PATCH] Exorcise "zero-dimensional" arrays (Was: Re: Should array_length() Return NULL) |
List | pgsql-hackers |
On 17 March 2013 05:19, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Brendan Jurd <direvus@gmail.com> writes: >> On 16 March 2013 09:07, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> The thing is that that syntax creates an array of zero dimensions, >>> not one that has 1 dimension and zero elements. > >> I'm going to ask the question that immediately comes to mind: Is there >> anything good at all about being able to define a zero-dimensional >> array? > > Perhaps not. I think for most uses, a 1-D zero-length array would be > just as good. I guess what I'd want to know is whether we also need > to support higher-dimensional zero-size arrays, and if so, what does > the I/O syntax for those look like? > > Another fly in the ointment is that if we do redefine '{}' as meaning > something other than a zero-D array, how will we handle existing > database entries that are zero-D arrays? > Hello hackers, I submit a patch to rectify the weird and confusing quirk of Postgres to use "zero dimensions" to signify an empty array. Rationale: The concept of an array without dimensions is a) incoherent, and b) leads to astonishing results from our suite of user-facing array functions. array_length, array_dims, array_ndims, array_lower and array_upper all return NULL when presented such an array. When a user writes ARRAY[]::int[], they expect to get an empty array, but instead we've been giving them a bizarre semi-functional proto-array. Not cool. Approach: The patch teaches postgres to regard zero as an invalid number of dimensions (which it very much is), and allows instead for fully armed and operational empty arrays. The simplest form of empty array is the 1-D empty array (ARRAY[] or '{}') but the patch also allows for empty arrays over multiple dimensions, meaning that the final dimension has a length of zero, but the other dimensions may have any valid length. For example, '{{},{},{}}' is a 2-D empty array with dimension lengths {3,0} and dimension bounds [1:3][1:0]. An empty array dimension may have any valid lower bound, but by default the lower bound will be 1 and the upper bound will therefore be 0. Any zero-D arrays read in from existing database entries will be output as 1-D empty arrays from array_recv. There is an existing limitation where you cannot extend a multi-D array by assigning values to subscripts or slices. I've made no attempt to address that limitation. The patch improves some error reporting, particularly by adding errdetail messages for ereports of problems parsing a curly-brace array literal. There is some miscellaneous code cleanup included; I moved the hardcoded characters '{', '}', etc. into constants, unwound a superfluous inner loop from ArrayCount, and corrected some mistakes in code comments and error texts. If preferred for reviewing, I can split those changes (and/or the new errdetails) out into a separate patch. Incompatibility: This patch introduces an incompatible change in the behaviour of the aforementioned array functions -- instead of returning NULL for empty arrays they return meaningful values. Applications relying on the old behaviour to test for emptiness may be disrupted. One can future-proof against this by changing e.g. IF array_length(arr, 1) IS NULL ... to IF coalesce(array_length(arr, 1), 0) = 0 ... There is also a change in the way array subscript assignment works. Previously it wasn't possible to make a distinction between assigning to an empty array and assigning to a NULL, so in either case an expression like "arr[4] := 1" would create "[4:4]={1}". Now there is a distinction. If you assign to an empty array you will get "{NULL, NULL, NULL, 1}", whereas if you assign to a NULL you'll get "[4:4]={1}". Regression tests: The regression tests were updated to reflect behavioural changes. Documentation: A minor update to the prose in chapter 8.15 is included in the patch. Issues: Fixed-length arrays (like oidvector) are zero-based, which means that they are rendered into string form with their dimension info shown. So an empty oidvector now renders as "[0:-1]={}", which is correct but ugly. I'm not delighted with this side-effect but I'm not sure what can be done about it. I'm open to ideas. Diffstat: doc/src/sgml/array.sgml | 4 +- src/backend/executor/execQual.c | 77 +++++- src/backend/utils/adt/array_userfuncs.c | 23 +- src/backend/utils/adt/arrayfuncs.c | 671 +++++++++++++++++++++++++---------------------- src/backend/utils/adt/arrayutils.c | 4 + src/backend/utils/adt/xml.c | 19 +- src/include/c.h | 1 + src/include/utils/array.h | 4 + src/pl/plpython/plpy_typeio.c | 3 - src/test/isolation/expected/timeouts.out | 16 +- src/test/isolation/specs/timeouts.spec | 8 +- src/test/regress/expected/arrays.out | 55 ++-- src/test/regress/expected/create_function_3.out | 2 +- src/test/regress/expected/polymorphism.out | 2 +- src/test/regress/sql/arrays.sql | 6 +- 15 files changed, 519 insertions(+), 376 deletions(-) I'll add this to the Commit Fest app. It is intended to resolve TODO item "Improve handling of empty arrays". Cheers, BJ
Attachment
pgsql-hackers by date: