Thread: [RFC] Unsigned integer support.
Hello hackers, I know the development community is in the middle of the July 2008 commit-fest, so I apologize if this design proposals are in appropriate at this time. I am looking to take advantage of PostgreSQL extensible type system and implement unsigned integer support. The data I am dealing with contains many unsigned data types and I am planning on using this type to reduce storage consumption. I am initially looking to add support for 16-bit and 32-bit unsigned integers with the potential to add 8-bit and 64-bit unsigned integers if needed or desired by the community. Searching through the list archives, I found two objections raised in the past: 1. Implicit casts between different data types. I am hoping the removal of many of the implicit casts in PostgreSQL 8.3 will simplify this task to where this objection can be removed. My plan (without much experimentation) is to have maybe a handful of casts (something like): * uint4 -> integer * integer -> uint4 * bigint -> uint4 * integer-> uint2 * uint2 -> smallint and then provide operators to provide a reasonable set of functionality. My initial thought for this functionality is to provide default operators on any type that is not implicitly casted on the psql command-line. As an example, I am planning for the following SQL statements to work correctly: 3000000000::uint4 + 10 and 3000000000::uint4 > 10 My understanding is the SQL standard does not provide support for unsigned integers, so I am planning on making all casts from unsigned integers to other data types explicit. Is this acceptable to the community? Another question for the community is should we allow the following cast? -1::uint4 Even though this is acceptable c-code, I am leaning towards throwing an out-of-range error when this occurs. Are there some areas I am missing or should investigate further before working on this project? 2. There is not much demand for unsigned integer types. Not much I can do about that :) I am willing to post my work as a PgFoundry project. PgFoundry already has an uint project: http://pgfoundry.org/projects/uint/ Unfortunately this project seems to have not gone anywhere. Last activity was late 2006 and there are not any files checked into the SCM repository. Is it acceptable to hijack this PgFoundry project? Or should I start a new project (assuming there is any interest in publishing this work). Although I am not targeting inclusion for this type in the core PostgreSQL code, I would like to post code for review and receive feedback from the community on this work. As I understand this RFC is the first step in the process :) Once I have some code ready for review, is it acceptable to use the commit-fest wiki for this project? Thanks much for your time! - Ryan
"Ryan Bradetich" <rbradetich@gmail.com> writes: > I am looking to take advantage of PostgreSQL extensible type system > and implement unsigned integer support. This has been proposed before, and foundered before on the question of implicit coercions. If you're willing to make all coercions *to* unsigned types be explicit (or at most assignment), then I think it can be made to work without breaking anything. But usually the folk who ask for this feature are hoping that bare integer literals like "42" will get interpreted as unsigned when they want them to be. The problem with that wish is illustrated by select 1500000000 + 1500000000; These literals might be either int4 or uint4, therefore this command might yield either an integer-overflow error or 3000000000::uint4. That's not a distinction you can fuzz over --- it's got to be one or the other, and backwards compatibility says it'd better be the first. > I am hoping the removal of many of the implicit casts in > PostgreSQL 8.3 will simplify this task to where this objection can be > removed. The implicit casts we removed were cross-type-category cases. If you hope for unsigned types to be considered part of the numeric category, there's no guidance for you there. In fact, the real nub of the problem is what type shall be initially assigned to an integer-looking literal, and how will you get things to behave sanely if that initial choice wasn't what was desired. We still have some issues around the fact that "42" isn't considered a smallint. Throwing in another possible meaning isn't going to help. > My understanding is the SQL standard does not provide support for > unsigned integers, so I am planning on making all casts from unsigned > integers to other data types explicit. It's really the other direction that would be contentious ... regards, tom lane
Hello Tom, On Thu, Jul 24, 2008 at 10:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Ryan Bradetich" <rbradetich@gmail.com> writes: >> I am looking to take advantage of PostgreSQL extensible type system >> and implement unsigned integer support. > > This has been proposed before, and foundered before on the question > of implicit coercions. If you're willing to make all coercions *to* > unsigned types be explicit (or at most assignment), then I think > it can be made to work without breaking anything. But usually the > folk who ask for this feature are hoping that bare integer literals > like "42" will get interpreted as unsigned when they want them to be. > The problem with that wish is illustrated by > > select 1500000000 + 1500000000; > > These literals might be either int4 or uint4, therefore this command > might yield either an integer-overflow error or 3000000000::uint4. > That's not a distinction you can fuzz over --- it's got to be one > or the other, and backwards compatibility says it'd better be the > first. I am in agreement with you on this. Since SQL does not specify unsigned types, I was assuming only explicit and assignment casts. I should have probably mentioned that in the RFC. Thanks for pointing this out. My main goal for this type is the reduced storage space. I am fine with people needing to cast to the unsigned types to benefit from the reduced storage space. My plans for the example above would be: 1. SELECT 1500000000 + 1500000000 --> Throws overflow error. 2. SELECT 1500000000::uint4 + 1500000000 --> Returns 3000000000::uint4. > >> I am hoping the removal of many of the implicit casts in >> PostgreSQL 8.3 will simplify this task to where this objection can be >> removed. > > The implicit casts we removed were cross-type-category cases. > If you hope for unsigned types to be considered part of the numeric > category, there's no guidance for you there. In fact, the real nub > of the problem is what type shall be initially assigned to an > integer-looking literal, and how will you get things to behave sanely > if that initial choice wasn't what was desired. We still have some > issues around the fact that "42" isn't considered a smallint. Throwing > in another possible meaning isn't going to help. > >> My understanding is the SQL standard does not provide support for >> unsigned integers, so I am planning on making all casts from unsigned >> integers to other data types explicit. > > It's really the other direction that would be contentious ... > > regards, tom lane Thanks for your comments! I have already started to play around a bit with the types and will hopefully have some code ready for review / feedback soon. - Ryan
"Ryan Bradetich" <rbradetich@gmail.com> writes: > My plans for the example above would be: > > 1. SELECT 1500000000 + 1500000000 --> Throws overflow error. > 2. SELECT 1500000000::uint4 + 1500000000 --> Returns 3000000000::uint4. I think that wouldn't actually work. Postgres's parser immediately assigns a type to the bare unquoted integral constant so it would end up with a int4 type. Then when it has to pick an operator for uint4+int4 it wouldn't be able to cast the int4 to uint4 because there would be no implicit cast. You could make it work by having a uint4+int4 operator which returns uint4 but then you're going to need a *lot* of operators.... One other idea that's been mentioned before is treating integral constants like 150000 as type "unknown" like the quoted '150000' constant is. That way the parser would see uint4+unknown and could pick the uint4 operator. But that would be a pretty massive semantics change. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support!
Am Friday, 25. July 2008 schrieb Ryan Bradetich: > PgFoundry already has an uint project: > http://pgfoundry.org/projects/uint/ > > Unfortunately this project seems to have not gone anywhere. Last > activity was late 2006 and there are not any files checked into the > SCM repository. > Is it acceptable to hijack this PgFoundry project? Or should I > start a new project (assuming there is any interest in publishing this > work). Please hijack the project and develop your code there. Of course you can always ask for advice here.
Hello Greg, On Fri, Jul 25, 2008 at 3:57 AM, Gregory Stark <stark@enterprisedb.com> wrote: > "Ryan Bradetich" <rbradetich@gmail.com> writes: > >> My plans for the example above would be: >> >> 1. SELECT 1500000000 + 1500000000 --> Throws overflow error. >> 2. SELECT 1500000000::uint4 + 1500000000 --> Returns 3000000000::uint4. > > I think that wouldn't actually work. Postgres's parser immediately assigns a > type to the bare unquoted integral constant so it would end up with a int4 > type. Then when it has to pick an operator for uint4+int4 it wouldn't be able > to cast the int4 to uint4 because there would be no implicit cast. > > You could make it work by having a uint4+int4 operator which returns uint4 but > then you're going to need a *lot* of operators.... This was my plan. I performed some testing last night to verify that bare literals are considered plain integers and would not be implicitly casted to a different type (i.e. smallint or bigint). I am seeing three operators for most operations: 1. uint4 -> uint4 = uint4 2. int4 -> uint4 = uint4 3. uint4 -> int4 = uint4 Is there something I need to watch out for when adding this number of operators (i.e. performance impact, etc)? Some tests I should be running to measure the impact of adding these operators? > One other idea that's been mentioned before is treating integral constants > like 150000 as type "unknown" like the quoted '150000' constant is. That way > the parser would see uint4+unknown and could pick the uint4 operator. But that > would be a pretty massive semantics change. This would require changes to the core PostgreSQL code correct? My goal for this type was to have it as an external project on PgFoundry since there does not appear to be much demand for it and unsigned types are not specified in the SQL standard. If the community decides this support would be better in core PostgreSQL code, then I am willing to help with that work, but I will need a significant amount of guidance :) With my limited knowledge, the best (and easiest) path seems to take advantage of the extensible type system in PostgreSQL and support unsigned integers as a PgFoundry project. Thanks for your review and comments! - Ryan > -- > Gregory Stark > EnterpriseDB http://www.enterprisedb.com > Ask me about EnterpriseDB's 24x7 Postgres support!
Hello Peter, On Fri, Jul 25, 2008 at 5:14 AM, Peter Eisentraut <peter_e@gmx.net> wrote: > Am Friday, 25. July 2008 schrieb Ryan Bradetich: >> PgFoundry already has an uint project: >> http://pgfoundry.org/projects/uint/ >> >> Unfortunately this project seems to have not gone anywhere. Last >> activity was late 2006 and there are not any files checked into the >> SCM repository. >> Is it acceptable to hijack this PgFoundry project? Or should I >> start a new project (assuming there is any interest in publishing this >> work). > > Please hijack the project and develop your code there. Of course you can > always ask for advice here. I will work on getting the PgFoundry project setup. Thanks! - Ryan
Gregory Stark escribió: > One other idea that's been mentioned before is treating integral constants > like 150000 as type "unknown" like the quoted '150000' constant is. That way > the parser would see uint4+unknown and could pick the uint4 operator. But that > would be a pretty massive semantics change. Hmm, if we do that, how would the system resolve something like this? select 1000 + 1000 There would be no clue as to what + operator to pick, since both operands are unknown. This is in fact what happens today with alvherre=# select '100' + '100'; ERROR: operator is not unique: unknown + unknown at character 14 HINT: Could not choose a best candidate operator. You might need to add explicit type casts. STATEMENT: select '100' + '100'; I think this is a nonstarter. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
"Alvaro Herrera" <alvherre@commandprompt.com> writes: > Hmm, if we do that, how would the system resolve something like this? > > select 1000 + 1000 Well we have the same problem with 'foo' || 'bar'. The question I think is whether the solution there scales to having two different fallback types. > There would be no clue as to what + operator to pick, since both > operands are unknown. This is in fact what happens today with > > alvherre=# select '100' + '100'; > ERROR: operator is not unique: unknown + unknown at character 14 > HINT: Could not choose a best candidate operator. You might need to add explicit type casts. > STATEMENT: select '100' + '100'; Perhaps we could kill two birds with one stone... -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support!
Gregory Stark escribió: > "Alvaro Herrera" <alvherre@commandprompt.com> writes: > > > Hmm, if we do that, how would the system resolve something like this? > > > > select 1000 + 1000 > > Well we have the same problem with 'foo' || 'bar'. The question I think is > whether the solution there scales to having two different fallback types. Hmm, right. But you need more than two: consider alvherre=# select 0.42 + 1;?column? ---------- 1.42 (1 ligne) However, it would be neat if this behaved the same as alvherre=# select '0.42' + 1; ERROR: invalid input syntax for integer: "0.42" STATEMENT: select '0.42' + 1; -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
>>> Alvaro Herrera <alvherre@commandprompt.com> wrote: > consider > > alvherre=# select 0.42 + 1; > ?column? > ---------- > 1.42 > (1 ligne) > > However, it would be neat if this behaved the same as > > alvherre=# select '0.42' + 1; > ERROR: invalid input syntax for integer: "0.42" > STATEMENT: select '0.42' + 1; I wouldn't want the former to fail. I also wouldn't like these to fail: select 5000000000 + 1; select 'abc'::text || 'def'::varchar(3); -Kevin
Kevin Grittner escribió: > >>> Alvaro Herrera <alvherre@commandprompt.com> wrote: > > > consider > > > > alvherre=# select 0.42 + 1; > > ?column? > > ---------- > > 1.42 > > (1 ligne) > > > > However, it would be neat if this behaved the same as > > > > alvherre=# select '0.42' + 1; > > ERROR: invalid input syntax for integer: "0.42" > > STATEMENT: select '0.42' + 1; > > I wouldn't want the former to fail. Sorry, I was unclear. What I meant was that both 0.42 + 1 and '0.42' + 1 should be treated the same, and they should both produce a numeric output. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
"Ryan Bradetich" <rbradetich@gmail.com> writes: > On Fri, Jul 25, 2008 at 3:57 AM, Gregory Stark <stark@enterprisedb.com> wrote: >> "Ryan Bradetich" <rbradetich@gmail.com> writes: >>> My plans for the example above would be: >>> >>> 1. SELECT 1500000000 + 1500000000 --> Throws overflow error. >>> 2. SELECT 1500000000::uint4 + 1500000000 --> Returns 3000000000::uint4. >> >> You could make it work by having a uint4+int4 operator which returns uint4 but >> then you're going to need a *lot* of operators.... > This was my plan. Like he says, it's a *lot* of operators, and the point doesn't seem entirely clear to me. You'll still have overflow cases, they'll just be in different places. Consider the idea of not having any uint4-specific arithmetic operators, but instead providing the following: * assignment casts from int4 and int8 to uint4 (these throw error if out of range, of course)* implicit cast from uint4to int8 (can never fail) The effect of providing the latter cast would be that any arithmetic involving a uint4 column would automatically be done in int8. Which would make it a shade slower than a native implementation, but probably not enough slower to be a problem --- and you'd avoid having to write dozens of operators and underlying support functions. Storing into the uint4 column would work fine with no extra notation because of the assignment casts. Moreover, you'd avoid cluttering the system with a pile of cross-type operators, which we have recently realized are not a good thing, because they increase the likelihood of "ambiguous operator" problems --- see http://archives.postgresql.org/pgsql-hackers/2008-06/msg00750.php For uint8 you'd have to promote to numeric to guarantee no failure in the implicit cast; which is going to be a rather bigger performance hit, but I don't really see uint8 as being a type with huge demand. Now you probably *will* want cross-type comparison operators, if you are going to support indexing of unsigned columns, so that something likeuint4col > 42 can be indexed without any casting. But limiting yourself to the six basic comparison operators certainly makes it a much less bulky project. regards, tom lane
> -----Original Message----- > From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers- > owner@postgresql.org] On Behalf Of Tom Lane > Sent: Friday, July 25, 2008 12:32 PM > To: Ryan Bradetich > Cc: Gregory Stark; pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] [RFC] Unsigned integer support. > > "Ryan Bradetich" <rbradetich@gmail.com> writes: > > On Fri, Jul 25, 2008 at 3:57 AM, Gregory Stark > <stark@enterprisedb.com> wrote: > >> "Ryan Bradetich" <rbradetich@gmail.com> writes: > >>> My plans for the example above would be: > >>> > >>> 1. SELECT 1500000000 + 1500000000 --> Throws overflow error. > >>> 2. SELECT 1500000000::uint4 + 1500000000 --> Returns > 3000000000::uint4. > >> > >> You could make it work by having a uint4+int4 operator which returns > uint4 but > >> then you're going to need a *lot* of operators.... > > > This was my plan. > > Like he says, it's a *lot* of operators, and the point doesn't seem > entirely clear to me. You'll still have overflow cases, they'll just > be > in different places. > > Consider the idea of not having any uint4-specific arithmetic > operators, > but instead providing the following: > > * assignment casts from int4 and int8 to uint4 > (these throw error if out of range, of course) > * implicit cast from uint4 to int8 (can never fail) > > The effect of providing the latter cast would be that any arithmetic > involving a uint4 column would automatically be done in int8. Which > would make it a shade slower than a native implementation, but probably > not enough slower to be a problem --- and you'd avoid having to write > dozens of operators and underlying support functions. Storing into the > uint4 column would work fine with no extra notation because of the > assignment casts. > > Moreover, you'd avoid cluttering the system with a pile of cross-type > operators, which we have recently realized are not a good thing, > because > they increase the likelihood of "ambiguous operator" problems --- see > http://archives.postgresql.org/pgsql-hackers/2008-06/msg00750.php > > For uint8 you'd have to promote to numeric to guarantee no failure > in the implicit cast; which is going to be a rather bigger performance > hit, but I don't really see uint8 as being a type with huge demand. > > Now you probably *will* want cross-type comparison operators, if you > are going to support indexing of unsigned columns, so that something > like > uint4col > 42 > can be indexed without any casting. But limiting yourself to the six > basic comparison operators certainly makes it a much less bulky > project. At the cost of one bit of storage, you have compatible types using CREATE DOMAIN: CREATE DOMAIN name [ AS ] data_type [ DEFAULT expression ] [ constraint [ ... ] ] where constraint is: [ CONSTRAINT constraint_name ] { NOT NULL | NULL | CHECK (expression) } More specifically: CREATE DOMAIN usmallint AS SMALLINT CHECK(VALUE > 0); CREATE DOMAIN uinteger AS INTEGER CHECK(VALUE > 0); CREATE DOMAIN ubigint AS BIGINT CHECK(VALUE > 0); CREATE DOMAIN unumeric AS NUMERIC CHECK(VALUE > 0); Seems like a heck of a lot less work to me. Not to mention very easy to use. C:\Program Files (x86)\PostgreSQL\8.3\bin>psql -h localhost -U postgres domaintest Password for user postgres: Welcome to psql 8.3.3, the PostgreSQL interactive terminal. Type: \copyright for distribution terms \h for help with SQL commands \? for help with psql commands \g orterminate with semicolon to execute query \q to quit Warning: Console code page (437) differs from Windows code page (1252) 8-bit characters might not work correctly.See psql reference page "Notes for Windows users" for details. domaintest=# CREATE DOMAIN usmallint AS SMALLINT CHECK(VALUE > 0); CREATE DOMAIN domaintest=# CREATE DOMAIN uinteger AS INTEGER CHECK(VALUE > 0); CREATE DOMAIN domaintest=# CREATE DOMAIN ubigint AS BIGINT CHECK(VALUE > 0); CREATE DOMAIN domaintest=# CREATE DOMAIN unumeric AS NUMERIC CHECK(VALUE > 0); CREATE DOMAIN domaintest=# domaintest=# create table integer_types ( domaintest(# usCol usmallint, domaintest(# sCol smallint, domaintest(# uiCol uinteger, domaintest(# iCol integer, domaintest(# ubCol ubigint, domaintest(# bCol bigint, domaintest(# unCol unumeric, domaintest(# nCol numeric domaintest(# ); CREATE TABLE domaintest=# create index i1 on integer_types(usCol); CREATE INDEX domaintest=# create index i2 on integer_types(sCol); CREATE INDEX domaintest=# create index i3 on integer_types(uiCol); CREATE INDEX domaintest=# create index i4 on integer_types(iCol); CREATE INDEX domaintest=# create index i5 on integer_types(ubCol); CREATE INDEX domaintest=# create index i6 on integer_types(bCol); CREATE INDEX domaintest=# create index i7 on integer_types(unCol); CREATE INDEX domaintest=# create index i8 on integer_types(nCol); CREATE INDEX domaintest=# insert into integer_types values(1,1,1,1,1,1,1,1); INSERT 0 1 domaintest=# select * from integer_types;uscol | scol | uicol | icol | ubcol | bcol | uncol | ncol -------+------+-------+------+-------+------+-------+------ 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 (1 row) domaintest=# insert into integer_types (usCol) values (-1); ERROR: value for domain usmallint violates check constraint "usmallint_check" domaintest=#
Dann Corbit wrote: > > CREATE DOMAIN usmallint AS SMALLINT CHECK(VALUE > 0); > CREATE DOMAIN uinteger AS INTEGER CHECK(VALUE > 0); > CREATE DOMAIN ubigint AS BIGINT CHECK(VALUE > 0); > CREATE DOMAIN unumeric AS NUMERIC CHECK(VALUE > 0); > > > s/>/>=/g cheers andrew
> -----Original Message----- > From: Andrew Dunstan [mailto:andrew@dunslane.net] > Sent: Friday, July 25, 2008 1:11 PM > To: Dann Corbit > Cc: Tom Lane; Ryan Bradetich; Gregory Stark; pgsql- > hackers@postgresql.org > Subject: Re: [HACKERS] [RFC] Unsigned integer support. > > > > Dann Corbit wrote: > > > > CREATE DOMAIN usmallint AS SMALLINT CHECK(VALUE > 0); > > CREATE DOMAIN uinteger AS INTEGER CHECK(VALUE > 0); > > CREATE DOMAIN ubigint AS BIGINT CHECK(VALUE > 0); > > CREATE DOMAIN unumeric AS NUMERIC CHECK(VALUE > 0); > > > > > > > > s/>/>=/g I turned off the default "option" to remove extra line breaks. Future posts should not be quite as even and bletcherous. God willing, and the crick don't rise.
Dann Corbit wrote: >> -----Original Message----- >> From: Andrew Dunstan [mailto:andrew@dunslane.net] >> Sent: Friday, July 25, 2008 1:11 PM >> To: Dann Corbit >> Cc: Tom Lane; Ryan Bradetich; Gregory Stark; pgsql- >> hackers@postgresql.org >> Subject: Re: [HACKERS] [RFC] Unsigned integer support. >> >> >> >> Dann Corbit wrote: >> >>> CREATE DOMAIN usmallint AS SMALLINT CHECK(VALUE > 0); >>> CREATE DOMAIN uinteger AS INTEGER CHECK(VALUE > 0); >>> CREATE DOMAIN ubigint AS BIGINT CHECK(VALUE > 0); >>> CREATE DOMAIN unumeric AS NUMERIC CHECK(VALUE > 0); >>> >>> >>> >>> >> s/>/>=/g >> > > I turned off the default "option" to remove extra line breaks. > Future posts should not be quite as even and bletcherous. > God willing, and the crick don't rise. > > I suspect you're missing my point, namely that 0 should be an allowed value for unsigned types. cheers andrew
> -----Original Message----- > From: Andrew Dunstan [mailto:andrew@dunslane.net] > Sent: Friday, July 25, 2008 1:28 PM > To: Dann Corbit > Cc: Tom Lane; Ryan Bradetich; Gregory Stark; pgsql- > hackers@postgresql.org > Subject: Re: [HACKERS] [RFC] Unsigned integer support. > > > > Dann Corbit wrote: > >> -----Original Message----- > >> From: Andrew Dunstan [mailto:andrew@dunslane.net] > >> Sent: Friday, July 25, 2008 1:11 PM > >> To: Dann Corbit > >> Cc: Tom Lane; Ryan Bradetich; Gregory Stark; pgsql- > >> hackers@postgresql.org > >> Subject: Re: [HACKERS] [RFC] Unsigned integer support. > >> > >> > >> > >> Dann Corbit wrote: > >> > >>> CREATE DOMAIN usmallint AS SMALLINT CHECK(VALUE > 0); > >>> CREATE DOMAIN uinteger AS INTEGER CHECK(VALUE > 0); > >>> CREATE DOMAIN ubigint AS BIGINT CHECK(VALUE > 0); > >>> CREATE DOMAIN unumeric AS NUMERIC CHECK(VALUE > 0); > >>> > >>> > >>> > >>> > >> s/>/>=/g > >> > > > > I turned off the default "option" to remove extra line breaks. > > Future posts should not be quite as even and bletcherous. > > God willing, and the crick don't rise. > > > > > > I suspect you're missing my point, namely that 0 should be an allowed > value for unsigned types. Quite right. The domains I created were really the 'natural numbers' rather than unsigned types.
Hello Dann, On Fri, Jul 25, 2008 at 1:06 PM, Dann Corbit <DCorbit@connx.com> wrote: > At the cost of one bit of storage, you have compatible types using Thanks for your review and feedback! Unfortunately, I do need the full range of the unsigned types for the project I am looking at. The reason I started working on these types is because it seemed wasteful to use the next size larger signed integer for the storage type of the unsigned integer. Thanks for the suggestion! - Ryan
"Ryan Bradetich" <rbradetich@gmail.com> writes: > ... I did have the following > concern looking through src/backend/utils/adt/int8.c: There is code that is > optionally compiled based on the INT64_IS_BUSTED pre-processor define. > Is this pre-processor define something I should worry about for portability > with this plan? I wouldn't worry, really ;-). Five or more years ago, it seemed important for PG to work on machines without functional int64 support, but there is little if any evidence that anyone is using current PG releases on such platforms. I might well be the last active PG hacker who gives a damn about that case at all, and even I long ago stopped expecting anything beyond core functionality to work on such a machine. Since your proposed unsigned types certainly aren't core functionality, I see no reason that they should need to work on INT64_IS_BUSTED platforms. > After I get uint types implemented, for fun I might try some benchmarks > to see if I can detect the int8 overhead on a 32-bit system. Right, you need to check that before drinking the kool-aid ... regards, tom lane
Tom, On Fri, Jul 25, 2008 at 12:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Consider the idea of not having any uint4-specific arithmetic operators, > but instead providing the following: > > * assignment casts from int4 and int8 to uint4 > (these throw error if out of range, of course) > * implicit cast from uint4 to int8 (can never fail) > > The effect of providing the latter cast would be that any arithmetic > involving a uint4 column would automatically be done in int8. Which > would make it a shade slower than a native implementation, but probably > not enough slower to be a problem --- and you'd avoid having to write > dozens of operators and underlying support functions. Storing into the > uint4 column would work fine with no extra notation because of the > assignment casts. This is an interesting idea that I will test out tonight. I did have the following concern looking through src/backend/utils/adt/int8.c: There is code that is optionally compiled based on the INT64_IS_BUSTED pre-processor define. Is this pre-processor define something I should worry about for portability with this plan? After I get uint types implemented, for fun I might try some benchmarks to see if I can detect the int8 overhead on a 32-bit system. > Moreover, you'd avoid cluttering the system with a pile of cross-type > operators, which we have recently realized are not a good thing, because > they increase the likelihood of "ambiguous operator" problems --- see > http://archives.postgresql.org/pgsql-hackers/2008-06/msg00750.php Good to know. Thanks for the link. > For uint8 you'd have to promote to numeric to guarantee no failure > in the implicit cast; which is going to be a rather bigger performance > hit, but I don't really see uint8 as being a type with huge demand. Hopefully I will not need the uint8 type. Right now for a project I am looking at I need the uint2 and uint4 types. uint8 support can come later if it is needed or requested. > Now you probably *will* want cross-type comparison operators, if you > are going to support indexing of unsigned columns, so that something > like > uint4col > 42 > can be indexed without any casting. But limiting yourself to the six > basic comparison operators certainly makes it a much less bulky project. This sounds excellent! Hopefully by using these operators I will be able to avoid most of the casting to int8 for my use, while still providing the complete functionality for this type. Thanks again for your review and feedback! - Ryan
Hi Ryan, I agree, I have had applications use uint types to avoid using a larger data type. I have actually had to patch an application developed for MySQL uint8 to signed int8 on PostgreSQL. In that case, the only operations that were performed where assignment and lookup. If we need to use the numeric type for calculations, that would be okay as long as the data is actually stored as 8-bytes, not numeric. It would certainly increase the ease of moving an application from MySQL to PostgreSQL. Cheers, Ken On Fri, Jul 25, 2008 at 08:50:02PM -0700, Ryan Bradetich wrote: > Hello Dann, > > On Fri, Jul 25, 2008 at 1:06 PM, Dann Corbit <DCorbit@connx.com> wrote: > > > At the cost of one bit of storage, you have compatible types using > > Thanks for your review and feedback! Unfortunately, I do need the full range > of the unsigned types for the project I am looking at. The reason I started > working on these types is because it seemed wasteful to use the next size > larger signed integer for the storage type of the unsigned integer. > > Thanks for the suggestion! > > - Ryan > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers >
On Jul 25, 2008, at 11:44 AM, Alvaro Herrera wrote: > However, it would be neat if this behaved the same as > > alvherre=# select '0.42' + 1; > ERROR: invalid input syntax for integer: "0.42" > STATEMENT: select '0.42' + 1; Do we really want to be making it easier for people to wrap numbers in quotes? -- Decibel!, aka Jim C. Nasby, Database Architect decibel@decibel.org Give your computer some brain candy! www.distributed.net Team #1828
"Decibel!" <decibel@decibel.org> writes: > On Jul 25, 2008, at 11:44 AM, Alvaro Herrera wrote: >> However, it would be neat if this behaved the same as >> >> alvherre=# select '0.42' + 1; >> ERROR: invalid input syntax for integer: "0.42" >> STATEMENT: select '0.42' + 1; > > > Do we really want to be making it easier for people to wrap numbers in quotes? Currently wrapping numbers in quotes is really the way Postgres expects to get them. Quoted constants in Postgres are just "unknown" type which are parsed according to context. Unquoted constants in Postgres are assigned a type by the parser based on what they look like and then have to be cast to something else if that turns out to be the wrong data type. I think people from other languages expect quoted constants to be strings and they often go out of their way to avoid them (causing themselves headaches when they then need to deal with casting rules). -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Get trained by Bruce Momjian - ask me about EnterpriseDB'sPostgreSQL training!
Gregory Stark <stark@enterprisedb.com> writes: > "Decibel!" <decibel@decibel.org> writes: >> Do we really want to be making it easier for people to wrap numbers in quotes? > Currently wrapping numbers in quotes is really the way Postgres expects to get > them. Really? regression=# select '2' + '2'; ERROR: operator is not unique: unknown + unknown LINE 1: select '2' + '2'; ^ HINT: Could not choose a best candidate operator. You might need to add explicit type casts. We've worked really hard to get to a point where interpretation of literals mostly behaves unsurprisingly. I think most people would find it surprising to think that quoting a number is the preferred way to represent it. regards, tom lane