Re: Unsigned integer types - Mailing list pgsql-hackers
From | Maciej Gajewski |
---|---|
Subject | Re: Unsigned integer types |
Date | |
Msg-id | CAEcSYX+Arn7y4FeYPp6ZgbiiiMfZYmsn9aUyotZB-MA1n5hTOw@mail.gmail.com Whole thread Raw |
In response to | Re: Unsigned integer types (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Unsigned integer types
Re: Unsigned integer types Re: Unsigned integer types |
List | pgsql-hackers |
The reasons are: performance, storage and frustration. I think the frustration comes from the fact that unsigned integers are universally available, except in PostgreSQL. I work with a really complex system, with many moving parts, and Postgres really is one of the components that causes the least trouble (compared to other opens-source and closed-sourced systems, which I shall leave unnamed), except for the unsigned integers. Let me give you few examples: 1. SMALLINT Probably the most popular unsigned short int on the planet: IP port number. I had to store some network traffic data in DB; I instinctively started to prototyping it like this: CREATE TABLE packets (addr INET, port SMALLINT, ... ); Of course it failed quickly and I had to bump the size to INTEGER. No real harm here, as the 2 bytes will probably go into some padding anyway, but somehow it feels wrong. 2. INTEGER I had to store a record with several uint32. I had to store an awful lot of them; hundreds GB of data per day. Roughly half of the record consists of uint32 fields. Increasing the data type to bigint would mean that I could store 3 instead of 4 days worth of data on available storage. Continuing with int4 meant that I would have to deal with the data in special way when in enters and leaves the DB. It's easy in C: just cast uint32_t to int32_t. But python code requires more complex changes. And the web backend too... It's suffering either way! Just imagine the conversation I had to have with my boss: "Either we'll increase budged for storage, or we need to touch every bit of the system". 3 .BIGINT There is no escape from bigint. Numeric (or TEXT!) is the only thing that can keep uint64, but when you have 10^9 and more records, and you need to do some arithmetic on it, numeric it's just too slow. We use uint64 all across our system as unique event identifier. It works fine, it's fast, and it's very convenient. Passing uint64 around, storing it, looking it up. We use it everywhere, including UI and log files. So once I decided to use BIGINT to store it, I had to guard all the inputs and outputs and make sure it is handled correctly. Or so I though. It turned out that some guys from different department are parsing some logs with perl parser and they store it in DB. They choose to store the uint64 id as TEXT. They probably tried BIGINT and failed and decided that - since they have low volume and they are not doing any arithmetics - to store it as TEXT. And now someone came up with an idea to join one table with another, bigint with text. I did it. Initially I wrote function that converted the text to numeric, then rotated it around 2^64 if necessary. It was too slow. Too slow for something that should be a simple reinterpretation of data. Eventually I ended up writing a C function, that first scanf( "%llu")'d the text into uint64_t, and then PG_RETURN_INT64-ed the uint64_t value. Works fast, but operations hate for increasing the complexity of DB deployment. --- I know some cynical people that love this kind of problems, they feel that the constant struggle is what keeps them employed :) But I'm ready to use my private time to solve it once and for all. I'm afraid that implementing uints as and extension would introduce some performance penalty (I may be wrong). I'm also afraid that with the extension I'd be left on my own maintaining it forever. While if this could go into the core product, it would live forever. As for the POLA violation: programmers experienced with statically typed languages shouldn't have problems dealing with all the issues surrounding signed/unsigned integers (like the ones described here: http://c-faq.com/expr/preservingrules.html). Others don't need to use them. Maciek On 27 May 2013 16:16, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Maciej Gajewski <maciej.gajewski0@gmail.com> writes: >> The lack of unsigned integer types is one of the biggest sources of >> grief in my daily work with pgsql. >> Before I go and start hacking, I'd like to discuss few points: >> 1. Is there a strong objection against merging this kind of patch? > > Basically, there is zero chance this will happen unless you can find > a way of fitting them into the numeric promotion hierarchy that doesn't > break a lot of existing applications. We have looked at this more than > once, if memory serves, and failed to come up with a workable design > that didn't seem to violate the POLA. > >> 2. How/if should the behaviour of numeric literals change? > >> The minimalistic solution is: it shouldn't, literals should be assumed >> signed by default. More complex solution could involve using C-style >> suffix ('123456u'). > > Well, if you don't do that, there is no need for you to merge anything: > you can build unsigned types as an external extension if they aren't > affecting the core parser's behavior. As long as it's external, you > don't need to satisfy anybody else's idea of what reasonable behavior > is ... > > regards, tom lane
pgsql-hackers by date: