Re: A space-efficient, user-friendly way to store categorical data - Mailing list pgsql-hackers

From Mark Dilger
Subject Re: A space-efficient, user-friendly way to store categorical data
Date
Msg-id A838A06F-5481-49BF-A4F1-23A17F7779DB@gmail.com
Whole thread Raw
In response to Re: A space-efficient, user-friendly way to store categorical data  (Andrew Kane <andrew@chartkick.com>)
List pgsql-hackers
> On Feb 12, 2018, at 5:08 PM, Andrew Kane <andrew@chartkick.com> wrote:
>
> Thanks everyone for the feedback. The current enum implementation requires you to create a new type and add labels
outsidea transaction prior to an insert. 
>
> -- on table creation
> CREATE TYPE city AS ENUM ();
> CREATE TABLE "users" ("city" city);
>
> -- on insert
> ALTER TYPE city ADD VALUE IF NOT EXISTS 'Chicago';
> BEGIN;
> INSERT INTO "users" ("city") VALUES ('Chicago');
> COMMIT;
>
> What would be ideal:
>
> -- on table creation
> CREATE TABLE "users" ("city" dynamic_enum);
>
> -- on insert
> BEGIN;
> INSERT INTO "users" ("city") VALUES ('Chicago');
> COMMIT;
>
> Since enums have a fixed number of labels, this type of feature may be better off as a property you could add to text
columns(as Thomas mentions). This would avoid issues with hitting the max number of labels. 


In your proposed feature, what happens if I create two tables:

CREATE TABLE myusers (city dynamic_enum);
CREATE TABLE yourusers (city dynamic_enum);

Do you imagine that myusers and yourusers are referring to the
same enum or to two different enums?  Are the enums stored in
a new table within pg_catalog, or are they stored in something akin
to a toast table?  If you insert billions of rows into a table, but only
have 30 distinct values, can you quickly query for all 30 distinct enum
values, or would you have to walk billions of rows to find them all?

mark




pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Typo in origin.c
Next
From: Mark Dilger
Date:
Subject: Re: A space-efficient, user-friendly way to store categorical data