Re: Database/Table Design for Global Country Statistics - Mailing list pgsql-general

From Richard Huxton
Subject Re: Database/Table Design for Global Country Statistics
Date
Msg-id 46E7A91B.5070706@archonet.com
Whole thread Raw
In response to Database/Table Design for Global Country Statistics  (Stefan Schwarzer <stefan.schwarzer@grid.unep.ch>)
List pgsql-general
Stefan Schwarzer wrote:
> Hi there,
>
> I learned in another posting that my table design - in a polite way -
> "could be improved".
>
> So, before doing any additional design errors, I would like to get
> feedback, if possible.
>
> I am dealing with some 500 tables for worldwide national statistics
> (GDP, population, environment etc.), covering approx. 30 years each. For
> each of these variables, I usually have as well (pre-prepared)
> subregional and regional aggregations too. These could - and should - at
> the end be calculated on-the-fly, and not pre-calculated and imported
> from Excel as it is for the moment.

Might want a trigger to keep them up to date.

> My (national) table for a given variable is in the moment as follows (id
> being the identifier for a specific country):
>
> id   |    1970    |    1971    |    ...    |    2004    |    2005
> -------------------------------------------------------------------
>  1   |   NULL    |      36       |   ...     |      42      |      45
>  2 ......

Ick - fiddly if you we ever reach 2006...

> The new design would be like this:
>
> id   |    year    |    value
> -------------------------------
>  1   |   1970    |     NULL
>  1   |   1971    |      36
>  1   ....
>  1   |   2005    |      45
>  2   |   1970    |      ....
>  2   .....
>
>
> Would that be considered as "good table design" then?

Well if "id" is a country I'd rename it "country_id".

If all years+countries should have the same set of measurements you
might want it in one table:
  (country_id, year, population, area, roads_in_km, ...)

However, if some measurements aren't relevant you're probably better off
with separate table for each measurement: country_population,
country_area etc.

The issue with NULLs is what do they mean. They should mean "unknown",
nothing more and nothing less. However, frequently you want to
distinguish between "no figure available" and "not applicable" (e.g.
"monarch" isn't relevant except in monarchies).

In this case it's probably best practice to separate out the values:
  (country_id integer, year integer, has_monarch boolean, monarch text)
This way you can distinguish between has_monarch=false and monarch=''

--
   Richard Huxton
   Archonet Ltd

pgsql-general by date:

Previous
From: Richard Huxton
Date:
Subject: Re: Question to schema public
Next
From: Tino Wildenhain
Date:
Subject: Re: Database/Table Design for Global Country Statistics