Home > mailing lists

Bootstrap DATA is a pita - Mailing list pgsql-hackers

From	Andres Freund
Subject	Bootstrap DATA is a pita
Date	February 20, 2015 23:41:50
Msg-id	20150220234142.GH12653@awork2.anarazel.de Whole thread Raw
Responses	Re: Bootstrap DATA is a pita
List	pgsql-hackers

Tree view

Hi,

I've been for a long while been rather annoyed about how cumbersome it
is to add catalog rows using the bootstrap format. Especially pg_proc.h,
pg_operator.h, pg_amop.h, pg_amproc.h and some more are really unwieldy.

I think this needs to be improved. And while I'm not going to start
working on it tonight, I do plan to work on it if we can agree on a
design that I think is worth implementing.

The things that bug me most are:

1) When adding new rows it's rather hard to kno which columns are which,  and you have to specify a lot you really
don'tcare about. Especially  in pg_proc that's rather annoying.
 

2) Having to assign oids for many things that don't actually need is  bothersome and greatly increases the likelihood
ofconflicts. There's  some rows for which we need fixed oids (pg_type ones for example),  but e.g. for the majority of
pg_procit's unnecessary.
 

3) Adding a new column to a system catalog, especially pg_proc.h,  basically requires writing a complex regex or
programto modify the  header.
 

Therefore I propose that we add another format to generate the .bki
insert lines.

What I think we should do is to add pg_<catalog>.data files that contain
the actual data that are automatically parsed by Catalog.pm. Those
contain the rows in some to-be-decided format. I was considering using
json, but it turns out only perl 5.14 started shipping JSON::PP as part
of the standard library. So I guess it's best we just make it a big perl
array + hashes.

To address 1) we just need to make each row a hash and allow leaving out
columns that have some default value.

2) is a bit more complex. Generally many rows don't need a fixed oid at
all and many others primarily need it to handle object descriptions. The
latter seems best best solved by not making it dependant on the oid
anymore.

3) Seems primarily solved by not requiring default values to be
specified anymore. Also it should be much easier to add new values
automatically to a parseable format.

I think we'll need to generate oid #defines for some catalog contents,
but that seems solveable.

Maybe something rougly like:

# pg_type.data
CatalogData(   'pg_type',   [    { oid => 2249, data => {typname => 'cstring', typlen => -2, typbyval => 1, fake =>
'...'},oiddefine => 'CSTRINGOID'    }   ]
 
);

# pg_proc.data
CatalogData(   'pg_proc',   [    { oid => 1242, data => {proname => 'boolin', proretttype => 16, proargtypes => [2275],
provolatile=> 'i'}, description => 'I/O',    },    { data => {proname => 'mode_final', proretttype => 2283, proargtypes
=>[2281, 2283]}, description => 'aggregate final function',    }   ]
 
);

There'd need to be some logic to assign default values for columns, and
maybe even simple logic e.g. to determine arguments like pronargs based
on proargtypes.

This is far from fully though through, but I think something very
roughly along these lines could be a remarkable improvement in the ease
of adding new catalog contents.

Comments?

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

pgsql-hackers by date:

From: Kevin Grittner
Date: 20 February 2015, 23:21:49
Subject: Re: Idea: GSoC - Query Rewrite with Materialized Views

From: Peter Geoghegan
Date: 20 February 2015, 23:44:24
Subject: Re: INSERT ... ON CONFLICT UPDATE and logical decoding

Bootstrap DATA is a pita - Mailing list pgsql-hackers

Previous

Next