Bootstrap DATA is a pita - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Bootstrap DATA is a pita |
Date | |
Msg-id | 20150220234142.GH12653@awork2.anarazel.de Whole thread Raw |
Responses |
Re: Bootstrap DATA is a pita
|
List | pgsql-hackers |
Hi, I've been for a long while been rather annoyed about how cumbersome it is to add catalog rows using the bootstrap format. Especially pg_proc.h, pg_operator.h, pg_amop.h, pg_amproc.h and some more are really unwieldy. I think this needs to be improved. And while I'm not going to start working on it tonight, I do plan to work on it if we can agree on a design that I think is worth implementing. The things that bug me most are: 1) When adding new rows it's rather hard to kno which columns are which, and you have to specify a lot you really don'tcare about. Especially in pg_proc that's rather annoying. 2) Having to assign oids for many things that don't actually need is bothersome and greatly increases the likelihood ofconflicts. There's some rows for which we need fixed oids (pg_type ones for example), but e.g. for the majority of pg_procit's unnecessary. 3) Adding a new column to a system catalog, especially pg_proc.h, basically requires writing a complex regex or programto modify the header. Therefore I propose that we add another format to generate the .bki insert lines. What I think we should do is to add pg_<catalog>.data files that contain the actual data that are automatically parsed by Catalog.pm. Those contain the rows in some to-be-decided format. I was considering using json, but it turns out only perl 5.14 started shipping JSON::PP as part of the standard library. So I guess it's best we just make it a big perl array + hashes. To address 1) we just need to make each row a hash and allow leaving out columns that have some default value. 2) is a bit more complex. Generally many rows don't need a fixed oid at all and many others primarily need it to handle object descriptions. The latter seems best best solved by not making it dependant on the oid anymore. 3) Seems primarily solved by not requiring default values to be specified anymore. Also it should be much easier to add new values automatically to a parseable format. I think we'll need to generate oid #defines for some catalog contents, but that seems solveable. Maybe something rougly like: # pg_type.data CatalogData( 'pg_type', [ { oid => 2249, data => {typname => 'cstring', typlen => -2, typbyval => 1, fake => '...'},oiddefine => 'CSTRINGOID' } ] ); # pg_proc.data CatalogData( 'pg_proc', [ { oid => 1242, data => {proname => 'boolin', proretttype => 16, proargtypes => [2275], provolatile=> 'i'}, description => 'I/O', }, { data => {proname => 'mode_final', proretttype => 2283, proargtypes =>[2281, 2283]}, description => 'aggregate final function', } ] ); There'd need to be some logic to assign default values for columns, and maybe even simple logic e.g. to determine arguments like pronargs based on proargtypes. This is far from fully though through, but I think something very roughly along these lines could be a remarkable improvement in the ease of adding new catalog contents. Comments? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
pgsql-hackers by date: