Review: Extensions Patch - Mailing list pgsql-hackers
From | David E. Wheeler |
---|---|
Subject | Review: Extensions Patch |
Date | |
Msg-id | 7D51D3B5-F89D-46ED-9D0E-859B2CABC9AD@kineticode.com Whole thread Raw |
Responses |
Re: Review: Extensions Patch
|
List | pgsql-hackers |
Extensions Patch v15 Review =========================== Submission review ----------------- * Is the patch in context diff format? Yes. * Does it apply cleanly to the current git master? Yes. * Does it include reasonable tests, necessary doc patches, etc? `make check` passes. `make installcheck` fails (regression.diffs attached). `make -C contrib installcheck` passes I see tests for pg_execute_sql_string() and replace(), but absolutely nothing else. Such an important core feature reallyshould have a pretty comprehensive suite of tests exercising all the functionality. Relying on the contrib extensiontests won't do it, as they exercise only a subset of the functionality (e.g., no custom_variable_classes), and thenonly as a side-effect of their actual purpose. Details on docs below. Usability review ---------------- Read what the patch is supposed to do, and consider: * Does the patch actually implement that? Yes. * Do we want that? OH YES! * Do we already have it? Nope. * Does it follow SQL spec, or the community-agreed behavior? It's an implementation of community-agreed behavior, as hashed out on the mail list over the years. * Does it include pg_dump support (if applicable)? Yes, it does. * Are there dangers? Yes. Much of the execution is superuser-only, so we need to make sure that such is actually the case (unit tests would helpwith that). * Have all the bases been covered? Mostly, yes. The lack of tests is the single biggest drawback to this patch. Feature test ------------ Apply the patch, compile it and test: * Does the feature work as advertised? Yes. * Are there corner cases the author has failed to consider? I had some trouble getting a third-party extension to work. See the end of this document for details. * Are there any assertion failures or crashes? No. Performance review ------------------ * Does the patch slow down simple tests? Not that I've noticed. * If it claims to improve performance, does it? N/A. * Does it slow down other things? Not that I've noticed. Coding review ------------- Read the changes to the code in detail and consider: * Does it follow the project [http://developer.postgresql.org/pgdocs/postgres/source.html coding guidelines]? I'm not a C expert, but it looks like it matches quite closely with all the other code. It's very neat and well-commented. * Are there portability issues? * Will it work on Windows/BSD etc? I can't comment on these. * Are the comments sufficient and accurate? Yes. * Does it do what it says, correctly? I can't comment on this by looking at the code; see detailed review from a user's perspective below. * Does it produce compiler warnings? No. * Can you make it crash? No. Architecture review ------------------- * Is everything done in a way that fits together coherently with other features/modules? * Are there interdependencies that can cause problems? Can't really comment on this. Notes on the documentation -------------------------- The pg_extension catalog docs are a little thin, but probably sufficient. They appear to be duplicated, though, with theentries for catalog-pg-extension and view-pg-extensions looking identical. One is apparently a list of installed extensions,and the other an SRF that lists installed and available extensions. But I find the duplication a bit confusing.(Might be easer if I wasn't reading SGML though.) The extend-extension docs I don't understand this paragraph at all: <para> The way to put together a coherent set of custom <acronym>SQL</> objects is to create an <literal>extension</literal>.There are two sides of the extension, the Operating System side contains several files and the <acronym>SQL</> one uses them. </para> This paragraph isn't very clear, either: <para> The control file can also contain a <literal>custom_variable_classes</literal> key followed by any number of custom settings, named <literal><replaceable class="parameter">class</replaceable>.varname</literal>. The <literal>custom_variable_classes</literal> value takeseffect at <command>CREATE EXTENSION</command> time, and is persistent. The custom settings are set automaticallytoo, but are not persistent. Examples might help. Control file keys: * comment -- Should mention that it's used for CREATE COMMENT ON EXTENSION, no? * version -- If it's free-form, how will you do dependency-checking? Or will you? * script * encoding -- Seems unnecessary; one can explicitly set it in the .sql file, no? * custom_variable_classes I don't understand this paragraph: <para> The admin function <link linkend="functions-extension">pg_extension_flag_dump</link> can be used torevert the default <literal>pg_dump</literal> policy about objects that belong to an extension and force the flaggedobjects to be part of the backups. </para> What's a pg_dump policy? * There appears to be an irrelevant change to the docs for replace(). * For the pg_read_binary_file() docs, should there not be some sort of note about permissions to the file to be read in? With pg_execute_sql_string() and friends, what if there aren't the same number of parameters as there are placeholders? Shouldbe a note about that. Also, why are the placholders different than everywhere else? I see + SELECT pg_execute_sql_string('CREATE SCHEMA @schema@;', '@schema@', 'utils'); and + SELECT pg_execute_sql_string('CREATE SCHEMA s;', 's', 'bar'); Why not $1 like is used in PREPARE and user-defined functions? I think the second example is particularly weird, as there'snothing to indicate that "s" is a placeholder. I don't mind "@schema@", but would like to see uses of such thingsbe consistent throughout PostgreSQL (and I expect that one cannot add such as placeholders to PREPARE). In the CREATE EXTENSION docs, I don't understand this section: <varlistentry> <term>NO USER DATA</term> <listitem> <para> This option is used by <xref linkend="APP-PGDUMP">.As it's possible to flag extension's objects so that they get dumped (see <xref linkend="functions-extension">),this option allow extension authors to prepare installation scripts that will avoid creating user data when restoring. </para> </listitem> </varlistentry> Does it mean that no user data will be dumped when the extension is dumped? In the DROP EXTENSION docs, I think there's a pasto: <refname>DROP EXTENSION</refname> <refpurpose>remove a tablespace</refpurpose> Another one here: <varlistentry> <term><literal>CASCADE</literal></term> <listitem> <para> Automatically drop objects thatdepend on the index. </para> </listitem> </varlistentry> s/index/extension/. Same in the RESTRICT section. In xfunc.sgml, I see: <term><varname>EXTENSION</varname></term> <listitem> <para> extension control files to install into <literal><replaceable>prefix</replaceable>/share/$MODULEDIR</literal>, derived from the <literal><replaceable>extension</replaceable>.control.in</literal> file to add in your sources. <literal>EXTENSION</literal>can also be a list, you then have to provide a <literal>.control.in</literal> fileper extension in this list. Why would multiple control files be necessary? And how would it map to EXTVERSION? Would all the extensions get the one versionin EXTVERSION? In func.sgml: <literal><function>pg_extension_flag_dump(<parameter>objid</> <type>oid</>)</function></literal> </entry> <entry><type>bool</type></entry> <entry>Flag an extension's <literal>objectid</literal> as worthy of <literal>pg_dump</literal>.</entry> Why is this necessary? Aren't all database objects "worthy" of being dumped? And this: <entry> <literal><function>pg_extension_with_user_data()</function></literal> </entry> <entry><type>bool</type></entry> <entry>Return <literal>true</literal> only when the extension's script should carefor user data.</entry> What constitutes "user data" when it comes to extensions? In contrast, the explanation below of "pg_extension_flag_dump"is very clear, making it obvious what it's for and why it might be useful. I don't see how it relatesto the summary above, however. OTOH, it also says: <para> This function can only be called when loading a <acronym>SQL</> script using the command <command>CREATE EXTENSION</command>.See <xref linkend="sql-createextension"> and <xref linkend="extend-extension">. </para> And I have no idea how I'd call that function within another function call. Does this mean that only the extension authorcan call it? Seems like a weird scoping without precedent. Hrm. Looking at extension.c, I see: + if (!create_extension) + ereport(ERROR, + (errmsg("This function can only be used from CREATE EXTENSION."))); I think that's a confusing error message, frankly. Perhaps better would be something like "This function can only be calledfrom SQL files executed by CREATE EXTENSION". The docs should make that clearer, too. It was only when I saw this errormessage that I realized how it's scoped. Now, some of the information in the [wiki page](http://wiki.postgresql.org/wiki/Extensions) actually seems more informative.For example, the description and examples of \dx, \dx+, and pg_extension_objects() are *much* better. Any wayto work those into the docs? Maybe they don't go into the function reference pages, but perhaps into the "Putting it alltogether" doc? In fact, I think the wiki doc should be copied into the "putting it all together" doc almost verbatim (modulothe patch excerpts). That will be a big help to prospective extension authors. Speaking of pg_extension_objects(), it'd be nice if its output identified the extension to which each object belongs. Oh,I see, you can ask for only the objects in a single extension. That's okay then. But I don't think you need to documentall the OUT paramters, just the required single argument specifying the extension name. Reviewing the wiki page some more, you have this example (also in the docs, IIRC): + DO $$ + BEGIN + IF pg_extension_with_user_data() THEN + create schema foo; + create table foo.bar(id serial primary key); + perform pg_extension_flag_dump('foo.bar_id_seq'::regclass); + perform pg_extension_flag_dump('foo.bar::regclass); + END IF; + END; + $$; That's good, but what if someone has restored a database that has those objects already? Will the extension be re-createdwith this code before the tables are loaded? Or vice versa? Either way, you're going to have a conflict when yourestore from a dump. Code Review ----------- I find the use of variables in the control files confusing. It looks like this: version = 'EXTVERSION' Why not use make syntax? version = $(EXTVERSION) That makes it much clearer that it's a variable and provides the same syntax a the extension developer will already be usingin the Makefile. I've already mentioned that the @extschema@ placeholder is different from any other SQL placeholder: SET search_path = @extschema@; I'm not sure of a better option, but I do think it should be made consistent, if at all possible, since this is executedin the database, not at `make` time. Speaking of which, what happens if an extension has no search_path? Or an old explicit one? SET search_path = public; Hrm. I'm wondering if it might make more sense to ignore such statements when sometning is executed by `pg_execute_sql_file()`?IOW, if I do CREATE EXTENSION foo WITH SCHEMA bar; That the schema would be installed in bar no matter what search_path is set in the file loaded from the file system. I thinkthat this would make CREATE EXTENSION behave more as expected, *and* allow existing extension distributions to justwork (modulo the control file). No need for placeholders in the .sql file anymore, either. Maybe there's some optointo CREATE EXTENSION that would *not* disable `SET search_path`. Speaking of the contol file, I think that its inclusion in extension distributions should be completely optional. Of thekeys supported: * comment * version * script * encoding * custom_variable_classes All except the last one could be written into the Makefile, and then `make` would generate the control file for installation.Only if the extension author wants to support custom_variable_classes would you need to create an explicit controlfile. And even that could probably be finessed with some sort of Make variable. I'd like to see the control file be something that the extension author doesn't have to deal with, as much as possible. Itjust feels superfluous. BTW, you might want to put the changes to the existing contrib modules in a separate patch, just to simplify things a bit.Not a big deal at this point, but it might help the committer to have further separation. But reviewing that bit, I waswondering, how are the uninstall scripts handled? Are they necessary anymore? ### Questions ### * I assume that if an extension is installed that includes a DSO that the server will need to be HUPed before you can CREATEEXTENSION, yes? * I trust that CREATE EXTENSION and DROP EXTENSION and ALTER EXTENSION are transactional, yes? * Is there a regtype for extensions? * Why can only super users get a list of extensions? It might be useful for me to be able to check from within a udf to seeif a given extension is installed, for example. I can see why you wouldn't want to allow listing of extensions that areinstalled on the system but not in the current database. * Is there no way to keep a list of available extensions somewhere in the cluster? Seems kind of silly to parse all the controlfiles from within pg_extensions(). That can make a call to pg_extensions() pretty expensive. * What's with `replace_text_variadic(PG_FUNCTION_ARGS)`? Related? * \dx+ is not documented in \? in psql. I think you just need to add "[+]". * `doc/src/sgml/ref/psql-ref.sgml` needs updating too. * I like that I can see a list of installed and available extensions, but I'm not sure that \dx+ is the right command forthe latter. Usually + means "more columns to provide more information for the same objects as you saw without the +".Is there any precedent for showing different objects with + than without? Exercising the Feature ---------------------- * I built my cluster with all the contrib extensions. \dx+ does a nice job of listing what's available. So I ran `CREATEEXTENSION hstore`. It seemed to work great, but was *very* verbose. Every SQL statement was emitted to the terminal.I thought \echo was turned off by CREATE EXTENSION. I was able to use hstore after that; worked great. Yay! * Wanted to try the schema support, so next did CREATE SCHEMA ex; CREATE EXTENSION citext WITH SCHEMA ex; This was not verbose the way hstore was. And now it's nicely installed: postgres=# \dx List of extensions Schema │ Name │ Version │ Description ────────┼────────┼──────────┼──────────────────────────────────────── ex │ citext │ 9.1devel │ case-insensitive characterstring type public │ hstore │ 9.1devel │ storing sets of key/value pairs (2 rows) Very nice. And I was able to use it with `create table users (nickname ex.citext primary key);` Awesome. * Next I tried dropping it. `DROP EXTENSION ex.citext;` did not work. What if I have citext installed in two schemas andjust want to drop one of them? Ah, I see, one cannot have an extension with the same name in different schemas. Currentlythe schema-qualification is a syntax error, but maybe it should be a little more informative? * Without the schema-qualification I get an error because it's in use. But `CASCADE` works. Great. * The `pg_extension` catalog and `pg_extensions` views are both useful, but too close in name. I found it confusing how similarthey are. It seems to me like the `pg_extension` class is well-named and like the other catalog table names, but `pg_extenions`should perhaps be `pg_available_extensions` or something. Creating a Third Party Extension -------------------------------- I have a very simple extension I wrote more or less as an example extension for PGXN. It's a key/value pair data type called[pair](https://github.com/theory/kv-pair). As an experiment, to see how it would feel to use extensions as an extensiondeveloper, I created [a branch](https://github.com/theory/kv-pair/tree/9.1-extension) to port it. With that change,I gave it a try: export PATH=/usr/local/pgsql-devel/bin:$PATH make sudo make install make installcheck PGUSER=postgres The test failed with this error: ! CREATE EXTENSION pair; ! NOTICE: Installing extension 'pair' from '/usr/local/pgsql-devel/share/contrib/pair.sql ! ERROR: could not execute sql file: '/usr/local/pgsql-devel/share/contrib/pair.sql' The file is there and looks fine. I also just tried to install it myself: > /usr/local/pgsql-devel/bin/psql -U postgres psql (9.1devel) Type "help" for help. postgres=# create extension pair; NOTICE: Installing extension 'pair' from '/usr/local/pgsql-devel/share/contrib/pair.sql',in schema public, with user data ERROR: could not execute sql file: '/usr/local/pgsql-devel/share/contrib/pair.sql' It would be nice if it gave me more information. What failed, exactly? It also fails when I use `WITH SCHEMA public`. I assume this will be a simple issue. Overall I'm happy with how easy it will be to update existing extensions to use thenew functionality, but as I've mentioned above, I do think that it would be better if: * One could specify stuff *only* in the `Makefile` and let `make` create a control file. * I could omit the @extschema@ business altogether and just let CREATE EXTENSION set the path for the duration of its execution. With these two changes, it would be *even easier* for existing third-party extensions to be adapted. Conclusion ---------- Overall I'm very happy with this patch. There are some nits I've brought up here that need to be addressed, and I do thinkthere are some design changes that would be worth considering. So I'm marking it as returned. But I also think thatsomeone more familiar with the core code should do a proper code review during this commitfest (I'm mostly about functionalityand interface). But I think it's close, and I can't wait to have this stuff solidly in core. Best, David
pgsql-hackers by date: