Home > mailing lists
Review: Extensions Patch - Mailing list pgsql-hackers

From	David E. Wheeler
Subject	Review: Extensions Patch
Date	December 3, 2010 21:09:34
Msg-id	7D51D3B5-F89D-46ED-9D0E-859B2CABC9AD@kineticode.com Whole thread
Responses	Re: Review: Extensions Patch
List	pgsql-hackers
Tree view
Extensions Patch v15 Review
===========================

Submission review
-----------------

* Is the patch in context diff format?

Yes.

* Does it apply cleanly to the current git master?

Yes.

* Does it include reasonable tests, necessary doc patches, etc?

`make check` passes.
`make installcheck` fails (regression.diffs attached).
`make -C contrib installcheck` passes

I see tests for pg_execute_sql_string() and replace(), but absolutely nothing else. Such an important core feature
reallyshould have a pretty comprehensive suite of tests exercising all the functionality. Relying on the contrib
extensiontests won't do it, as they exercise only a subset of the functionality (e.g., no custom_variable_classes), and
thenonly as a side-effect of their actual purpose. 

Details on docs below.

Usability review
----------------

Read what the patch is supposed to do, and consider:

* Does the patch actually implement that?

Yes.

* Do we want that?

OH YES!

* Do we already have it?

Nope.

* Does it follow SQL spec, or the community-agreed behavior?

It's an implementation of community-agreed behavior, as hashed out on the mail list over the years.

* Does it include pg_dump support (if applicable)?

Yes, it does.

* Are there dangers?

Yes. Much of the execution is superuser-only, so we need to make sure that such is actually the case (unit tests would
helpwith that). 

* Have all the bases been covered?

Mostly, yes. The lack of tests is the single biggest drawback to this patch.

Feature test
------------

Apply the patch, compile it and test:

* Does the feature work as advertised?

Yes.

* Are there corner cases the author has failed to consider?

I had some trouble getting a third-party extension to work. See the end of this document for details.

* Are there any assertion failures or crashes?

No.

Performance review
------------------

* Does the patch slow down simple tests?

Not that I've noticed.

* If it claims to improve performance, does it?

N/A.

* Does it slow down other things?

Not that I've noticed.

Coding review
-------------

Read the changes to the code in detail and consider:

* Does it follow the project [http://developer.postgresql.org/pgdocs/postgres/source.html coding guidelines]?

I'm not a C expert, but it looks like it matches quite closely with all the other code. It's very neat and
well-commented.

* Are there portability issues?
* Will it work on Windows/BSD etc?

I can't comment on these.

* Are the comments sufficient and accurate?

Yes.

* Does it do what it says, correctly?

I can't comment on this by looking at the code; see detailed review from a user's perspective below.

* Does it produce compiler warnings?

No.

* Can you make it crash?

No.

Architecture review
-------------------

* Is everything done in a way that fits together coherently with other features/modules?
* Are there interdependencies that can cause problems?

Can't really comment on this.

Notes on the documentation
--------------------------

The pg_extension catalog docs are a little thin, but probably sufficient. They appear to be duplicated, though, with
theentries for catalog-pg-extension and view-pg-extensions looking identical. One is apparently a list of installed
extensions,and the other an SRF that lists installed and available extensions. But I find the duplication a bit
confusing.(Might be easer if I wasn't reading SGML though.) 

The extend-extension docs

I don't understand this paragraph at all:
   <para>    The way to put together a coherent set of custom <acronym>SQL</> objects    is to create an
<literal>extension</literal>.There are two sides of the    extension, the Operating System side contains several files
and   the <acronym>SQL</> one uses them.   </para> 

This paragraph isn't very clear, either:
      <para>       The control file can also contain       a <literal>custom_variable_classes</literal> key followed by
any      number of custom settings,       named <literal><replaceable
class="parameter">class</replaceable>.varname</literal>.      The <literal>custom_variable_classes</literal> value
takeseffect       at <command>CREATE EXTENSION</command> time, and is persistent. The       custom settings are set
automaticallytoo, but are not persistent. 

Examples might help.


Control file keys:

* comment -- Should mention that it's used for CREATE COMMENT ON EXTENSION, no?
* version -- If it's free-form, how will you do dependency-checking? Or will you?
* script
* encoding -- Seems unnecessary; one can explicitly set it in the .sql file, no?
* custom_variable_classes

I don't understand this paragraph:
    <para>     The admin     function <link linkend="functions-extension">pg_extension_flag_dump</link>     can be used
torevert the default <literal>pg_dump</literal> policy     about objects that belong to an extension and force the
flaggedobjects     to be part of the backups.    </para> 

What's a pg_dump policy?

* There appears to be an irrelevant change to the docs for replace().
* For the pg_read_binary_file() docs, should there not be some sort of note about permissions to the file to be read
in?

With pg_execute_sql_string() and friends, what if there aren't the same number of parameters as there are placeholders?
Shouldbe a note about that. 

Also, why are the placholders different than everywhere else? I see

+ SELECT pg_execute_sql_string('CREATE SCHEMA @schema@;', '@schema@', 'utils');

and

+ SELECT pg_execute_sql_string('CREATE SCHEMA s;', 's', 'bar');

Why not $1 like is used in PREPARE and user-defined functions? I think the second example is particularly weird, as
there'snothing to indicate that "s" is a placeholder. I don't mind "@schema@", but would like to see uses of such
thingsbe consistent throughout PostgreSQL (and I expect that one cannot add such as placeholders to PREPARE). 

In the CREATE EXTENSION docs, I don't understand this section:
     <varlistentry>      <term>NO USER DATA</term>      <listitem>       <para>        This option is used by <xref
linkend="APP-PGDUMP">.As it's possible        to flag extension's objects so that they get dumped        (see <xref
linkend="functions-extension">),this option allow        extension authors to prepare installation scripts that will
avoid       creating user data when restoring.       </para>      </listitem>     </varlistentry> 

Does it mean that no user data will be dumped when the extension is dumped?

In the DROP EXTENSION docs, I think there's a pasto:
  <refname>DROP EXTENSION</refname>  <refpurpose>remove a tablespace</refpurpose>

Another one here:
   <varlistentry>    <term><literal>CASCADE</literal></term>    <listitem>     <para>      Automatically drop objects
thatdepend on the index.     </para>    </listitem>   </varlistentry> 

s/index/extension/. Same in the RESTRICT section.

In xfunc.sgml, I see:
      <term><varname>EXTENSION</varname></term>      <listitem>       <para>        extension control files to install
     into <literal><replaceable>prefix</replaceable>/share/$MODULEDIR</literal>,        derived from        the
<literal><replaceable>extension</replaceable>.control.in</literal>       file to add in your sources.
<literal>EXTENSION</literal>can also        be a list, you then have to provide a <literal>.control.in</literal>
fileper extension in this list. 

Why would multiple control files be necessary? And how would it map to EXTVERSION? Would all the extensions get the one
versionin EXTVERSION? 

In func.sgml:
       <literal><function>pg_extension_flag_dump(<parameter>objid</> <type>oid</>)</function></literal>      </entry>
  <entry><type>bool</type></entry>      <entry>Flag an extension's <literal>objectid</literal> as worthy      of
<literal>pg_dump</literal>.</entry>

Why is this necessary? Aren't all database objects "worthy" of being dumped? And this:
      <entry>       <literal><function>pg_extension_with_user_data()</function></literal>      </entry>
<entry><type>bool</type></entry>     <entry>Return <literal>true</literal> only when the extension's      script should
carefor user data.</entry> 

What constitutes "user data" when it comes to extensions? In contrast, the explanation below of
"pg_extension_flag_dump"is very clear, making it obvious what it's for and why it might be useful. I don't see how it
relatesto the summary above, however. OTOH, it also says: 
  <para>   This function can only be called when loading a <acronym>SQL</> script   using the command <command>CREATE
EXTENSION</command>.See <xref linkend="sql-createextension">   and <xref linkend="extend-extension">.  </para> 

And I have no idea how I'd call that function within another function call. Does this mean that only the extension
authorcan call it? Seems like a weird scoping without precedent. 

Hrm. Looking at extension.c, I see:

+     if (!create_extension)
+         ereport(ERROR,
+                 (errmsg("This function can only be used from CREATE EXTENSION.")));

I think that's a confusing error message, frankly. Perhaps better would be something like "This function can only be
calledfrom SQL files executed by CREATE EXTENSION". The docs should make that clearer, too. It was only when I saw this
errormessage that I realized how it's scoped. 

Now, some of the information in the [wiki page](http://wiki.postgresql.org/wiki/Extensions) actually seems more
informative.For example, the description and examples of \dx, \dx+, and pg_extension_objects() are *much* better. Any
wayto work those into the docs? Maybe they don't go into the function reference pages, but perhaps into the "Putting it
alltogether" doc? In fact, I think the wiki doc should be copied into the "putting it all together" doc almost verbatim
(modulothe patch excerpts). That will be a big help to prospective extension authors. 

Speaking of pg_extension_objects(), it'd be nice if its output identified the extension to which each object belongs.
Oh,I see, you can ask for only the objects in a single extension. That's okay then. But I don't think you need to
documentall the OUT paramters, just the required single argument specifying the extension name. 

Reviewing the wiki page some more, you have this example (also in the docs, IIRC):

+ DO $$
+ BEGIN
+ IF pg_extension_with_user_data() THEN
+   create schema foo;
+   create table foo.bar(id serial primary key);
+   perform pg_extension_flag_dump('foo.bar_id_seq'::regclass);
+   perform pg_extension_flag_dump('foo.bar::regclass);
+ END IF;
+ END;
+ $$;

That's good, but what if someone has restored a database that has those objects already? Will the extension be
re-createdwith this code before the tables are loaded? Or vice versa? Either way, you're going to have a conflict when
yourestore from a dump. 

Code Review
-----------

I find the use of variables in the control files confusing. It looks like this:
   version = 'EXTVERSION'

Why not use make syntax?
   version = $(EXTVERSION)

That makes it much clearer that it's a variable and provides the same syntax a the extension developer will already be
usingin the Makefile. 

I've already mentioned that the @extschema@ placeholder is different from any other SQL placeholder:
   SET search_path = @extschema@;

I'm not sure of a better option, but I do think it should be made consistent, if at all possible, since this is
executedin the database, not at `make` time. 

Speaking of which, what happens if an extension has no search_path? Or an old explicit one?
   SET search_path = public;

Hrm. I'm wondering if it might make more sense to ignore such statements when sometning is executed by
`pg_execute_sql_file()`?IOW, if I do 
   CREATE EXTENSION foo WITH SCHEMA bar;

That the schema would be installed in bar no matter what search_path is set in the file loaded from the file system. I
thinkthat this would make CREATE EXTENSION behave more as expected, *and* allow existing extension distributions to
justwork (modulo the control file). No need for placeholders in the .sql file anymore, either. Maybe there's some
optointo CREATE EXTENSION that would *not* disable `SET search_path`. 

Speaking of the contol file, I think that its inclusion in extension distributions should be completely optional. Of
thekeys supported: 

* comment
* version
* script
* encoding
* custom_variable_classes

All except the last one could be written into the Makefile, and then `make` would generate the control file for
installation.Only if the extension author wants to support custom_variable_classes would you need to create an explicit
controlfile. And even that could probably be finessed with some sort of Make variable. 

I'd like to see the control file be something that the extension author doesn't have to deal with, as much as possible.
Itjust feels superfluous. 

BTW, you might want to put the changes to the existing contrib modules in a separate patch, just to simplify things a
bit.Not a big deal at this point, but it might help the committer to have further separation. But reviewing that bit, I
waswondering, how are the uninstall scripts handled? Are they necessary anymore? 

### Questions ###

* I assume that if an extension is installed that includes a DSO that the server will need to be HUPed before you can
CREATEEXTENSION, yes? 

* I trust that CREATE EXTENSION and DROP EXTENSION and ALTER EXTENSION are transactional, yes?

* Is there a regtype for extensions?

* Why can only super users get a list of extensions? It might be useful for me to be able to check from within a udf to
seeif a given extension is installed, for example. I can see why you wouldn't want to allow listing of extensions that
areinstalled on the system but not in the current database. 

* Is there no way to keep a list of available extensions somewhere in the cluster? Seems kind of silly to parse all the
controlfiles from within pg_extensions(). That can make a call to pg_extensions() pretty expensive. 

* What's with `replace_text_variadic(PG_FUNCTION_ARGS)`? Related?

* \dx+ is not documented in \? in psql. I think you just need to add "[+]".

* `doc/src/sgml/ref/psql-ref.sgml` needs updating too.

* I like that I can see a list of installed and available extensions, but I'm not sure that \dx+ is the right command
forthe latter. Usually + means "more columns to provide more information for the same objects as you saw without the
+".Is there any precedent for showing different objects with + than without? 

Exercising the Feature
----------------------

* I built my cluster with all the contrib extensions. \dx+ does a nice job of listing what's available. So I ran
`CREATEEXTENSION hstore`. It seemed to work great, but was *very* verbose. Every SQL statement was emitted to the
terminal.I thought \echo was turned off by CREATE EXTENSION. I was able to use hstore after that; worked great. Yay! 

* Wanted to try the schema support, so next did
     CREATE SCHEMA ex;     CREATE EXTENSION citext WITH SCHEMA ex;
 This was not verbose the way hstore was. And now it's nicely installed:     postgres=# \dx
List of extensions      Schema │  Name  │ Version  │              Description
────────┼────────┼──────────┼────────────────────────────────────────     ex     │ citext │ 9.1devel │ case-insensitive
characterstring type      public │ hstore │ 9.1devel │ storing sets of key/value pairs     (2 rows) 
 Very nice. And I was able to use it with `create table users (nickname ex.citext primary key);` Awesome.

* Next I tried dropping it. `DROP EXTENSION ex.citext;` did not work. What if I have citext installed in two schemas
andjust want to drop one of them? Ah, I see, one cannot have an extension with the same name in different schemas.
Currentlythe schema-qualification is a syntax error, but maybe it should be a little more informative? 

* Without the schema-qualification I get an error because it's in use. But `CASCADE`  works. Great.

* The `pg_extension` catalog and `pg_extensions` views are both useful, but too close in name. I found it confusing how
similarthey are. It seems to me like the `pg_extension` class is well-named and like the other catalog table names, but
`pg_extenions`should perhaps be `pg_available_extensions` or something. 

Creating a Third Party Extension
--------------------------------

I have a very simple extension I wrote more or less as an example extension for PGXN. It's a key/value pair data type
called[pair](https://github.com/theory/kv-pair). As an experiment, to see how it would feel to use extensions as an
extensiondeveloper, I created [a branch](https://github.com/theory/kv-pair/tree/9.1-extension) to port it. With that
change,I gave it a try: 
   export PATH=/usr/local/pgsql-devel/bin:$PATH   make   sudo make install   make installcheck PGUSER=postgres

The test failed with this error:
   ! CREATE EXTENSION pair;   ! NOTICE:  Installing extension 'pair' from
'/usr/local/pgsql-devel/share/contrib/pair.sql  ! ERROR:  could not execute sql file:
'/usr/local/pgsql-devel/share/contrib/pair.sql'

The file is there and looks fine. I also just tried to install it myself:
    > /usr/local/pgsql-devel/bin/psql -U postgres   psql (9.1devel)   Type "help" for help.
   postgres=# create extension pair;   NOTICE:  Installing extension 'pair' from
'/usr/local/pgsql-devel/share/contrib/pair.sql',in schema public, with user data   ERROR:  could not execute sql file:
'/usr/local/pgsql-devel/share/contrib/pair.sql'

It would be nice if it gave me more information. What failed, exactly? It also fails when I use `WITH SCHEMA public`.

I assume this will be a simple issue. Overall I'm happy with how easy it will be to update existing extensions to use
thenew functionality, but as I've mentioned above, I do think that it would be better if: 

* One could specify stuff *only* in the `Makefile` and let `make` create a control file.

* I could omit the @extschema@ business altogether and just let CREATE EXTENSION set the path for the duration of its
execution.

With these two changes, it would be *even easier* for existing third-party extensions to be adapted.

Conclusion
----------

Overall I'm very happy with this patch. There are some nits I've brought up here that need to be addressed, and I do
thinkthere are some design changes that would be worth considering. So I'm marking it as returned. But I also think
thatsomeone more familiar with the core code should do a proper code review during this commitfest (I'm mostly about
functionalityand interface). 

But I think it's close, and I can't wait to have this stuff solidly in core.

Best,

David
pgsql-hackers by date:
From: Itagaki Takahiro
Date: 03 December 2010, 20:44:34
Subject: Re: pg_execute_from_file review
From: Tatsuo Ishii
Date: 03 December 2010, 22:17:56
Subject: Streaming replication document
Review: Extensions Patch - Mailing list pgsql-hackers

Previous

Next