Thread: Questions about PostgreSQL implementation details
Hello PostgreSQL hackers, I hope I am posting on the right mailing-list. I am actually doing a PhD related to relational databases and software engineering. I use PostgreSQL for my research. I have a few questions about the internals of PostgreSQL and I think they require experts knowledge. I could not find documentation about that in the nice PostgreSQL documentation but maybe I missed something? Tell me if itis the case. My Questions: Q1. Are PostgreSQL's meta-description tables (such as pg_class) the "reality" concerning the state of the DB or are theyjust a virtual representation ? What I would like to know with this question is: would it be possible to implement DDL queries (e.g. CREATE TABLE, DROP TABLE,CREATE VIEW, ALTER TABLE, etc.) as DML queries that modify the meta-data stored in meta-description tables? For example, something like: INSERT INTO pg_class [...]; To create a new table (instead of the CREATE TABLE DDL query). Q1.1 If it is possible, is what is done in reality? I have the feeling that it is not the case and that DDL queries are implementedin C directly. Q1.2 If it is possible and not done, what is the reason? -- Q2. Are PostgreSQL's "meta-constraints" (i.e. constraints related to database structure such as "a table can only have asingle primary key") implemented in C code or via data constraints on PostgreSQL's meta-description tables? I guess, again, they are implemented in C code, but I might be wrong. Q2.1 If they are not implemented via data constraints on meta-description tables, why ? Q2.2 Is there somewhere in the documentation a list of such "meta-constraints" implemented by PostgreSQL? Thanks in advance for answering my questions, any help or pointers to existing documentation will be appreciated. Cheers, Julien Delplanque
On 12/9/19 7:35 AM, Julien Delplanque wrote: > Hello PostgreSQL hackers, > > I hope I am posting on the right mailing-list. > > I am actually doing a PhD related to relational databases and software > engineering. > > I use PostgreSQL for my research. > > I have a few questions about the internals of PostgreSQL and I think > they require experts knowledge. > > I could not find documentation about that in the nice PostgreSQL > documentation but maybe I missed something? Tell me if it is the case. > > My Questions: > > Q1. Are PostgreSQL's meta-description tables (such as pg_class) the > "reality" concerning the state of the DB or are they just a virtual > representation ? Not all of them are real tables; some of the pg_catalog relations are views over others of them. But many of them are real tables with C structs that back them. Take a look in src/include/catalog/pg_class.h and you'll see the C struct definition, somewhat obscured by some macros that make it less obvious to people not familiar with the postgresql sources. On line 29: CATALOG(pg_class,1259,RelationRelationId) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83,RelationRelation_Rowtype_Id) BKI_SCHEMA_MACRO { ... } That's a typedef. See genbki.h where it defines the macro: #define CATALOG(name,oid,oidmacro) typedef struct CppConcat(FormData_,name) > What I would like to know with this question is: would it be possible to > implement DDL queries (e.g. CREATE TABLE, DROP TABLE, CREATE VIEW, ALTER > TABLE, etc.) as DML queries that modify the meta-data stored in > meta-description tables? > > For example, something like: > > INSERT INTO pg_class [...]; > > To create a new table (instead of the CREATE TABLE DDL query). You are not allowed to insert into the pg_class table directly. There are good reasons for that. Simply inserting a row into this table would not cause all the infrastructure that backs a table to pop into existence. So you have to use the DDL commands. > > Q1.1 If it is possible, is what is done in reality? I have the feeling > that it is not the case and that DDL queries are implemented in C directly. See src/backend/commands/tablecmds.c, function DefineRelation. -- Mark Dilger
Julien Delplanque <julien.delplanque@inria.fr> writes: > I have a few questions about the internals of PostgreSQL and I think they require experts knowledge. > Q1. Are PostgreSQL's meta-description tables (such as pg_class) the "reality" concerning the state of the DB or are theyjust a virtual representation ? The system catalogs are reality as far as the effects of DDL go. In the particular case of pg_class, there is additional reality, which is that (most) pg_class rows represent one or more data files on-disk. You could in fact simulate many DDL operations by manual inserts/deletes/updates on system catalogs; but that would not result in any actions on the data files, so it falls down for the specific cases of CREATE/DROP TABLE, CREATE/DROP INDEX, etc. > What I would like to know with this question is: would it be possible to implement DDL queries (e.g. CREATE TABLE, DROPTABLE, CREATE VIEW, ALTER TABLE, etc.) as DML queries that modify the meta-data stored in meta-description tables? Underneath, many of those operations are just catalog manipulations, so yes up to the point where you need to do something that impacts user data storage. (In practice, getting all the details right from a SQL client would be a pretty painful thing, so I'm not sure I see the point. Usually, modifying the PG C code or writing an extension would be a saner approach to modifying the system's behavior.) > Q2. Are PostgreSQL's "meta-constraints" (i.e. constraints related to database structure such as "a table can only havea single primary key") implemented in C code or via data constraints on PostgreSQL's meta-description tables? Mostly code. The only real constraints on the system catalogs, in the sense of something that would reject an ill-advised low-level update, are the unique indexes. There are also NOT NULL markers on many of the catalogs' columns, but those are only enforced against manual SQL updates not updates made by C code. This is all somewhat historical, I suppose, but it's worked well enough for us. > Thanks in advance for answering my questions, any help or pointers to existing documentation will be appreciated. There's no substitute for reading the source code. Unlike some other systems, PG was developed as open source from the beginning, so there is not much in the way of a "theory of operations manual" or other separate internals documentation --- people with these sorts of questions are expected to go read the code. You could perhaps start by skimming https://www.postgresql.org/docs/devel/internals.html and then go look for README files in the portions of the source tree that interest you. regards, tom lane
Hi Julien! On 09/12/2019 17:35, Julien Delplanque wrote: > Q1. Are PostgreSQL's meta-description tables (such as pg_class) the "reality" concerning the state of the DB or are theyjust a virtual representation ? Yes, the catalog tables are the authoritative source. The system uses those tables internally to get the information too. Some of the pg_* relations are just views over other catalog tables, though. > What I would like to know with this question is: would it be possible to implement DDL queries (e.g. CREATE TABLE, DROPTABLE, CREATE VIEW, ALTER TABLE, etc.) as DML queries that modify the meta-data stored in meta-description tables? > > For example, something like: > > INSERT INTO pg_class [...]; > > To create a new table (instead of the CREATE TABLE DDL query). > > Q1.1 If it is possible, is what is done in reality? I have the feeling that it is not the case and that DDL queries areimplemented in C directly. > > Q1.2 If it is possible and not done, what is the reason? The C code for the DDL commands do some things in addition to modifying the catalog tables. Notably for CREATE TABLE, it creates the relation file in the data directory, where all the data is stored. It also handles locking, invalidating various caches, firing event triggers etc. Except for creating relation files, those other things happen just in memory, though. It is not supported, and please don't do it in production, but you could try it out. Set "allow_system_table_mods=on", and insert to pg_class, pg_attribute, etc. See how well it works. Beware that there are internal caches, called "syscaches", in backends over the catalog tables, so if you modify them directly, you may need to restart for the changes to take effect. > Q2. Are PostgreSQL's "meta-constraints" (i.e. constraints related to database structure such as "a table can only havea single primary key") implemented in C code or via data constraints on PostgreSQL's meta-description tables? > > I guess, again, they are implemented in C code, but I might be wrong. In C code. > Q2.1 If they are not implemented via data constraints on meta-description tables, why ? I think there are some restrictions that cannot easily be represented as constraints. Also, we've never supported constraints on catalog tables, so no one's given much thought to what it would look like if we did. > Q2.2 Is there somewhere in the documentation a list of such "meta-constraints" implemented by PostgreSQL? Hmm, I don't think there is. Your best bet is to just look at the C code, I'm afraid. - Heikki
On 12/9/19 7:52 AM, Mark Dilger wrote: >> >> Q1.1 If it is possible, is what is done in reality? I have the feeling >> that it is not the case and that DDL queries are implemented in C >> directly. > > See src/backend/commands/tablecmds.c, function DefineRelation. I realize I could be a bit more helpful, here. For a SQL command like "CREATE TABLE", you can first look in src/backend/parser/gram.y for the grammar rule. In this case, searching for CREATE TABLE leads you to a banner comment around line 3132. The rule for CreateStmt creates a node of type CreateStmt. That leads you to a struct of the same name in src/include/nodes/parsenodes.h. You can see all the fields of that struct, and reconcile those against what the code in gram.y is doing to populate those fields. You can then look in src/backend/tcop/utility.c for the T_CreateStmt nodeTag, and you'll find that it appears in a few switch statements. One of those in ProcessUtilitySlow calls DefineRelation along with a bunch of other stuff. That's how you can trace these commands. Let's take DROP TABLE as a second example. This one is harder, because "DROP TABLE" doesn't exist exactly. It's part of a larger grammar production for DropStmt that includes other DROP commands. All the same, you can see there is a rule for DropStmt that creates a node of type DropStmt. Looking in src/include/nodes/parsenodes.h you can see the struct of the same name. Looking in tcop/utility.c you see T_DropStmt is handled in a few switch statements, and seem to run through ProcessUtilitySlow and ExecDropStmt. The function ExecDropStmt has a switch over stmt->removeType, which was populated back in gram.y to the value OBJECT_TABLE. That now serves to select the RemoveRelations function. This sort of analysis is fairly useful for getting from a SQL statement to the code in src/backend/commands/ that implements the guts of that statement. I hope this helps. -- Mark Dilger
Mark Dilger <hornschnorter@gmail.com> writes: > [ useful tips about finding the code that implements a SQL command ] BTW, if it wasn't obvious already, you *really* want to have some kind of tool that easily finds the definition of a particular C symbol. You can fall back on "grep -r" or "git grep", but lots of people use ctags or etags or some other C-aware indexing tool. regards, tom lane
Thanks for your answer, I will dive into the C code then. Le 9/12/19 à 16:52, Mark Dilger a écrit : > > Not all of them are real tables; some of the pg_catalog relations are > views over others of them. But many of them are real tables with C > structs that back them. Take a look in src/include/catalog/pg_class.h > and you'll see the C struct definition, somewhat obscured by some > macros that make it less obvious to people not familiar with the > postgresql sources. Indeed, I should have say it in my email, I saw in the documentation that some of them are views. :-) Cheers, Julien Le 9/12/19 à 16:52, Mark Dilger a écrit : > > > On 12/9/19 7:35 AM, Julien Delplanque wrote: >> Hello PostgreSQL hackers, >> >> I hope I am posting on the right mailing-list. >> >> I am actually doing a PhD related to relational databases and >> software engineering. >> >> I use PostgreSQL for my research. >> >> I have a few questions about the internals of PostgreSQL and I think >> they require experts knowledge. >> >> I could not find documentation about that in the nice PostgreSQL >> documentation but maybe I missed something? Tell me if it is the case. >> >> My Questions: >> >> Q1. Are PostgreSQL's meta-description tables (such as pg_class) the >> "reality" concerning the state of the DB or are they just a virtual >> representation ? > > Not all of them are real tables; some of the pg_catalog relations are > views over others of them. But many of them are real tables with C > structs that back them. Take a look in src/include/catalog/pg_class.h > and you'll see the C struct definition, somewhat obscured by some > macros that make it less obvious to people not familiar with the > postgresql sources. > > On line 29: > > CATALOG(pg_class,1259,RelationRelationId) BKI_BOOTSTRAP > BKI_ROWTYPE_OID(83,RelationRelation_Rowtype_Id) BKI_SCHEMA_MACRO > { > ... > } > > That's a typedef. See genbki.h where it defines the macro: > > #define CATALOG(name,oid,oidmacro) typedef struct > CppConcat(FormData_,name) > >> What I would like to know with this question is: would it be possible >> to implement DDL queries (e.g. CREATE TABLE, DROP TABLE, CREATE VIEW, >> ALTER TABLE, etc.) as DML queries that modify the meta-data stored in >> meta-description tables? >> >> For example, something like: >> >> INSERT INTO pg_class [...]; >> >> To create a new table (instead of the CREATE TABLE DDL query). > > You are not allowed to insert into the pg_class table directly. There > are good reasons for that. Simply inserting a row into this table would > not cause all the infrastructure that backs a table to pop into > existence. So you have to use the DDL commands. > >> >> Q1.1 If it is possible, is what is done in reality? I have the >> feeling that it is not the case and that DDL queries are implemented >> in C directly. > > See src/backend/commands/tablecmds.c, function DefineRelation. > > >
On Tue, 10 Dec 2019 at 01:21, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Mark Dilger <hornschnorter@gmail.com> writes:
> [ useful tips about finding the code that implements a SQL command ]
BTW, if it wasn't obvious already, you *really* want to have some kind
of tool that easily finds the definition of a particular C symbol.
You can fall back on "grep -r" or "git grep", but lots of people use
ctags or etags or some other C-aware indexing tool.
I strongly recommend cscope with editor integration for your preferred editor btw.
On Mon, 9 Dec 2019 at 23:35, Julien Delplanque <julien.delplanque@inria.fr> wrote:
Hello PostgreSQL hackers,
I hope I am posting on the right mailing-list.
I am actually doing a PhD related to relational databases and software engineering.
I use PostgreSQL for my research.
I have a few questions about the internals of PostgreSQL and I think they require experts knowledge.
I could not find documentation about that in the nice PostgreSQL documentation but maybe I missed something? Tell me if it is the case.
There are a bunch of README files in the source tree that concern various innards of PostgreSQL. They're not always referred to by any comments etc, so you have to know they exist. They're usually well worth reading, though it can take a while before you understand enough of PostgreSQL's architecture for them to make sense...
Try
find src/ -name README\*
Q1. Are PostgreSQL's meta-description tables (such as pg_class) the "reality" concerning the state of the DB or are they just a virtual representation ?
That's been largely answered. But I want to point out an important caveat that isn't obvious to new people: The oid of a relation (pg_class.oid) is not the same thing as the pg_class.relfilenode, which is usually the base of the filename of the on-disk storage for the relation. On an idle or new database most relations are created with an equal oid and relfilename, so it's easy to think the oid maps to the on-disk name of a relation, but it doesn't. The relation oid will not change so long as the relation exists, but the relfilenode may change if the table contents are rewritten, etc. Additionally, there are special tables that are "relmapped" such that they don't have a normal relfilenode at all, instead access is indirected via a separate mapping. IIRC that's mainly necessary so we can bootstrap access to the catalog tables that tell us how to read the catalogs.
What I would like to know with this question is: would it be possible to implement DDL queries (e.g. CREATE TABLE, DROP TABLE, CREATE VIEW, ALTER TABLE, etc.) as DML queries that modify the meta-data stored in meta-description tables?
Not really.
PostgreSQL has a caching layer - sycache, relcache, catcache - and invalidation scheme that it relies on. It doesn't execute regular queries on the system catalogs. It also has simplifying rules around how they are updated and accessed. See the logic in genam.c etc. Catalogs may also represent things that aren't just other DB rows - for example, pg_class entries are associated with files on disk for individual database tables.
You can't just insert into pg_class, pg_attribute, etc and expect that to safely create a table. Though it's surprising how much you can get away with by hacking the catalogs if you're very careful and you trick PostgreSQL into firing appropriate invalidations. I'd quite like to have a SQL-exposed way to do a forced global cache flush and invalidation for use in emergency scary catalog hacking situations.
So you can do quite a bit with direct catalog surgery, but it's dangerous and if you break the database, you get to keep the pieces.
Q1.1 If it is possible, is what is done in reality? I have the feeling that it is not the case and that DDL queries are implemented in C directly.
Right. See standard_ProcessUtility() and friends.
Q1.2 If it is possible and not done, what is the reason?
Speed - no need to run the full executor. Simplification of catalog access. Caching and invalidations. Chicken/egg problems: how do you "CREATE TABLE pg_class"? . Lots more.
Q2. Are PostgreSQL's "meta-constraints" (i.e. constraints related to database structure such as "a table can only have a single primary key") implemented in C code or via data constraints on PostgreSQL's meta-description tables?
System catalogs are not permitted to have CONSTRAINTs (CHECK constraints, UNIQUE constraints, PRIMARY KEY constraints, FOREIGN KEY constraints, etc).
All such management is done in C level logic with the assistance of the pg_depend catalog and the relationships it tracks.
Q2.1 If they are not implemented via data constraints on meta-description tables, why ?
Same as above.
Q2.2 Is there somewhere in the documentation a list of such "meta-constraints" implemented by PostgreSQL?
Not AFAIK.
Why?