Re: Big 7.1 open items - Mailing list pgsql-hackers

From	Don Baccus
Subject	Re: Big 7.1 open items
Date	June 16, 2000 14:40:17
Msg-id	3.0.1.32.20000616105023.011dbdb0@mail.pacifier.com
In response to	Re: Big 7.1 open items (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: Big 7.1 open items Re: Big 7.1 open items
List	pgsql-hackers

At 11:46 AM 6/16/00 -0400, Tom Lane wrote:

>OK, to get back to the point here: so in Oracle, tables can't cross
>tablespace boundaries,

Right, the construct AFAIK is "create table/index foo on tablespace ..."

> but a tablespace itself could span multiple
>disks?

Right.

>Not sure if I like that better or worse than equating a tablespace
>with a directory (so, presumably, all the files within it live on
>one filesystem) and then trying to make tables able to span
>tablespaces.  We will need to do one or the other though, if we want
>to have any significant improvement over the current state of affairs
>for large tables.

Oracle's way does a reasonable job of isolating the datamodel
from the details of the physical layout.

Take the OpenACS web toolkit, for instance.  We could take
each module's tables and indices and assign them appropriately
to various tablespaces, then provide a separate .sql file
containing only "create tablespace" statements.

By modifying that one central file, the toolkit installation
could be customized to run anything from a small site (one
disk with everything on it, a la my own personal webserver at
birdnotes.net) to a very large site with many spindles, with
various index and table structures spread out widely hither
and thither.

Given that the OpenACS datamodel is nearly 10K lines long (including
many comments, of course), being able to customize an installation
to such a degree by modifying a single file filled with "create
tablespace" statements would be very attractive.
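
As an illustration of that single file, here is a minimal sketch in
Python that renders it from a per-site layout.  The tablespace names,
the directories, and the function name are all hypothetical, and the
CREATE TABLESPACE ... LOCATION form is an assumption borrowed from what
later PostgreSQL releases adopted; nothing in this thread settles the
syntax.

```python
def render_tablespace_file(layout):
    """Build the text of the central file: one statement per tablespace.

    `layout` maps a tablespace name to the directory that should hold it.
    The statement form is an assumption, not something this thread defines.
    """
    lines = []
    for name, directory in sorted(layout.items()):
        escaped = directory.replace("'", "''")  # double single quotes for SQL
        lines.append(f"CREATE TABLESPACE {name} LOCATION '{escaped}';")
    return "\n".join(lines) + "\n"


# Hypothetical layouts: the datamodel refers to the same two names either way.
small_site = {
    "acs_tables": "/disk0/pg/acs_tables",
    "acs_indices": "/disk0/pg/acs_indices",
}
large_site = {
    "acs_tables": "/disk1/pg/acs_tables",
    "acs_indices": "/disk2/pg/acs_indices",
}

if __name__ == "__main__":
    print(render_tablespace_file(large_site))
```

Only the mapping differs between the small and the large site; the
10K lines of datamodel would stay untouched.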

>One way is to play the flip-the-path-ordering game some more,
>and access multiple-segment tables with pathnames like this:
>
>    .../TABLESPACE/RELATION        -- first or only segment
>    .../TABLESPACE/N/RELATION    -- N'th extension segment
>
>This isn't any harder for md.c to deal with than what we do now,
>but by making the /N subdirectories be symlinks, the dbadmin could
>easily arrange for extension segments to go on different filesystems.

I personally dislike depending on symlinks to move stuff around.
Among other things, a pg_dump/restore (and presumably future 
backup tools?) can't recreate the disk layout automatically.

>We'd still want to create some tools to help the dbadmin with slinging
>all these symlinks around, of course.

OK, if symlinks are simply an implementation detail hidden from the
dbadmin, and if the physical structure is kept in the db so it can
be rebuilt automatically if necessary, then I don't mind symlinks.

> But I think it's critical to keep
>the low-level file access protocol simple and reliable, which really
>means minimizing the amount of information the backend needs to know to
>figure out which file to write a page in.  With something like the above
>you only need to know the tablespace name (or more likely OID), the
>relation OID (+name or not, depending on outcome of other argument),
>and the offset in the table.  No worse than now from the software's
>point of view.

Make the code that creates and otherwise manipulates tablespaces
do the work, while keeping the low-level file access protocol simple.

Yes, this approach sounds very good to me.
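
To make the lookup concrete, here is a rough sketch in Python of
mapping (tablespace, relation, offset) to a file under the
TABLESPACE/RELATION and TABLESPACE/N/RELATION layout quoted above.
The 1 GB segment size, the base directory, and the function name are
my assumptions for illustration; this is not md.c's actual code.

```python
import posixpath

SEGMENT_SIZE = 1 << 30  # assumed maximum bytes per segment file (1 GB)


def segment_path(base, tablespace, relation, offset):
    """Return (path, offset_within_segment) for a byte offset into a relation.

    Segment 0 lives at  base/TABLESPACE/RELATION
    Segment N lives at  base/TABLESPACE/N/RELATION  (N >= 1)
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    segno, seg_offset = divmod(offset, SEGMENT_SIZE)
    if segno == 0:
        path = posixpath.join(base, tablespace, relation)
    else:
        # The N directory is what the dbadmin tools could make a symlink.
        path = posixpath.join(base, tablespace, str(segno), relation)
    return path, seg_offset


if __name__ == "__main__":
    print(segment_path("/data", "ts1", "foo", 0))
    print(segment_path("/data", "ts1", "foo", SEGMENT_SIZE + 8192))
```

The backend needs nothing beyond those three inputs; where each N
directory really lives is left to whatever creates the tablespace.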

- Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert
Service and other goodies at http://donb.photo.net.
