Subject: Re: [HACKERS] CFH: Mariposa, distributed DB
From: Ross J. Reedstrom
Msg-id: 20000207165759.A25647@rice.edu
In response to: Re: [HACKERS] CFH: Mariposa, distributed DB (Don Baccus <dhogaza@pacifier.com>)
List: pgsql-hackers
Seems there was more than just going back to the Berkeley site that
reminded me of Mariposa. A principal piece of new functionality in
Mariposa is the ability to 'fragment' a class, based on a user-defined
partitioning function. The example used is a widgets class, which is
partitioned on the 'location' field (i.e., the warehouse the widget is
stored in):

    CREATE TABLE widgets (
        part_no  int4,
        location char16,
        on_hand  int4,
        on_order int4,
        commited int4
    ) PARTITION ON LOCATION USING btchar16cmp;

Then the table is filled with tuples, all containing locations of
either 'Miami' or 'New York'.

    SELECT * FROM widgets;

works as expected. Later, this table is fragmented:

    SPLIT FRAGMENT widgets INTO widgets_mi, widgets_ny AT 'Miami';

Now the original table widgets is _empty_: all the tuples with
location <= 'Miami' go to widgets_mi, and those with location >
'Miami' go to widgets_ny. Yet

    SELECT * FROM widgets;

still returns all the tuples! So this works somewhat the way Chris
Bitmead has implemented subclasses: widgets_mi and widgets_ny are
subclasses of the widgets class, so selects on the parent return
everything below. They differ in that only PARTITIONed classes can be
FRAGMENTed.

The distributed part comes in with the MOVE FRAGMENT command. This
transfers the 'master' copy of a fragment to the designated host, so
future access to that fragment goes over the network. There's also a
COPY FRAGMENT command, which sets up a local cache of a fragment with
a periodic update time. These copies may be either READONLY or (the
default) READ/WRITE. Updates seem to be timed only (a simple extension
would be to implement write-through behavior).

All this is coming from the Mariposa User's Manual, which is an
extended version of the Postgres95 User's Manual.

As to latest vs. best effort: one defines a BidCurve, whose dimensions
are Cost and Time. A flat curve should get you the latest data.
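For anyone following along, the SPLIT FRAGMENT semantics described above (tuples with key <= the split value route to one fragment, the rest to the other, the parent empties, but a select on the parent still returns everything) can be sketched roughly as follows. This is only my illustration of the behavior; all class and function names here are hypothetical, not Mariposa internals:

```python
# Sketch of SPLIT FRAGMENT routing, as described in the widgets example.
# Names are illustrative only; Mariposa's actual implementation differs.

class Fragment:
    def __init__(self, name):
        self.name = name
        self.tuples = []

def split_fragment(parent_tuples, key, split_value, lo_frag, hi_frag):
    """Route each tuple: key <= split_value -> lo_frag, else -> hi_frag.
    The parent is left empty afterward, mirroring the widgets example."""
    for t in parent_tuples:
        (lo_frag if t[key] <= split_value else hi_frag).tuples.append(t)
    parent_tuples.clear()

def select_all(parent_tuples, fragments):
    """SELECT on the parent returns its own tuples plus all fragments'."""
    result = list(parent_tuples)
    for f in fragments:
        result.extend(f.tuples)
    return result

widgets = [{"part_no": 1, "location": "Miami"},
           {"part_no": 2, "location": "New York"}]
mi, ny = Fragment("widgets_mi"), Fragment("widgets_ny")
split_fragment(widgets, "location", "Miami", mi, ny)
print(len(widgets))                         # parent now empty: 0
print(len(select_all(widgets, [mi, ny])))   # still sees both tuples: 2
```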
And since the DataBroker and Bidder are both implemented as Tcl
scripts, it would be possible to define a bid policy that only buys
the latest data, regardless of how long it's going to take.

Oh, BTW, yes, that does put _two_ interpreted Tcl scripts on the
execution path for every query. Wonder what _that'll_ do for execution
time. However, it's like planning/optimization time, in that it's
spent per query, rather than per tuple.

Ross

--
Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu>
NSBRI Research Scientist/Programmer
Computer and Information Technology Institute
Rice University, 6100 S. Main St., Houston, TX 77005

On Mon, Feb 07, 2000 at 02:19:56PM -0800, Don Baccus wrote:
> At 12:04 AM 2/8/00 +0200, Hannu Krosing wrote:
>
> >The site to go for information was determined by an auction where each site
> >offered speed and cost for looking up the data. Usually they didn't also
> >guarantee the latest data, just the "best effort".
>
> I just glanced at the website. They explicitly mention that they don't
> require global synchronization, because it would slow down response time
> for many things (with thousands of servers, that sounds like an
> understatement).
>
> So, yes, it would appear they don't guarantee the latest data.
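The BidCurve idea mentioned earlier (accept a bid when its (Cost, Time) point lies under a per-query curve, with a flat curve paying the same price regardless of delay) could be sketched like this. The linear-curve form and all names are my own assumptions for illustration; the real broker and bidder are Tcl scripts, not this code:

```python
# Hedged sketch of bid selection against a (Cost, Time) bid curve.
# The linear curve and these names are illustrative assumptions only.

def acceptable(bid_cost, bid_time, intercept, slope):
    """A bid qualifies if it lies under the curve cost = intercept + slope*time.
    A flat curve (slope == 0) caps cost the same at any delay, so only
    cheap-enough bids win no matter how long they take."""
    return bid_cost <= intercept + slope * bid_time

def choose_bid(bids, intercept, slope):
    """Pick the cheapest acceptable (cost, time) bid; None if none qualify."""
    ok = [b for b in bids if acceptable(b[0], b[1], intercept, slope)]
    return min(ok) if ok else None

# Flat curve: willing to pay up to 10 units, however long the site takes.
bids = [(12.0, 1.0), (8.0, 5.0), (9.5, 0.5)]
print(choose_bid(bids, intercept=10.0, slope=0.0))  # (8.0, 5.0)
```

A policy that "only buys the latest data, regardless of how long it takes" would then just be a flat curve with a high enough intercept that some bid always qualifies.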