Subject: Re: [HACKERS] CFH: Mariposa, distributed DB
From: Ross J. Reedstrom
Msg-id: 20000207165759.A25647@rice.edu
In response to: Re: [HACKERS] CFH: Mariposa, distributed DB (Don Baccus <dhogaza@pacifier.com>)
List: pgsql-hackers
Seems there was more than just going back to the Berkeley site that
reminded me of Mariposa. A principal piece of new functionality in
Mariposa is the ability to 'fragment' a class, based on a user-defined
partitioning function. The example used is a widgets class, which is
partitioned on the 'location' field (i.e., the warehouse the widget is
stored in):

    CREATE TABLE widgets (
        part_no  int4,
        location char16,
        on_hand  int4,
        on_order int4,
        commited int4
    ) PARTITION ON LOCATION USING btchar16cmp;

Then the table is filled with tuples, all containing locations of
either 'Miami' or 'New York'.

    SELECT * FROM widgets;

works as expected. Later, this table is fragmented:

    SPLIT FRAGMENT widgets INTO widgets_mi, widgets_ny AT 'Miami';

Now the original table widgets is _empty_: all the tuples with
location <= 'Miami' go to widgets_mi, and those with location >
'Miami' go to widgets_ny. Yet

    SELECT * FROM widgets;

still returns all the tuples! So this works somewhat the way Chris
Bitmead has implemented subclasses: widgets_mi and widgets_ny are
subclasses of the widgets class, so selects on the parent return
everything below. They differ in that only PARTITIONed classes can be
FRAGMENTed.

The distributed part comes in with the MOVE FRAGMENT command. This
transfers the 'master' copy of a fragment to the designated host, so
future access to that fragment goes over the network. There's also a
COPY FRAGMENT command, which sets up a local cache of a fragment with
a periodic update time. These copies may be either READONLY or (the
default) READ/WRITE. Updates seem to be timed only (a simple extension
would be to implement write-through behavior).

All this is coming from the Mariposa User's Manual, which is an
extended version of the Postgres95 User's Manual.

As to latest vs. best effort: one defines a BidCurve, whose dimensions
are Cost and Time. A flat curve should get you the latest data.
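For anyone following along, the SPLIT FRAGMENT semantics described above (tuples with key <= the split value route to one fragment, the rest to the other, the parent empties, but a select on the parent still returns everything) can be sketched roughly as follows. This is only my illustration of the behavior; all class and function names here are hypothetical, not Mariposa internals:

```python
# Sketch of SPLIT FRAGMENT routing, as described in the widgets example.
# Names are illustrative only; Mariposa's actual implementation differs.

class Fragment:
    def __init__(self, name):
        self.name = name
        self.tuples = []

def split_fragment(parent_tuples, key, split_value, lo_frag, hi_frag):
    """Route each tuple: key <= split_value -> lo_frag, else -> hi_frag.
    The parent is left empty afterward, mirroring the widgets example."""
    for t in parent_tuples:
        (lo_frag if t[key] <= split_value else hi_frag).tuples.append(t)
    parent_tuples.clear()

def select_all(parent_tuples, fragments):
    """SELECT on the parent returns its own tuples plus all fragments'."""
    result = list(parent_tuples)
    for f in fragments:
        result.extend(f.tuples)
    return result

widgets = [{"part_no": 1, "location": "Miami"},
           {"part_no": 2, "location": "New York"}]
mi, ny = Fragment("widgets_mi"), Fragment("widgets_ny")
split_fragment(widgets, "location", "Miami", mi, ny)
print(len(widgets))                         # parent now empty: 0
print(len(select_all(widgets, [mi, ny])))   # still sees both tuples: 2
```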
And since the DataBroker and Bidder are both implemented as Tcl
scripts, it would be possible to define a bid policy that only buys
the latest data, regardless of how long it's going to take.

Oh, BTW, yes, that does put _two_ interpreted Tcl scripts on the
execution path for every query. Wonder what _that'll_ do for execution
time. However, it's like planning/optimization time, in that it's
spent per query, rather than per tuple.

Ross

--
Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu>
NSBRI Research Scientist/Programmer
Computer and Information Technology Institute
Rice University, 6100 S. Main St., Houston, TX 77005

On Mon, Feb 07, 2000 at 02:19:56PM -0800, Don Baccus wrote:
> At 12:04 AM 2/8/00 +0200, Hannu Krosing wrote:
>
> >The site to go for information was determined by an auction where each site
> >offered speed and cost for looking up the data. Usually they didn't also
> >guarantee the latest data, just the "best effort".
>
> I just glanced at the website. They explicitly mention that they don't
> require global synchronization, because it would slow down response time
> for many things (with thousands of servers, that sounds like an
> understatement).
>
> So, yes, it would appear they don't guarantee the latest data.
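The BidCurve idea mentioned earlier (accept a bid when its (Cost, Time) point lies under a per-query curve, with a flat curve paying the same price regardless of delay) could be sketched like this. The linear-curve form and all names are my own assumptions for illustration; the real broker and bidder are Tcl scripts, not this code:

```python
# Hedged sketch of bid selection against a (Cost, Time) bid curve.
# The linear curve and these names are illustrative assumptions only.

def acceptable(bid_cost, bid_time, intercept, slope):
    """A bid qualifies if it lies under the curve cost = intercept + slope*time.
    A flat curve (slope == 0) caps cost the same at any delay, so only
    cheap-enough bids win no matter how long they take."""
    return bid_cost <= intercept + slope * bid_time

def choose_bid(bids, intercept, slope):
    """Pick the cheapest acceptable (cost, time) bid; None if none qualify."""
    ok = [b for b in bids if acceptable(b[0], b[1], intercept, slope)]
    return min(ok) if ok else None

# Flat curve: willing to pay up to 10 units, however long the site takes.
bids = [(12.0, 1.0), (8.0, 5.0), (9.5, 0.5)]
print(choose_bid(bids, intercept=10.0, slope=0.0))  # (8.0, 5.0)
```

A policy that "only buys the latest data, regardless of how long it takes" would then just be a flat curve with a high enough intercept that some bid always qualifies.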