Re: multiple sampling from tables and saving output - Mailing list pgsql-novice

From Tom Lane
Subject Re: multiple sampling from tables and saving output
Date
Msg-id 27273.1107791686@sss.pgh.pa.us
Whole thread Raw
In response to multiple sampling from tables and saving output  (David Orme <d.orme@imperial.ac.uk>)
List pgsql-novice
David Orme <d.orme@imperial.ac.uk> writes:
> The process I need to do is a loop of  1000 repetitions of the
> following:

> 1) select a random subset of the data from a table
> 2) save various summaries of the randomly selected data

> I can think of various external ways of doing this - my current plan is
> to use a shell script to resend the same set of instructions repeated
> times using 'psql -f instruction_set.sql'  - but I was wondering if
> there was a canonical way of doing this within pgsql.

If you want a sample of, say, 1% of the rows in a table, you can do

    select * from mytable where random() < 0.01;

and get a genuinely unbiased sample.  Keep in mind though that you can't
get an exact sample size this way --- it'll be close to 1% but probably
not spot on.

            regards, tom lane

pgsql-novice by date:

Previous
From: DAVANNE Eric - NTR
Date:
Subject: password expiration interval
Next
From: Tom Lane
Date:
Subject: Re: Percent of update completed