Home > mailing lists

Re: multiple sampling from tables and saving output - Mailing list pgsql-novice

From	Tom Lane
Subject	Re: multiple sampling from tables and saving output
Date	February 7, 2005 15:55:00
Msg-id	27273.1107791686@sss.pgh.pa.us Whole thread Raw
In response to	multiple sampling from tables and saving output (David Orme <d.orme@imperial.ac.uk>)
List	pgsql-novice

Tree view

David Orme <d.orme@imperial.ac.uk> writes:
> The process I need to do is a loop of  1000 repetitions of the
> following:

> 1) select a random subset of the data from a table
> 2) save various summaries of the randomly selected data

> I can think of various external ways of doing this - my current plan is
> to use a shell script to resend the same set of instructions repeated
> times using 'psql -f instruction_set.sql'  - but I was wondering if
> there was a canonical way of doing this within pgsql.

If you want a sample of, say, 1% of the rows in a table, you can do

    select * from mytable where random() < 0.01;

and get a genuinely unbiased sample.  Keep in mind though that you can't
get an exact sample size this way --- it'll be close to 1% but probably
not spot on.

            regards, tom lane

pgsql-novice by date:

From: DAVANNE Eric - NTR
Date: 07 February 2005, 15:26:35
Subject: password expiration interval

From: Tom Lane
Date: 07 February 2005, 16:00:16
Subject: Re: Percent of update completed

Re: multiple sampling from tables and saving output - Mailing list pgsql-novice

Previous

Next