Select random lines of a table using a probability distribution - Mailing list pgsql-sql
From | Jira, Marcel |
---|---|
Subject | Select random lines of a table using a probability distribution |
Date | |
Msg-id | D793F5C522F1DD40BB9DC43586C57637E098CC87C6@MBX-B.ad.wu-wien.ac.at Whole thread Raw |
Responses |
Re: Select random lines of a table using a probability
distribution
|
List | pgsql-sql |
<div class="WordSection1"><p class="MsoNormal">Hi!<p class="MsoNormal"> <p class="MsoNormal"><span lang="EN-US">Let’s considerI have a table like this</span><p class="MsoNormal"><span lang="EN-US"> </span><p class="MsoNormal"><span style="font-size:10.0pt;font-family:"CourierNew"">id qualification gender age income</span><p class="MsoNormal"><spanlang="EN-US"> </span><p class="MsoNormal"><span lang="EN-US">I’d like to select (for example 100)lines of this table by random, but the random mechanism has to follow a certain probability distribution.</span><p class="MsoNormal"><spanlang="EN-US"> </span><p class="MsoNormal"><span lang="EN-US">I want to use this procedure to constructa test group for another selection.</span><p class="MsoNormal"><span lang="EN-US"> </span><p class="MsoNormal"><spanlang="EN-US">Example:</span><p class="MsoNormal"><span lang="EN-US"> </span><p class="MsoNormal"><spanlang="EN-US">I filter all lines having the qualification “plumber”.</span><p class="MsoNormal"><spanlang="EN-US">I get 50 different ids consisting of 40 males, 10 females and a certain age distribution.</span><pclass="MsoNormal"><span lang="EN-US"> </span><p class="MsoNormal"><span lang="EN-US">I also get someinformation concerning the income of the plumbers.</span><p class="MsoNormal"><span lang="EN-US"> </span><p class="MsoNormal"><spanlang="EN-US">Now I want to know if the income is more influenced by the gender and age distributionor by the qualification “plumber”.</span><p class="MsoNormal"><span lang="EN-US"> </span><p class="MsoNormal"><spanlang="EN-US">Therefore I would like to select a test group (of 50 or more) without any plumbers. Thistest group has to follow the same age and gender distribution.</span><p class="MsoNormal"><span lang="EN-US"> </span><pclass="MsoNormal"><span lang="EN-US">Then I would be able to compare this groups income statisticswith the plumbers income statistics.</span><p class="MsoNormal"><span lang="EN-US"> </span><p class="MsoNormal"><spanlang="EN-US">Is this possible (and doable with reasonable effort) in PostgreSQL?</span><p class="MsoNormal"><spanlang="EN-US"> </span><p class="MsoNormal"><span lang="EN-US">Thank you in advance.</span><p class="MsoNormal"><spanlang="EN-US"> </span><p class="MsoNormal"><span lang="EN-US">Best regards,</span><p class="MsoNormal"><spanlang="EN-US"> </span><p class="MsoNormal"><span lang="EN-US">Marcel Jira</span><p class="MsoNormal"><spanlang="EN-US"> </span><p class="MsoNormal"><span style="font-size:8.0pt;font-family:"Courier New"">╔════<b> ~~~ * ~~~</b></span><p class="MsoNormal"><span style="font-size:8.0pt;font-family:"Courier New"">║ Mag.Marcel Jira</span><p class="MsoNormal"><span style="font-size:8.0pt;font-family:"Courier New"">║ Institut für Sozialpolitik,Wirtschaftsuniversität Wien</span><p class="MsoNormal"><span style="font-size:8.0pt;font-family:"Courier New"">║+43 1 313 36-5890</span><p class="MsoNormal"><span style="font-size:8.0pt;font-family:"Courier New"">║ UZA IV, D 317</span><pclass="MsoNormal"><span style="font-size:8.0pt;font-family:"Courier New"">║ <a href="http://www.wu.ac.at/sozialpolitik/team/wimi/jira">http://www.wu.ac.at/sozialpolitik/team/wimi/jira</a></span><p class="MsoNormal"><spanstyle="font-size:8.0pt;font-family:"Courier New"">╚════<b> ~~~ * ~~~</b></span><p class="MsoNormal"> </div>