Thread: Predictive or scoring solution for PostgreSQL ?
Hi,

Does anyone know of a predictive or database scoring solution for PostgreSQL?

I'm looking for a system that can take a database of, for example, 100 000 records, of which 1000 have one field set to YES, with about 100 fields in the table. The system should score the 100 fields to determine which matter most for the 1000 records that have the YES value, then derive a formula that can be calculated and applied to the rest of the database, and finally estimate (the predictive part) which of the remaining 90 000 records are likely to have that field set to YES.

I hope my request is clear ;o) I also hope someone has already heard of something like this and may be able to help me ;o)

Best regards,
--
Hervé
Hmmmm, it's been a while since I did this but...

This was with Sybase (it should be configurable with ODBC by now), but we used a tool called ModelMAX, from Advanced Software Applications (A.S.A.), which could select a sample of records and score them on the basis of fields (you need some NOs as well). It produced C code that would score non-flagged records on the basis of the new results.

Our process was to select a sample of YES/NO records and split it into two samples. (The Yes records are actually coded as '1' and the No records as '0'.) The No records give the system something to differentiate against. The first and larger sample was used to generate, or train, the neural net. Then the second sample (with known values) was scored using the new model, and the known result compared with the score. Generally the score was a probability: of response, of credit card application approval, or the like. If the model is valid, the formula can be rolled out to the database.

The trick is that the tool needs to understand something about the fields available for scoring: domain and type, ranges and codings. If these are fixed, they are a one-time setup.

Other tools do similar things. Another was Knowledge Seeker from Angoss Software, which built turnkey decision trees (this was fairly cheap, depending on the system it is running on). SAS also produced a turnkey modeling solution (not cheap $$$$). You could also try SPSS (cheaper than SAS). Group 1 Software also marketed an all-in-one modeling solution, Model 1 (I think), but I never actually used it.

I'll dig around and see if I can find an article I wrote about this...

Marc A. Leith
President
redboxdata inc.
E-mail: mleith@redboxdata.com
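The train-then-validate process described above can be sketched roughly as follows. This is only an illustration: it swaps in a plain logistic regression for the neural net ModelMAX used (the thread does not describe that tool's internals), runs on synthetic data, and every name in it (`train_logistic`, `score`, `make_synthetic`) is hypothetical.

```python
import math
import random

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def score(w, x):
    """Probability-like score for one record; w[0] is the bias term."""
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))

def train_logistic(rows, labels, epochs=50, lr=0.1):
    """Fit weights by plain stochastic gradient descent (assumed method)."""
    w = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            err = y - score(w, x)          # y is 1 for YES, 0 for NO
            w[0] += lr * err
            for j, xj in enumerate(x):
                w[j + 1] += lr * err * xj
    return w

def make_synthetic(n, seed=0):
    """Hypothetical data: three fields, only field 0 drives the flag."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        x = [rng.uniform(-1, 1) for _ in range(3)]
        labels.append(1 if rng.random() < sigmoid(4.0 * x[0]) else 0)
        rows.append(x)
    return rows, labels

rows, labels = make_synthetic(1000)
split = 700                                 # first, larger sample trains the model
w = train_logistic(rows[:split], labels[:split])

# Second sample: known results are compared with the model's scores.
holdout_scores = [score(w, x) for x in rows[split:]]
holdout_labels = labels[split:]
accuracy = sum((s >= 0.5) == (y == 1)
               for s, y in zip(holdout_scores, holdout_labels)) / len(holdout_labels)
print(f"holdout accuracy: {accuracy:.2f}")
```

If the holdout comparison looks acceptable, the same `score` function could then be applied to the unflagged records, which is the "roll out to the database" step.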
Marc A. Leith wrote:

> Other tools do similar things - another was Knowledge Seeker from Angoss
> Software - which built turnkey decision trees (this was fairly cheap depending
> on the system it is running on). SAS also produced a turnkey modeling solution
> (not cheap $$$$). You could also try SPSS (cheaper than SAS). Group 1 Software
> also marketed an all-in-one Modeling Sol'n - Model 1 (I think) but I never
> actually used it.

Would Joe Conway's PL/R procedural language be any help here? I'd guess there's an R package to fit the bill, but then again I'm only on page 30 of Modern Applied Statistics in S-Plus. ;-)

Mike Mascari
Marc A. Leith wrote:

> Other tools do similar things - another was Knowledge Seeker from Angoss
> Software - which built turnkey decision trees (this was fairly cheap depending
> on the system it is running on). SAS also produced a turnkey modeling solution
> (not cheap $$$$). You could also try SPSS (cheaper than SAS). Group 1 Software
> also marketed an all-in-one Modeling Sol'n - Model 1 (I think) but I never
> actually used it.

Or try R (open source implementation of the S language, similar to S-PLUS)...
  http://www.r-project.org/
...along with PL/R:
  http://www.joeconway.com/plr/
And see here for a variety of packages to do just about any kind of analysis you can think of:
  http://cran.r-project.org/

Some assembly required, but powerful and free.

HTH,

Joe
Quoting Mike Mascari <mascarm@mascari.com>:

> Would Joe Conway's PL/R procedural language be any help here? I'd guess
> there's an R package to fit the bill, but then again I'm only on page 30
> of Modern Applied Statistics in S-Plus. ;-)
>
> Mike Mascari

For a turnkey modeling solution, you need more than simple stat functions. These solutions automatically transform or 'bucketize' the data and then analyze the covariance between the score variables and the known result. They then select a smaller number of variables and use them to build a model. This may be done with a back-propagation neural network, a more traditional regression model, or some sort of decision tree or CHAID system. Model 1 uses 3 or 4 approaches and selects the one with the best (truest) fit.

ModelMAX (and the like) have been honed over the last decade by teams of statisticians and still generate models that are close but not yet equal to those that our modeling team used to build. The difference was that I could build a model in a few hours (limited by the CPU on the PC), whereas they took several weeks to hand-tune the result.

Marc A. Leith
President
redboxdata inc.
E-mail: mleith@redboxdata.com
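To make the "bucketize, measure association, keep fewer variables" steps concrete, here is a rough sketch. It assumes equal-width buckets and the absolute Pearson correlation between bucket index and the 0/1 flag as the ranking measure. The thread does not say what the commercial tools actually compute, so treat both choices, the field names, and the synthetic data as hypothetical.

```python
import random

def bucketize(values, n_buckets=5):
    """Map each value to an equal-width bucket index in 0..n_buckets-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_buckets
    if width == 0:
        width = 1.0                      # constant column: everything in bucket 0
    return [min(int((v - lo) / width), n_buckets - 1) for v in values]

def correlation(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5

def rank_fields(table, flag, n_buckets=5):
    """table maps field name -> list of numbers; best-associated field first."""
    scores = {name: abs(correlation(bucketize(col, n_buckets), flag))
              for name, col in table.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical data: only "income" influences the YES (1) / NO (0) flag.
rng = random.Random(1)
n = 2000
table = {
    "age": [rng.uniform(18, 80) for _ in range(n)],
    "income": [rng.uniform(0, 100) for _ in range(n)],
    "noise": [rng.gauss(0, 1) for _ in range(n)],
}
flag = [1 if inc + rng.gauss(0, 10) > 80 else 0 for inc in table["income"]]

ranked = rank_fields(table, flag)
selected = [name for name, _ in ranked[:2]]   # keep a smaller set of variables
for name, strength in ranked:
    print(f"{name:8s} {strength:.3f}")
```

Correlation on bucket indexes only picks up roughly monotone relationships; a per-bucket response rate or a chi-square test (the basis of CHAID) would catch other shapes.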
Marc A. Leith wrote:

> Quoting Mike Mascari <mascarm@mascari.com>:
>
>> Would Joe Conway's PL/R procedural language be any help here? I'd guess
>> there's an R package to fit the bill, but then again I'm only on page 30
>> of Modern Applied Statistics in S-Plus. ;-)
>
> For a turnkey modeling solution, you need more than simple stat functions. These
> solutions automatically transform or 'bucketize' the data and then analyze the
> covariance between the score variables and the known result.

I'm obviously not in any position to define what is needed here. I only had business statistics in college as a requirement for an economics degree many years ago. However, I will say that you may be underestimating R's capabilities. It includes linear and non-linear regression models, neural networks, time-series analysis, and a host (and I mean 100's) of other models I have yet to fathom. I'd humbly speculate that the core developers, including the chairman of the statistics department at Oxford, would take issue with its characterization as "simple stat functions". But what do I know... :-)

Mike Mascari
On Thu, 05 Feb 2004 07:45:41 -0500, Mike Mascari wrote:
>Marc A. Leith wrote:
>
>>Quoting Mike Mascari <mascarm@mascari.com>:
>>
>>
>>>Would Joe Conway's PL/R procedural language be any help here? I'd guess
>>>there's an R package to fit the bill, but then again I'm only on page 30
>>>of Modern Applied Statistics in S-Plus. ;-)
>>
>>For a turnkey modeling solution, you need more than simple stat functions. These
>>solutions automatically transform or 'bucketize' the data and then analyze the
>>covariance between the score variables and the known result.
>>
>
>I'm obviously not in any position to define what is needed here. I only
>had business statistics in college as a requirement for an economics
>degree many years ago. However, I will say that you may be
>underestimating R's capabilities. It includes linear and non-linear
>regression models, neural networks, time-series analysis, and a host
>(and I mean 100's) of other models I have yet to fathom. I'd humbly
>speculate that the core developers, include the chairman of the
>statistics department at Oxford, would take issue with its
>characterization as "simple stat functions". But what do I know... :-)
>
>Mike Mascari
>
Fair enough. I took a look at the links that Joe Conway provided, and R seems very powerful and feature-complete. My comment was unfair; please consider it rephrased/withdrawn.
BUT is it turnkey? The original question sought a 'system' to score the database.
SAS and SPSS can be configured to do this, as R likely can be, but does that make it a system?
The solutions I suggested can be run by someone with virtually no knowledge of stats (not that I suggest this for complex issues). They can select an appropriate model in minutes rather than needing an MA to design a solution.
Marc
Marc A. Leith
President
redboxdata inc.
e-mail: marc@redboxdata.com
cell: (416) 737 0045