Thread: Predictive or scoring solution for PostgreSQL ?

Predictive or scoring solution for PostgreSQL ?

From

Hervé Piedvache

Date:

04 February 2004, 16:46:08

Hi,

Does anyone know a predictive or a database scoring solution for PostgreSQL ?

I'm looking for a system that can take a table of, for example, 100 000
records with about 100 fields, in which 1000 records have one particular field
set to YES.
The system should score each of the 100 fields to determine which fields matter
most for the 1000 records that have the YES value.
It should then derive a formula from those scores, apply it to the rest of the
database, and estimate (the predictive part) which of the remaining 99 000
records are likely to have that field set to YES ...

I hope my request is clear ... ;o)

I also hope someone has already heard of something like this ... and maybe
could help me ;o)

best regards,
--
Hervé

Re: Predictive or scoring solution for PostgreSQL ?

From

"Marc A. Leith"

Date:

04 February 2004, 22:21:35

Hmmmm, it's been a while since I did this but...

This was with Sybase (it should be configurable with ODBC by now) but we used a
tool called ModelMAX (Advanced Software Applications or A.S.A) which could
select a sample of records and score them on the basis of fields (you need some
NO's as well). It produced 'C' code that would score non-flagged records on the
basis of the new results.

Our process was to select a sample of YES/NO records and split it into two
samples. (The Yes records are actually coded as '1' and the No records as '0').
The No records give the system something to differentiate.

The first and larger sample was used to generate or train the neural net. Then
the second sample (with known values) was scored using the new model, and the
known result compared with the score.

Generally the score was a probability - of response or credit card application
approval or the like.

If the model is valid, the formula can be rolled out to the database.
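
A minimal sketch of the split / train / validate / roll-out process described
above, in Python. It is not ModelMAX output, and it does not train a neural
net: as a stand-in, the "model" here is just the observed YES rate per value of
one field. The field name `segment`, the function names, and the synthetic
records are all assumptions made for illustration.

```python
import random

def split_sample(records, train_fraction=0.7, seed=0):
    """Shuffle labeled records; return (larger training sample, holdout sample)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def train_rate_model(train, field):
    """Stand-in for the neural net: observed YES rate per value of one field."""
    counts = {}
    for rec in train:
        yes, total = counts.get(rec[field], (0, 0))
        counts[rec[field]] = (yes + rec["flag"], total + 1)
    return {value: yes / total for value, (yes, total) in counts.items()}

def score(model, rec, field):
    """The score is a probability of YES; unseen values fall back to 0.0."""
    return model.get(rec[field], 0.0)

def holdout_accuracy(model, holdout, field, cutoff=0.5):
    """Compare the known flag with the thresholded score on the holdout sample."""
    hits = sum((score(model, r, field) >= cutoff) == (r["flag"] == 1)
               for r in holdout)
    return hits / len(holdout)

# Hypothetical sample, coded 1 = YES and 0 = NO as in the description.
records = (
    [{"segment": "A", "flag": 1}] * 80 + [{"segment": "A", "flag": 0}] * 20
    + [{"segment": "B", "flag": 1}] * 10 + [{"segment": "B", "flag": 0}] * 90
)

train, holdout = split_sample(records)        # first, larger sample trains
model = train_rate_model(train, "segment")
print("holdout accuracy:", holdout_accuracy(model, holdout, "segment"))

# If the holdout check looks valid, score a record whose flag is unknown.
print("score for an unflagged A record:",
      score(model, {"segment": "A"}, "segment"))
```

The NO records matter here for the same reason given above: without them every
rate would be 1.0 and the scores could not differentiate anything.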

The trick is that the tool needs to understand something about the fields
available for scoring. Domain and type, ranges and codings - if these are fixed
they are a one time setup.
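
As a rough illustration of that one-time setup, the sketch below declares type,
range, and coding per field and uses the declaration to turn raw values into
numbers. The `FieldSpec` class, the `encode` function, and the two example
fields are hypothetical and not taken from any of the tools mentioned.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class FieldSpec:
    name: str
    kind: str                      # "numeric" or "coded"
    low: Optional[float] = None    # valid range, numeric fields only
    high: Optional[float] = None
    codes: Optional[dict] = None   # raw value -> numeric code, coded fields only

def encode(spec, raw):
    """Turn a raw column value into a number using the declared metadata."""
    if spec.kind == "numeric":
        value = float(raw)
        if not (spec.low <= value <= spec.high):
            raise ValueError(f"{spec.name}={raw!r} is outside the declared range")
        return (value - spec.low) / (spec.high - spec.low)  # rescale to 0..1
    if spec.kind == "coded":
        return float(spec.codes[raw])
    raise ValueError(f"unknown field kind: {spec.kind}")

# Declared once, reused for every scoring run (example fields are made up).
SPECS = [
    FieldSpec("age", "numeric", low=18, high=98),
    FieldSpec("responded", "coded", codes={"YES": 1, "NO": 0}),
]

row = {"age": 58, "responded": "YES"}
encoded = [encode(spec, row[spec.name]) for spec in SPECS]
print(encoded)
```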

Other tools do similar things - another was Knowledge Seeker from Angoss
Software - which built turnkey decision trees (this was fairly cheap depending
on the system it is running on). SAS also produced a turnkey modeling solution
(not cheap $$$$).  You could also try SPSS (cheaper than SAS). Group 1 Software
also marketed an all-in-one Modeling Sol'n - Model 1 (I think) but I never
actually used it.

I'll dig around and see if I can find an article I wrote about this...

Marc A. Leith
President
redboxdata inc.

E-mail:mleith@redboxdata.com

Quoting Hervé Piedvache <footcow@noos.fr>:

> Hi,
>
> Does anyone know a predictive or a database scoring solution for PostgreSQL ?
>
> I'm looking for a system able to take a database with for example 100 000
> records in total, inside them we have got 1000 records with one field set to
> YES ... with about 100 fields in the table ...
> The system should be able to set a score to the 100 fields to determine the
> most importants fields to this 1000 records who's got the YES value ...
> Then set a formula ... to calculate and to apply to the rest of the database
> the same score ... and then estimate (predictive thing) in the 90 000 rest of
> records which one may have the famous field set to YES ...
>
> I hope I'm clear in my demand ... ;o)
>
> Hope also someone have already heard about this ... and may be could help
> me ;o)
>
> best regards,
> --
> Hervé

Re: Predictive or scoring solution for PostgreSQL ?

From

Mike Mascari

Date:

04 February 2004, 23:32:42

Quoting Hervé Piedvache <footcow@noos.fr>:

>> Hi,
>> Does anyone know a predictive or a database scoring solution for PostgreSQL

in response, Marc A. Leith wrote:

>Hmmmm, it's been a while since I did this but...
>
>Other tools do similar things - another was Knowledge Seeker from Angoss
>Software - which built turnkey decision trees (this was fairly cheap depending
>on the system it is running on). SAS also produced a turnkey modeling solution
>(not cheap $$$$).  You could also try SPSS (cheaper than SAS). Group 1 Software
>also marketed an all-in-one Modeling Sol'n - Model 1 (I think) but I never
>actually used it.
>
>
Would Joe Conway's PL/R procedural language be any help here? I'd guess
there's an R package to fit the bill, but then again I'm only on page 30
of Modern Applied Statistics in S-Plus. ;-)

Mike Mascari

Re: Predictive or scoring solution for PostgreSQL ?

From

Joe Conway

Date:

05 February 2004, 01:11:48

Marc A. Leith wrote:
> Other tools do similar things - another was Knowledge Seeker from Angoss
> Software - which built turnkey decision trees (this was fairly cheap depending
> on the system it is running on). SAS also produced a turnkey modeling solution
> (not cheap $$$$).  You could also try SPSS (cheaper than SAS). Group 1 Software
> also marketed an all-in-one Modeling Sol'n - Model 1 (I think) but I never
> actually used it.

Or try R (open source implementation of the S language, similar to
S-PLUS)...
   http://www.r-project.org/

...along with PL/R:
   http://www.joeconway.com/plr/

And see here for a variety of packages to do just about any kind of
analysis you can think of:
   http://cran.r-project.org/

Some assembly required, but powerful and free.

HTH,

Joe

Re: Predictive or scoring solution for PostgreSQL ?

From

"Marc A. Leith"

Date:

05 February 2004, 01:20:58

Quoting Mike Mascari <mascarm@mascari.com>:

> Quoting Hervé Piedvache <footcow@noos.fr>:
>
> >> Hi,
> >> Does anyone know a predictive or a database scoring solution for PostgreSQL
>
> in response, Marc A. Leith wrote:
>
> >Hmmmm, it's been a while since I did this but...
> >
> >Other tools do similar things - another was Knowledge Seeker from Angoss
> >Software - which built turnkey decision trees (this was fairly cheap depending
> >on the system it is running on). SAS also produced a turnkey modeling solution
> >(not cheap $$$$).  You could also try SPSS (cheaper than SAS). Group 1 Software
> >also marketed an all-in-one Modeling Sol'n - Model 1 (I think) but I never
> >actually used it.
> >
> >
> Would Joe Conway's PL/R procedural language be any help here? I'd guess
> there's an R package to fit the bill, but then again I'm only on page 30
> of Modern Applied Statistics in S-Plus. ;-)
>
> Mike Mascari
>

For a turnkey modeling solution, you need more than simple stat functions. These
solutions automatically transform or 'bucketize' the data and then analyze the
covariance between the score variables and the known result.

They then select a smaller number of variables and use them to build a model -
this may be done with a back-propagation neural network, a more traditional
regression model, or some sort of decision tree or CHAID system. Model 1 uses 3
or 4 approaches and selects the one with the best (truest) fit.
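
To make the 'bucketize', measure, and select steps concrete, here is a small
Python sketch under stated assumptions. It uses equal-width buckets and the
correlation (a scaled covariance) between bucket index and the known 0/1
result, then keeps the strongest variables. The real products may bucket and
rank quite differently; the field names and data are made up.

```python
from math import sqrt

def bucketize(values, n_buckets=5):
    """Map numeric values to equal-width bucket indexes 0..n_buckets-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_buckets or 1.0   # constant field: one bucket
    return [min(int((v - lo) / width), n_buckets - 1) for v in values]

def correlation(xs, ys):
    """Covariance of xs and ys divided by both standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy) if sx and sy else 0.0

def rank_fields(records, fields, flag="flag", keep=1):
    """Return the `keep` fields most associated with the flag, strongest first."""
    flags = [r[flag] for r in records]
    scored = [
        (abs(correlation(bucketize([r[f] for r in records]), flags)), f)
        for f in fields
    ]
    scored.sort(reverse=True)
    return [(field, strength) for strength, field in scored][:keep]

# Hypothetical data: the flag depends on age only; "noise" is unrelated to it.
records = [
    {"age": 20 + i % 50, "noise": (i * 7) % 10, "flag": 1 if i % 50 >= 30 else 0}
    for i in range(100)
]

for field, strength in rank_fields(records, ["age", "noise"], keep=2):
    print(f"{field}: {strength:.3f}")
```

Correlating against the bucket index only picks up roughly monotone
relationships, which is one reason the tools described above go on to try
several model types instead of stopping at this step.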

ModelMAX (and the like) have been honed over the last decade by teams of
statisticians and still generate models that are close but not yet equal to
those that our modeling team used to build. The difference was I could build a
model in a few hours (limited by the CPU on the PC) and they took several weeks
to hand tune the result.

Marc A. Leith
President
redboxdata inc.

E-mail:mleith@redboxdata.com

Re: Predictive or scoring solution for PostgreSQL ?

From

Mike Mascari

Date:

05 February 2004, 08:47:05

Marc A. Leith wrote:

>Quoting Mike Mascari <mascarm@mascari.com>:
>
>>Would Joe Conway's PL/R procedural language be any help here? I'd guess
>>there's an R package to fit the bill, but then again I'm only on page 30
>>of Modern Applied Statistics in S-Plus. ;-)
>
>For a turnkey modeling solution, you need more than simple stat functions. These
>solutions automatically transform or 'bucketize' the data and then analyze the
>covariance between the score variables and the known result.
>

I'm obviously not in any position to define what is needed here. I only
had business statistics in college as a requirement for an economics
degree many years ago. However, I will say that you may be
underestimating R's capabilities. It includes linear and non-linear
regression models, neural networks, time-series analysis, and a host
(and I mean 100's) of other models I have yet to fathom.  I'd humbly
speculate that the core developers, including the chairman of the
statistics department at Oxford, would take issue with its
characterization as "simple stat functions". But what do I know... :-)

Mike Mascari

Re: Predictive or scoring solution for PostgreSQL ?

From

"Marc A. Leith"

Date:

05 February 2004, 09:29:49

On Thu, 05 Feb 2004 07:45:41 -0500, Mike Mascari wrote:

>Marc A. Leith wrote:
>
>>Quoting Mike Mascari <mascarm@mascari.com>:
>>
>>>Would Joe Conway's PL/R procedural language be any help here? I'd guess
>>>there's an R package to fit the bill, but then again I'm only on page 30
>>>of Modern Applied Statistics in S-Plus. ;-)
>>
>>For a turnkey modeling solution, you need more than simple stat functions. These
>>solutions automatically transform or 'bucketize' the data and then analyze the
>>covariance between the score variables and the known result.
>>
>
>I'm obviously not in any position to define what is needed here. I only
>had business statistics in college as a requirement for an economics
>degree many years ago. However, I will say that you may be
>underestimating R's capabilities. It includes linear and non-linear
>regression models, neural networks, time-series analysis, and a host
>(and I mean 100's) of other models I have yet to fathom. I'd humbly
>speculate that the core developers, include the chairman of the
>statistics department at Oxford, would take issue with its
>characterization as "simple stat functions". But what do I know... :-)
>
>Mike Mascari
>

Fair enough - I took a look at the links that Joe Conway provided, and R seems
very powerful and feature complete. My comment was unfair; please consider it
rephrased/withdrawn.

BUT is it turnkey? The original question sought a 'system' to score the database.

SAS and SPSS can be configured to do this, as R likely can be, but does that
make it a system?

The solutions I suggested can be run by someone with virtually no knowledge of
stats (not that I suggest this for complex issues). They can select an
appropriate model in minutes rather than needing an MA to design a solution.

Marc

Marc A. Leith
President
redboxdata inc.

e-mail: marc@redboxdata.com
cell: (416) 737 0045