Re: pl/R questions - Mailing list pgsql-general
From | Joe Conway |
---|---|
Subject | Re: pl/R questions |
Date | |
Msg-id | 3F2BF144.6010307@joeconway.com |
In response to | pl/R questions (Mike Mascari <mascarm@mascari.com>) |
Responses | Re: pl/R questions |
List | pgsql-general |
Mike Mascari wrote:
> (A) The function r_resetlm() must be called to reset the global values
> before each invocation. Not a big problem, but I would like to avoid
> globals, if possible. The relations supplying the data are temporary
> tables and thus I cannot refer to their names in static pl/R. I can't
> figure out a way to use pg.spi.prepare()/pg.spi.execp() to initialize
> R variables with the result of the executed queries. I would like to
> do something like this, instead:
>
> CREATE OR REPLACE FUNCTION r_predict(text, text)
> RETURNS SETOF RECORD AS '
>
> sql <- paste("SELECT x, y FROM", arg1, "ORDER BY x")
> plan <- pg.spi.prepare(sql, NA)
> pg.spi.execp(plan, NA)
>
> ??? Read results into appropriate vectors
>
> samples <- data.frame(xs=nxs)
> result <- predict(lm(ys ~ xs), samples)
> return (result)
>
> ' LANGUAGE 'plr' WITH (isStrict);

I don't think you can use a prepared plan if the table itself is going to
change, only when parameters change. Maybe something like this works:

CREATE OR REPLACE FUNCTION r_predict(text, text)
RETURNS SETOF RECORD AS '
  sql <- paste("SELECT x, y FROM", arg1, "ORDER BY x")
  xyknowns <- pg.spi.exec(sql)
  xs <- as.numeric(xyknowns[,1])
  ys <- as.numeric(xyknowns[,2])

  sql <- paste("SELECT x FROM", arg2, "ORDER BY x")
  xypred <- pg.spi.exec(sql)
  nxs <- as.numeric(xypred[,1])

  samples <- data.frame(xs=nxs)
  result <- predict(lm(ys ~ xs), samples)
  return (result)
' LANGUAGE 'plr' WITH (isStrict);

regression=# select * from r_predict('entries', 'predictions') as trend(ny float8);
        ny
------------------
 146171.515151515
 147189.696969697
 148207.878787879
 149226.060606061
 150244.242424242
(5 rows)

> (B) I suppose an unqualified SELECT will always invoke r_initknowns()
> and r_initpredicts() but is this guaranteed? And guaranteed to only be
> executed once for each tuple? If so, then I'm somewhat less bothered
> by the use of R globals. Is using the VOLATILE attribute in the CREATE
> FUNCTION statement sufficient to guarantee that the call will always be
> made?

Use the above -- I think your original multistep process is not the way
to go anyway.

> (C) For the life of me, and this is an R question, I cannot figure out
> how to get R to perform predictions on multivariate data:

I'm sure there is support for multivariate linear regression in R, but
I'm still too new at R to know the answer myself. You should try posting
that one to R-help.

BTW, I created a PL/R-specific mailing list on gborg, but no one is
subscribed currently. If people on this list find PL/R-specific questions
too off-topic, perhaps we should move there. R-specific questions should
definitely be posted to R-help though.

Regards,

Joe
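
For anyone who wants to try the regression step of r_predict above without a PL/R installation, here is a rough sketch in plain R. It is only an illustration under stated assumptions: the two data frames are made-up stand-ins for what pg.spi.exec() would return for the "knowns" and "predictions" tables, and predict_trend is a hypothetical helper name, not part of PL/R.

```r
# Made-up stand-ins for the result sets pg.spi.exec() would return in PL/R.
# The sample values are illustrative only (here y = 2x + 1).
xyknowns <- data.frame(x = c(1, 2, 3, 4, 5), y = c(3, 5, 7, 9, 11))
xypred   <- data.frame(x = c(6, 7, 8))

# Hypothetical helper mirroring the body of r_predict: fit y ~ x on the
# known rows, then predict y for the new x values.
predict_trend <- function(knowns, preds) {
  xs  <- as.numeric(knowns[, 1])
  ys  <- as.numeric(knowns[, 2])
  nxs <- as.numeric(preds[, 1])

  fit <- lm(ys ~ xs)

  # The newdata column must be named "xs" so it matches the model term.
  samples <- data.frame(xs = nxs)
  as.numeric(predict(fit, samples))
}

result <- predict_trend(xyknowns, xypred)
print(result)
```

The one detail that tends to trip people up is the column name in the data frame passed to predict(): it has to match the predictor name used in the lm() formula, which is why both the function above and this sketch build samples with xs=nxs.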