spark-user mailing list archives

From Night Wolf <nightwolf...@gmail.com>
Subject Re: Spark SQL - UDF for scoring a model - take $"*"
Date Tue, 08 Sep 2015 07:47:07 GMT
So basically I need something like

df.withColumn("score", new Column(new Expression {
 ...

def eval(input: Row = null): EvaluatedType = myModel.score(input)
...

}))

But I can't do this, so how can I make a UDF, or something like it, that
takes in a Row and returns a double value or some struct?
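
One way to get a Row into a UDF is to wrap every column in a single `struct` column and give the UDF one parameter of type `Row`. A minimal sketch follows. It assumes Spark 2.x or later (it uses `SparkSession`, which postdates this 2015 thread, and it was not checked against 1.5). The `weights` map, `score`, and the `age`/`income` columns are hypothetical stand-ins for the real model and data.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.{col, struct, udf}

object ScoreWholeRow {
  // Hypothetical stand-in for the real model: factor (column) name -> weight.
  val weights: Map[String, Double] = Map("age" -> 2.0, "income" -> 0.5)

  // Scores one row, resolving each factor by field name through the struct's schema,
  // so column order does not matter and extra columns are simply ignored.
  def score(r: Row): Double =
    weights.iterator.map { case (name, w) => w * r.getAs[Number](name).doubleValue }.sum

  def run(): Seq[Double] = {
    val spark = SparkSession.builder().master("local[1]").appName("score-row").getOrCreate()
    try {
      import spark.implicits._
      val df = Seq((30, 100L), (40, 200L)).toDF("age", "income")
      // Pack all columns into one struct column; the UDF receives it as a Row.
      val allCols = struct(df.columns.map(col): _*)
      val scoreUdf = udf((r: Row) => score(r))
      df.withColumn("score", scoreUdf(allCols)).select("score").as[Double].collect().toSeq
    } finally spark.stop()
  }

  def main(args: Array[String]): Unit = println(run())
}
```

Because the struct keeps field names, `getAs` looks factors up by name, which covers the "map model names to DataFrame columns" requirement. In this sketch a null in a factor column would throw when `doubleValue` is called, so real data would need null handling.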

On Tue, Sep 8, 2015 at 5:33 PM, Night Wolf <nightwolfzor@gmail.com> wrote:

> Not sure how that would work. What I really want is to tack an extra column
> onto the DF with a UDF that can take a Row object.
>
> On Tue, Sep 8, 2015 at 1:54 AM, Jörn Franke <jornfranke@gmail.com> wrote:
>
>> Can you use a map or list with different properties as one parameter?
>> Alternatively, a string where the parameters are comma-separated...
>>
>> On Mon, Sep 7, 2015 at 8:35, Night Wolf <nightwolfzor@gmail.com>
>> wrote:
>>
>>> Is it possible to have a UDF which takes a variable number of arguments?
>>>
>>> e.g. df.select(myUdf($"*")) fails with
>>>
>>> org.apache.spark.sql.AnalysisException: unresolved operator 'Project
>>> [scalaUDF(*) AS scalaUDF(*)#26];
>>>
>>> What I would like to do is pass in a generic DataFrame, which can then be
>>> passed to a UDF that scores a model. The UDF needs to know the schema
>>> to map column names in the model to columns in the DataFrame.
>>>
>>> The model has hundreds of factors (it is very wide), so I can't just have a
>>> scoring UDF with 500 parameters (for obvious reasons).
>>>
>>> Cheers,
>>> ~N
>>>
>>
>
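
Jörn's suggestion of passing a single map parameter can also be sketched, again assuming Spark 2.x or later. Every column is packed into one `MapType` column (name -> value) with `functions.map`, and that column goes to a one-argument UDF. The `weights` map and the `age`/`income` columns are hypothetical. Map values must share one type, so each column is cast to double here.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit, udf, map => mapCol}

object ScoreViaMapColumn {
  // Hypothetical model: factor name -> weight; factors missing from the map count as 0.
  val weights: Map[String, Double] = Map("age" -> 2.0, "income" -> 0.5)

  def run(): Seq[Double] = {
    val spark = SparkSession.builder().master("local[1]").appName("score-map").getOrCreate()
    try {
      import spark.implicits._
      val df = Seq((30, 100L), (40, 200L)).toDF("age", "income")
      // mapCol(k1, v1, k2, v2, ...): keys are the column names, values cast to one common type.
      val asMap = mapCol(df.columns.flatMap(c => Seq(lit(c), col(c).cast("double"))): _*)
      val scoreUdf = udf((m: Map[String, Double]) =>
        weights.iterator.map { case (name, w) => w * m.getOrElse(name, 0.0) }.sum)
      df.withColumn("score", scoreUdf(asMap)).select("score").as[Double].collect().toSeq
    } finally spark.stop()
  }

  def main(args: Array[String]): Unit = println(run())
}
```

Compared with the struct approach, the map form loses each column's native type because everything is forced to double. Non-numeric factors would need a different encoding, such as the comma-separated string Jörn mentions.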
