spark-user mailing list archives

From Bedrytski Aliaksandr <sp...@bedryt.ski>
Subject Re: Random forest binary classification H20 difference Spark
Date Thu, 11 Aug 2016 06:00:22 GMT
Hi Samir,

either use the *dataframe.na.fill()* method or the *nvl()* function when
selecting features:

val train = sqlContext.sql("SELECT ... nvl(Field, 1.0) AS Field ... FROM test")
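
To make both options concrete, here is a minimal sketch rather than a
definitive recipe. It assumes Spark 2.0 or later (SparkSession, and nvl()
available without Hive support). The column names f1, f2 and label, the
sample rows, and the fill value 1.0 are made up for illustration:

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

object FillNullsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("fill-nulls-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical data: the second row has a null in f2.
    val df = Seq[(Double, Option[Double], Double)](
      (1.0, Some(0.5), 0.0),
      (2.0, None, 1.0)
    ).toDF("f1", "f2", "label")

    // Option 1: replace nulls in the listed numeric columns with a constant.
    val filled = df.na.fill(1.0, Seq("f2"))

    // Option 2: substitute the nulls in SQL while selecting the features.
    df.createOrReplaceTempView("test")
    val filledSql = spark.sql("SELECT f1, nvl(f2, 1.0) AS f2, label FROM test")

    // With the nulls gone, VectorAssembler can build the feature column.
    val assembler = new VectorAssembler()
      .setInputCols(Array("f1", "f2"))
      .setOutputCol("features")

    assembler.transform(filled).show()
    assembler.transform(filledSql).show()

    spark.stop()
  }
}
```

Whether a constant such as 1.0 is a sensible replacement depends on the
feature; a column mean or a sentinel value may fit your data better.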

--
  Bedrytski Aliaksandr
  spark@bedryt.ski



On Wed, Aug 10, 2016, at 11:19, Yanbo Liang wrote:
> Hi Samir,
>
> Did you use VectorAssembler to assemble some columns into the feature
> column? If there are NULLs in your dataset, VectorAssembler will throw
> this exception. You can use DataFrame.na.drop() or DataFrame.na.fill()
> to drop/substitute NULL values.
>
> Thanks
> Yanbo
>
> 2016-08-07 19:51 GMT-07:00 Javier Rey <jreyro@gmail.com>:
>> Hi everybody.
>> I have run RF on H2O and had no trouble with null values. In
>> contrast, in Spark, using DataFrames and the ML library, I get the
>> error below. I know my dataframe contains nulls, but my understanding
>> was that Random Forest supports null values:
>>
>> "Values to assemble cannot be null"
>>
>> Any advice on how this framework can handle this issue?
>>
>> Regards,
>> Samir
