spark-issues mailing list archives

From "Yanbo Liang (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-15153) SparkR spark.naiveBayes throws error when label is numeric type
Date Thu, 05 May 2016 12:58:12 GMT

     [ https://issues.apache.org/jira/browse/SPARK-15153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yanbo Liang updated SPARK-15153:
--------------------------------
    Description: 
When the label column of a dataset is of numeric type, SparkR's spark.naiveBayes throws an error. This
bug is easy to reproduce:
{code}
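# Flatten the built-in Titanic contingency table into a data.frame.
t <- as.data.frame(Titanic)
# Keep rows with a nonzero count and drop the Freq column (column 5).
t1 <- t[t$Freq > 0, -5]
# Encode the string label as a numeric 0/1 column.
t1$NumericSurvived <- ifelse(t1$Survived == "No", 0, 1)
# Drop the original string Survived column (column 4).
t2 <- t1[-4]
df <- suppressWarnings(createDataFrame(sqlContext, t2))
# Fitting with the numeric label triggers the ClassCastException below.
m <- spark.naiveBayes(df, NumericSurvived ~ .)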

16/05/05 03:26:17 ERROR RBackendHandler: fit on org.apache.spark.ml.r.NaiveBayesWrapper failed
Error in invokeJava(isStatic = TRUE, className, methodName, ...) :
  java.lang.ClassCastException: org.apache.spark.ml.attribute.UnresolvedAttribute$ cannot be cast to org.apache.spark.ml.attribute.NominalAttribute
	at org.apache.spark.ml.r.NaiveBayesWrapper$.fit(NaiveBayesWrapper.scala:66)
	at org.apache.spark.ml.r.NaiveBayesWrapper.fit(NaiveBayesWrapper.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandler.scala:141)
	at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:86)
	at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:38)
	at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
	at io.netty.channel.AbstractChannelHandlerContext.invo
{code}

In RFormula, the response variable can be of string or numeric type. If it is a string, RFormula
transforms it into a label of DoubleType using StringIndexer and sets the corresponding column metadata;
otherwise, RFormula uses it directly as the label when training the model (and assumes that its values
are numbered 0, ..., maxLabelIndex).
When we extract labels in ml.r.NaiveBayesWrapper, we should handle them according to the type of
the response variable (string or numeric).
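
The two cases described above can be modeled without Spark. The following is only a minimal sketch of the idea, not the code in NaiveBayesWrapper: the function name extract_labels and the nominal_values parameter (standing in for the column metadata that StringIndexer sets) are hypothetical. It assumes that a numeric label column holds non-negative integer values numbered 0, ..., maxLabelIndex.

```python
from typing import List, Optional, Sequence


def extract_labels(
    label_column: Sequence[float],
    nominal_values: Optional[Sequence[str]] = None,
) -> List[str]:
    """Return the label names for a model, depending on the label type.

    nominal_values is a hypothetical stand-in for the metadata attached to
    a string label after indexing. It is None for a numeric label, which
    carries no such metadata.
    """
    if nominal_values is not None:
        # String response: the indexer recorded the original label names.
        return list(nominal_values)

    # Numeric response: no metadata exists, so derive the labels from the
    # values, assuming they are numbered 0, ..., maxLabelIndex.
    if not label_column:
        raise ValueError("label column is empty")
    for value in label_column:
        if value < 0 or value != int(value):
            raise ValueError(f"label {value!r} is not a non-negative integer")
    max_label_index = int(max(label_column))
    return [str(i) for i in range(max_label_index + 1)]
```

With a numeric 0/1 column such as NumericSurvived, the sketch takes the second branch instead of failing on the missing nominal metadata, which is the cast that fails in the stack trace above.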

cc [~mengxr] [~josephkb]


> SparkR spark.naiveBayes throws error when label is numeric type
> ---------------------------------------------------------------
>
>                 Key: SPARK-15153
>                 URL: https://issues.apache.org/jira/browse/SPARK-15153
>             Project: Spark
>          Issue Type: Bug
>          Components: ML, SparkR
>            Reporter: Yanbo Liang
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

