spark-issues mailing list archives

From "Zhang Mengqi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-16064) Fix the GLM error caused by NA produced by reweight function
Date Mon, 04 Jul 2016 08:00:23 GMT

    [ https://issues.apache.org/jira/browse/SPARK-16064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15360987#comment-15360987 ]

Zhang Mengqi commented on SPARK-16064:
--------------------------------------

Thank you very much!


> Fix the GLM error caused by NA produced by reweight function
> ------------------------------------------------------------
>
>                 Key: SPARK-16064
>                 URL: https://issues.apache.org/jira/browse/SPARK-16064
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 2.0.0
>            Reporter: Zhang Mengqi
>            Assignee: Yanbo Liang
>            Priority: Minor
>
> This case happens when users run GLM with SparkR; the same dataset runs GLM without error in native R.
> When users fit the model using glm with the poisson family, it fails with an assertion error caused by NA values produced by the reweight function.
> 16/06/20 16:40:22 ERROR RBackendHandler: fit on org.apache.spark.ml.r.GeneralizedLinearRegressionWrapper failed
> Error in invokeJava(isStatic = TRUE, className, methodName, ...) : 
>   java.lang.AssertionError: assertion failed: Sum of weights cannot be zero.
> 	at scala.Predef$.assert(Predef.scala:170)
> 	at org.apache.spark.ml.optim.WeightedLeastSquares$Aggregator.validate(WeightedLeastSquares.scala:248)
> 	at org.apache.spark.ml.optim.WeightedLeastSquares.fit(WeightedLeastSquares.scala:82)
> 	at org.apache.spark.ml.optim.IterativelyReweightedLeastSquares.fit(IterativelyReweightedLeastSquares.scala:85)
> 	at org.apache.spark.ml.regression.GeneralizedLinearRegression.train(GeneralizedLinearRegression.scala:276)
> 	at org.apache.spark.ml.regression.GeneralizedLinearRegression.train(GeneralizedLinearRegression.scala:134)
> 	at org.apache.spark.ml.Predictor.fit(Predictor.scala:90)
> 	at org.apache.spark.ml.Predictor.fit(Predictor.scala:71)
> 	at org.apache.spark.ml.Pipeline$$anonfun$fit$2.apply(Pipeline.scala:148)
> 	at org.apache.spark.ml.Pipeline$$anonfun$fit$2.apply(Pipeline.scala:144)
> 	at scala.collection.Iterator$class.foreach(Iterator.scala:893)
> 	at scala.collection.Abstra
> P.S. The dataset describes city ride flows between several planning areas in Singapore.
> ride_flow_exp <- glm(flow~Origin+Destination+distance,ride_flow,family = poisson(link = "log"))
> SparkDataFrame[Origin:string, Destination:string, flow:double, Oi:int, Dj:int, distance:double]
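
For readers unfamiliar with the reweighting step named in the issue title, the following is a minimal Python sketch of one IRLS reweight for a Poisson model with log link. It is not Spark's code: the function name poisson_log_reweight and the sample values are hypothetical, and the report does not say whether underflow of the fitted mean is what actually happened with this dataset. The sketch only illustrates one way the weights could all become zero, which would trip an assertion like "Sum of weights cannot be zero", while the working response becomes undefined.

```python
import math


def poisson_log_reweight(eta, y):
    """One IRLS reweight for a single observation (Poisson family, log link).

    Hypothetical helper for illustration; not part of Spark or SparkR.
    Returns (working_response, weight).
    """
    # Inverse log link. math.exp underflows silently to 0.0 for very negative eta.
    mu = math.exp(eta)
    # IRLS weight: (dmu/deta)^2 / Var(mu) = mu^2 / mu = mu.
    weight = mu
    # Working response z = eta + (y - mu) / mu, undefined when mu == 0.
    # Python raises on float division by zero, so the undefined case is
    # reported as NaN here for simplicity (IEEE-754 arithmetic would give
    # NaN or an infinity depending on y).
    z = eta + (y - mu) / mu if mu > 0.0 else float("nan")
    return z, weight


# Assumed example: linear predictors so negative that every fitted mean underflows.
etas = [-800.0, -900.0, -1000.0]
ys = [0.0, 1.0, 2.0]
results = [poisson_log_reweight(eta, y) for eta, y in zip(etas, ys)]
weight_sum = sum(w for _, w in results)
any_undefined = any(math.isnan(z) for z, _ in results)
print(weight_sum, any_undefined)
```

In this contrived case every weight is zero, so a weighted least squares solver that validates its weight sum would reject the input before solving.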



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

