spark-user mailing list archives

From Michael Armbrust <mich...@databricks.com>
Subject Re: Spark SQL Percentile UDAF
Date Fri, 10 Oct 2014 02:13:01 GMT
Please file a JIRA: https://issues.apache.org/jira/browse/SPARK/

On Thu, Oct 9, 2014 at 6:48 PM, Anand Mohan <chinnitv@gmail.com> wrote:

> Hi,
>
> I just noticed the Percentile UDAF PR being merged into trunk and decided
> to test it.
> So I pulled in today's trunk and tested the percentile queries.
> They work marvelously. Thanks a lot for bringing this into Spark SQL.
>
> However, the Hive percentile UDAF also supports an array mode, wherein you can
> give the list of percentiles that you want and it returns an array of
> double values, one for each requested percentile.
> An array-mode query is failing with the error below. However, a query with
> the individual percentiles, like
> percentile(turnaroundtime,0.25),percentile(turnaroundtime,0.5),percentile(turnaroundtime,0.75)
> is working (so this issue is not a high priority, as there is this
> workaround for us).
>
> Thanks,
> Anand Mohan
>
> 0: jdbc:hive2://dev-uuppala.sfohi.philips.com> select name,
> percentile(turnaroundtime,array(0,0.25,0.5,0.75,1)) from exam group by name;
>
> Error: org.apache.spark.SparkException: Job aborted due to stage failure:
> Task 1 in stage 25.0 failed 4 times, most recent failure: Lost task 1.3 in
> stage 25.0 (TID 305, Dev-uuppala.sfohi.philips.com):
> java.lang.ClassCastException: scala.collection.mutable.ArrayBuffer cannot
> be cast to [Ljava.lang.Object;
>         org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector.getListLength(StandardListObjectInspector.java:83)
>         org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$ListConverter.convert(ObjectInspectorConverters.java:259)
>         org.apache.hadoop.hive.ql.udf.generic.GenericUDFUtils$ConversionHelper.convertIfNecessary(GenericUDFUtils.java:349)
>         org.apache.hadoop.hive.ql.udf.generic.GenericUDAFBridge$GenericUDAFBridgeEvaluator.iterate(GenericUDAFBridge.java:170)
>         org.apache.spark.sql.hive.HiveUdafFunction.update(hiveUdfs.scala:342)
>         org.apache.spark.sql.execution.Aggregate$$anonfun$execute$1$$anonfun$7.apply(Aggregate.scala:167)
>         org.apache.spark.sql.execution.Aggregate$$anonfun$execute$1$$anonfun$7.apply(Aggregate.scala:151)
>         org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:599)
>         org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:599)
>         org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>         org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>         org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>         org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>         org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>         org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>         org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>         org.apache.spark.scheduler.Task.run(Task.scala:56)
>         org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
>         java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         java.lang.Thread.run(Thread.java:745)
> Driver stacktrace: (state=,code=0)
>
>
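
For readers unfamiliar with the array mode described in the quoted message, here is a minimal Python sketch of the expected behaviour, independent of Spark and Hive. It assumes the percentile is computed by linear interpolation between the two nearest ranks of the sorted values, which is my understanding of Hive's exact percentile but is not confirmed in this thread. The function names and the sample data are hypothetical. The point is that array mode should give the same numbers as calling the single-percentile form once per requested value, which is why the workaround mentioned above is equivalent.

```python
import math
from typing import List, Sequence


def percentile(values: Sequence[float], p: float) -> float:
    """Single-percentile mode, assuming linear interpolation between ranks."""
    if not values:
        raise ValueError("percentile of an empty group is undefined here")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    ordered = sorted(values)
    position = (len(ordered) - 1) * p  # fractional index into the sorted data
    lower = math.floor(position)
    upper = math.ceil(position)
    if lower == upper:
        return float(ordered[lower])
    # Weight each neighbour by its distance from the fractional position.
    return (upper - position) * ordered[lower] + (position - lower) * ordered[upper]


def percentile_array(values: Sequence[float], ps: Sequence[float]) -> List[float]:
    """Array mode: one result per requested percentile, in request order."""
    return [percentile(values, p) for p in ps]


# Hypothetical turnaround times for a single `name` group.
turnaround_times = [10, 20, 30, 40, 50]
quartiles = percentile_array(turnaround_times, [0, 0.25, 0.5, 0.75, 1])
print(quartiles)
```

This sketch does not touch the ClassCastException in the stack trace; it only illustrates what the array-mode query is expected to return once that conversion issue is fixed.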
