spark-user mailing list archives

From Michael Armbrust <mich...@databricks.com>
Subject Re: [SparkSQL] Function parity with Shark?
Date Fri, 03 Oct 2014 19:06:55 GMT
Thanks for digging in!  These both look like they should have JIRAs.

On Fri, Oct 3, 2014 at 8:14 AM, Yana Kadiyska <yana.kadiyska@gmail.com>
wrote:

> Thanks -- it does appear that I misdiagnosed a bit: case works in general,
> but the bitwise operation does not seem to work (the type of bit_field in
> Hive is bigint):
>
> Error: java.lang.RuntimeException:
> Unsupported language features in query: select (case when bit_field & 1=1 then r_end - r_start else NULL end) from mytable where pkey='0178-2014-07' LIMIT 2
> TOK_QUERY
>   TOK_FROM
>     TOK_TABREF
>       TOK_TABNAME
>        mytable
>   TOK_INSERT
>     TOK_DESTINATION
>       TOK_DIR
>         TOK_TMP_FILE
>     TOK_SELECT
>       TOK_SELEXPR
>         TOK_FUNCTION
>           when
>           =
>             &
>               TOK_TABLE_OR_COL
>                 bit_field
>               1
>             1
>           -
>             TOK_TABLE_OR_COL
>               r_end
>             TOK_TABLE_OR_COL
>               r_start
>           TOK_NULL
>     TOK_WHERE
>       =
>         TOK_TABLE_OR_COL
>           pkey
>         '0178-2014-07'
>     TOK_LIMIT
>       2
>
>
> SQLState:  null
> ErrorCode: 0
>
>
> Similarly, concat seems to work, but I get a failure in this query (due to
> LPAD, I believe):
>
> select customer_id from mytable where
> pkey=concat_ws('-',LPAD('077',4,'0'),'2014-07') LIMIT 2
>
> (The failure seems related to the function being in the where clause; the
> following both work fine:
>
> select concat_ws('-', LPAD(cast(112717 % 1024 AS STRING),4,'0'),'2014-07')
> from mytable where pkey='0077-2014-07' LIMIT 2
>
> select customer_id from mytable where pkey=concat_ws('-','0077','2014-07')
> LIMIT 2
> )
>
> 14/10/03 14:51:35 ERROR server.SparkSQLOperationManager: Error executing query:
> org.apache.spark.SparkException: Task not serializable
>         at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:166)
>         at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:158)
>         at org.apache.spark.SparkContext.clean(SparkContext.scala:1242)
>         at org.apache.spark.rdd.RDD.mapPartitions(RDD.scala:597)
>         at org.apache.spark.sql.execution.Limit.execute(basicOperators.scala:146)
>         at org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd$lzycompute(HiveContext.scala:360)
>         at org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd(HiveContext.scala:360)
>         at org.apache.spark.sql.hive.thriftserver.server.SparkSQLOperationManager$$anon$1.run(SparkSQLOperationManager.scala:185)
>         at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:193)
>         at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatement(HiveSessionImpl.java:175)
>         at org.apache.hive.service.cli.CLIService.executeStatement(CLIService.java:150)
>         at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:207)
>         at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1133)
>         at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1118)
>         at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>         at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>         at org.apache.hive.service.auth.TUGIContainingProcessor$1.run(TUGIContainingProcessor.java:58)
>         at org.apache.hive.service.auth.TUGIContainingProcessor$1.run(TUGIContainingProcessor.java:55)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>         at org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:526)
>         at org.apache.hive.service.auth.TUGIContainingProcessor.process(TUGIContainingProcessor.java:55)
>         at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:744)
> Caused by: java.io.NotSerializableException: java.lang.reflect.Constructor
>         at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
>         at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
>         at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
>         at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
>         at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
>         at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1377)
>         at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1173)
>         at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
>         at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
>         at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
>         at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
>         at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
>         at scala.collection.immutable.$colon$colon.writeObject(List.scala:379)
>         at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>
>
>
> Let me know if any of these warrant a JIRA
>
> thanks
>
>
>
>
> On Thu, Oct 2, 2014 at 2:00 PM, Michael Armbrust <michael@databricks.com>
> wrote:
>
>> What are the errors you are seeing?  All of those functions should work.
>>
>> On Thu, Oct 2, 2014 at 6:56 AM, Yana Kadiyska <yana.kadiyska@gmail.com>
>> wrote:
>>
>>> Hi, in an effort to migrate off of Shark I recently tried the Thrift
>>> JDBC server that comes with Spark 1.1.0.
>>>
>>> However, I observed that conditional functions do not work (I tried
>>> 'case' and 'coalesce').
>>>
>>> Some string functions like 'concat' also did not work.
>>>
>>> Is there a list of what's missing or a roadmap of when it will be added?
>>> (I know percentiles are pending, for example, but I do not see JIRAs for
>>> the others in this email.)
>>>
>>
>>
>
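For reference, the intent of the CASE expression in the first quoted query can be sketched in Python. The AST in the error groups the condition as (bit_field & 1) = 1, i.e. the bitwise AND is applied before the comparison. The sketch below is only an illustration of that logic applied on the client to rows fetched without the CASE expression; the function name and the sample rows are made up, and it assumes all three columns are integers.

```python
from typing import Optional


def duration_if_flag_set(bit_field: int, r_start: int, r_end: int) -> Optional[int]:
    """Mirror: case when (bit_field & 1) = 1 then r_end - r_start else NULL end."""
    # Test the lowest bit of bit_field, grouped the same way as in the AST.
    if (bit_field & 1) == 1:
        return r_end - r_start
    # SQL NULL is represented as None here.
    return None


# Hypothetical sample rows, not data from the thread.
rows = [
    {"bit_field": 5, "r_start": 10, "r_end": 25},  # lowest bit set
    {"bit_field": 4, "r_start": 10, "r_end": 25},  # lowest bit clear
]
results = [duration_if_flag_set(**row) for row in rows]
print(results)  # [15, None]
```

This does not address the parser limitation itself; it only pins down what the query is expected to return.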

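Since the quoted message reports that a literal pkey in the where clause works, one possible stopgap is to build the key on the client and send it as a literal. The Python sketch below approximates the quoted concat_ws/LPAD expression; lpad, concat_ws and build_pkey are hypothetical helpers written for this note (not Spark or Hive APIs), and NULL handling is not modelled.

```python
def lpad(s: str, length: int, pad: str) -> str:
    """Approximate Hive's LPAD: left-pad s to length, truncating if s is longer."""
    if not pad:
        raise ValueError("pad must be non-empty in this sketch")
    if len(s) >= length:
        return s[:length]
    missing = length - len(s)
    # Repeat the pad string and cut it to exactly the missing width.
    return (pad * missing)[:missing] + s


def concat_ws(sep: str, *parts: str) -> str:
    """Join the parts with a separator (NULL arguments are not modelled)."""
    return sep.join(parts)


def build_pkey(value: int, month: str) -> str:
    """Mirror concat_ws('-', LPAD(cast(value % 1024 AS STRING), 4, '0'), month)."""
    return concat_ws("-", lpad(str(value % 1024), 4, "0"), month)


print(build_pkey(112717, "2014-07"))  # 0077-2014-07
```

The result could then be substituted into the statement as pkey='0077-2014-07', which is the form reported to work above.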