spark-user mailing list archives

From Okehee Goh <oke...@gmail.com>
Subject Re: SparkSQL 1.4 can't accept registration of UDF?
Date Fri, 17 Jul 2015 00:43:27 GMT
The same issue (a custom UDF jar added through 'add jar' is not
recognized) is observed on Spark 1.4.1.
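Before digging into the server side, a quick sanity check is to confirm the UDF class is actually packaged in the jar (the jar and class names below are the ones from this thread; adjust them for your build). This is only a diagnostic, not a fix:

```shell
# List the jar's contents and look for the UDF class. If this grep finds
# nothing, no amount of 'add jar' will make the class resolvable.
jar tf dw-udf-2015.06.06-SNAPSHOT.jar | grep GenericUDFParseTraceAnnotation
```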

Instead of executing
beeline> add jar udf.jar

my workaround is either
1) to pass udf.jar with "--jars" when starting the ThriftServer
(this didn't work in AWS EMR's Spark 1.4.0.b),
or
2) to add the custom UDF jar to SPARK_CLASSPATH (this works in AWS EMR).
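Concretely, the two workarounds look roughly like this (paths are placeholders; start-thriftserver.sh lives under sbin/ in a standard Spark distribution):

```shell
# Workaround 1: ship the UDF jar when launching the ThriftServer
./sbin/start-thriftserver.sh --jars /path/to/udf.jar

# Workaround 2: put the jar on the classpath before starting the server
export SPARK_CLASSPATH=/path/to/udf.jar
./sbin/start-thriftserver.sh
```

Note that SPARK_CLASSPATH is deprecated in Spark 1.x; spark.driver.extraClassPath / spark.executor.extraClassPath are the documented replacements, though as reported above the deprecated variable is what worked on EMR here.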

Thanks,


On Tue, Jul 14, 2015 at 9:29 PM, Okehee Goh <okehee@gmail.com> wrote:
> The command "list jar" doesn't seem to be accepted in beeline with Spark's
> ThriftServer in either Spark 1.3.1 or Spark 1.4.
>
> 0: jdbc:hive2://localhost:10000> list jar;
>
> Error: org.apache.spark.sql.AnalysisException: cannot recognize input
> near 'list' 'jar' '<EOF>'; line 1 pos 0 (state=,code=0)
>
> Thanks
>
> On Tue, Jul 14, 2015 at 8:46 PM, prosp4300 <prosp4300@163.com> wrote:
>>
>>
>>
>> What's the result of "list jar" in both 1.3.1 and 1.4.0? Please check if
>> there is any difference.
>>
>>
>>
>> At 2015-07-15 08:10:44, "ogoh" <okehee@gmail.com> wrote:
>>>Hello,
>>>I am using SparkSQL along with the ThriftServer so that we can run Hive
>>>queries against it.
>>>With Spark 1.3.1, I can register a UDF, but Spark 1.4.0 fails to do so.
>>>The UDF jar is the same in both cases.
>>>The logs are below. I appreciate any advice.
>>>
>>>
>>>== With Spark 1.4
>>>Beeline version 1.4.0 by Apache Hive
>>>
>>>0: jdbc:hive2://localhost:10000> add jar
>>>hdfs:///user/hive/lib/dw-udf-2015.06.06-SNAPSHOT.jar;
>>>
>>>0: jdbc:hive2://localhost:10000> create temporary function parse_trace as
>>>'com.mycom.dataengine.udf.GenericUDFParseTraceAnnotation';
>>>
>>>15/07/14 23:49:43 DEBUG transport.TSaslTransport: writing data length: 206
>>>
>>>15/07/14 23:49:43 DEBUG transport.TSaslTransport: CLIENT: reading data
>>>length: 201
>>>
>>>Error: org.apache.spark.sql.execution.QueryExecutionException: FAILED:
>>>Execution Error, return code 1 from
>>>org.apache.hadoop.hive.ql.exec.FunctionTask (state=,code=0)
>>>
>>>
>>>== With Spark 1.3.1:
>>>
>>>Beeline version 1.3.1 by Apache Hive
>>>
>>>0: jdbc:hive2://localhost:10001> add jar
>>>hdfs:///user/hive/lib/dw-udf-2015.06.06-SNAPSHOT.jar;
>>>
>>>+---------+
>>>
>>>| Result  |
>>>
>>>+---------+
>>>
>>>+---------+
>>>
>>>No rows selected (1.313 seconds)
>>>
>>>0: jdbc:hive2://localhost:10001> create temporary function parse_trace as
>>>'com.mycom.dataengine.udf.GenericUDFParseTraceAnnotation';
>>>
>>>+---------+
>>>
>>>| result  |
>>>
>>>+---------+
>>>
>>>+---------+
>>>
>>>No rows selected (0.999 seconds)
>>>
>>>
>>>=== The logs of ThriftServer of Spark 1.4.0
>>>
>>>15/07/14 23:49:43 INFO SparkExecuteStatementOperation: Running query
>>> 'create
>>>temporary function parse_trace as
>>>'com.quixey.dataengine.udf.GenericUDFParseTraceAnnotation''
>>>
>>>15/07/14 23:49:43 INFO ParseDriver: Parsing command: create temporary
>>>function parse_trace as
>>>'com.quixey.dataengine.udf.GenericUDFParseTraceAnnotation'
>>>
>>>15/07/14 23:49:43 INFO ParseDriver: Parse Completed
>>>
>>>15/07/14 23:49:43 INFO PerfLogger: <PERFLOG method=Driver.run
>>>from=org.apache.hadoop.hive.ql.Driver>
>>>
>>>15/07/14 23:49:43 INFO PerfLogger: <PERFLOG method=TimeToSubmit
>>>from=org.apache.hadoop.hive.ql.Driver>
>>>
>>>15/07/14 23:49:43 INFO Driver: Concurrency mode is disabled, not creating a
>>>lock manager
>>>
>>>15/07/14 23:49:43 INFO PerfLogger: <PERFLOG method=compile
>>>from=org.apache.hadoop.hive.ql.Driver>
>>>
>>>15/07/14 23:49:43 INFO PerfLogger: <PERFLOG method=parse
>>>from=org.apache.hadoop.hive.ql.Driver>
>>>
>>>15/07/14 23:49:43 INFO ParseDriver: Parsing command: create temporary
>>>function parse_trace as
>>>'com.quixey.dataengine.udf.GenericUDFParseTraceAnnotation'
>>>
>>>15/07/14 23:49:43 INFO ParseDriver: Parse Completed
>>>
>>>15/07/14 23:49:43 INFO PerfLogger: </PERFLOG method=parse
>>>start=1436917783106 end=1436917783106 duration=0
>>>from=org.apache.hadoop.hive.ql.Driver>
>>>
>>>15/07/14 23:49:43 INFO PerfLogger: <PERFLOG method=semanticAnalyze
>>>from=org.apache.hadoop.hive.ql.Driver>
>>>
>>>15/07/14 23:49:43 INFO HiveMetaStore: 2: get_database: default
>>>
>>>15/07/14 23:49:43 INFO audit: ugi=anonymous     ip=unknown-ip-addr
>>>cmd=get_database: default
>>>
>>>15/07/14 23:49:43 INFO HiveMetaStore: 2: Opening raw store with
>>>implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
>>>
>>>15/07/14 23:49:43 INFO ObjectStore: ObjectStore, initialize called
>>>
>>>15/07/14 23:49:43 INFO MetaStoreDirectSql: MySQL check failed, assuming we
>>>are not on mysql: Lexical error at line 1, column 5.  Encountered: "@"
>>> (64),
>>>after : "".
>>>
>>>15/07/14 23:49:43 INFO Query: Reading in results for query
>>>"org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is
>>>closing
>>>
>>>15/07/14 23:49:43 INFO ObjectStore: Initialized ObjectStore
>>>
>>>15/07/14 23:49:43 INFO FunctionSemanticAnalyzer: analyze done
>>>
>>>15/07/14 23:49:43 INFO Driver: Semantic Analysis Completed
>>>
>>>15/07/14 23:49:43 INFO PerfLogger: </PERFLOG method=semanticAnalyze
>>>start=1436917783106 end=1436917783114 duration=8
>>>from=org.apache.hadoop.hive.ql.Driver>
>>>
>>>15/07/14 23:49:43 INFO Driver: Returning Hive schema:
>>>Schema(fieldSchemas:null, properties:null)
>>>
>>>15/07/14 23:49:43 INFO PerfLogger: </PERFLOG method=compile
>>>start=1436917783106 end=1436917783114 duration=8
>>>from=org.apache.hadoop.hive.ql.Driver>
>>>
>>>15/07/14 23:49:43 INFO PerfLogger: <PERFLOG method=Driver.execute
>>>from=org.apache.hadoop.hive.ql.Driver>
>>>
>>>15/07/14 23:49:43 INFO Driver: Starting command: create temporary function
>>>parse_trace as 'com.quixey.dataengine.udf.GenericUDFParseTraceAnnotation'
>>>
>>>15/07/14 23:49:43 INFO PerfLogger: </PERFLOG method=TimeToSubmit
>>>start=1436917783105 end=1436917783115 duration=10
>>>from=org.apache.hadoop.hive.ql.Driver>
>>>
>>>15/07/14 23:49:43 INFO PerfLogger: <PERFLOG method=runTasks
>>>from=org.apache.hadoop.hive.ql.Driver>
>>>
>>>15/07/14 23:49:43 INFO PerfLogger: <PERFLOG method=task.FUNCTION.Stage-0
>>>from=org.apache.hadoop.hive.ql.Driver>
>>>
>>>15/07/14 23:49:43 ERROR Task: FAILED: Class
>>>com.quixey.dataengine.udf.GenericUDFParseTraceAnnotation not found
>>>
>>>15/07/14 23:49:43 INFO FunctionTask: create function:
>>>java.lang.ClassNotFoundException:
>>>com.quixey.dataengine.udf.GenericUDFParseTraceAnnotation
>>>
>>>        at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
>>>
>>>        at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>>>
>>>
>>>
>>>--
>>>View this message in context:
>>> http://apache-spark-user-list.1001560.n3.nabble.com/SparkSQL-1-4-can-t-accept-registration-of-UDF-tp23840.html
>>>Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>
>>>---------------------------------------------------------------------
>>>To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>>>For additional commands, e-mail: user-help@spark.apache.org
>>>
>>
>>
>>

