spark-user mailing list archives

From Artemis User <arte...@dtechspace.com>
Subject Re: Unable to write data into hive table using Spark via Hive JDBC driver Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED
Date Mon, 19 Jul 2021 13:32:31 GMT
As Mich mentioned, there is no need to use the JDBC API; the 
DataFrameWriter's saveAsTable method is the way to go. The JDBC driver 
is for a JDBC client (a Java client, for instance) to access the Hive 
tables in Spark via the Thrift server interface.
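
A minimal sketch of that client-side path, assuming a Spark Thrift 
server is running on localhost:10000 (for example one started with 
sbin/start-thriftserver.sh); the URL, credentials, and table name 
below are placeholders:

    import java.sql.DriverManager

    // The Apache Hive JDBC driver speaks the HiveServer2 Thrift
    // protocol, which the Spark Thrift server also implements.
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    val conn = DriverManager.getConnection(
      "jdbc:hive2://localhost:10000/default", "user", "password")
    val rs = conn.createStatement()
      .executeQuery("SELECT * FROM mydb.mytable LIMIT 10")
    while (rs.next()) println(rs.getString(1))
    conn.close()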

-- ND

On 7/19/21 2:42 AM, Badrinath Patchikolla wrote:
> I have been trying to create a table in Hive from Spark itself.
>
> Using local mode it works; what I am trying here is to create a 
> managed table in Hive (another Spark cluster, basically CDH) from 
> Spark standalone using JDBC mode.
>
> When I try that, below is the error I am facing.
>
> On Thu, 15 Jul, 2021, 9:55 pm Mich Talebzadeh, 
> <mich.talebzadeh@gmail.com> wrote:
>
>     Have you created that table in Hive, or are you trying to create
>     it from Spark itself?
>
>     Your Hive is local. In this case you don't need a JDBC connection.
>     Have you tried:
>
>     df2.write.mode("overwrite").saveAsTable("mydb.mytable")
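>
>     For completeness, a minimal self-contained sketch, assuming a
>     SparkSession built with Hive support (the database and table
>     names are placeholders):
>
>     import org.apache.spark.sql.SparkSession
>
>     // enableHiveSupport makes saveAsTable create a managed table in
>     // the Hive metastore rather than in Spark's in-memory catalog.
>     val spark = SparkSession.builder()
>       .appName("hive-write")
>       .enableHiveSupport()
>       .getOrCreate()
>     import spark.implicits._
>
>     val df2 = Seq(("John", "Smith", "London"))
>       .toDF("first_name", "last_name", "country")
>     df2.write.mode("overwrite").saveAsTable("mydb.mytable")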
>
>     HTH
>
>
>
>
>     View my LinkedIn profile
>     <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>     Disclaimer: Use it at your own risk. Any and all responsibility
>     for any loss, damage or destruction of data or any other property
>     which may arise from relying on this email's technical content is
>     explicitly disclaimed. The author will in no case be liable for
>     any monetary damages arising from such loss, damage or destruction.
>
>
>
>     On Thu, 15 Jul 2021 at 12:51, Badrinath Patchikolla
>     <pbadrinath1996@gmail.com> wrote:
>
>         Hi,
>
>         Trying to write data from Spark to Hive in JDBC mode; below 
>         is the sample code:
>
>         Spark standalone, version 2.4.7:
>
>         21/07/15 08:04:07 WARN util.NativeCodeLoader: Unable to load
>         native-hadoop library for your platform... using builtin-java
>         classes where applicable
>         Setting default log level to "WARN".
>         To adjust logging level use sc.setLogLevel(newLevel). For
>         SparkR, use setLogLevel(newLevel).
>         Spark context Web UI available at http://localhost:4040
>         Spark context available as 'sc' (master =
>         spark://localhost:7077, app id = app-20210715080414-0817).
>         Spark session available as 'spark'.
>         Welcome to
>               ____              __
>              / __/__  ___ _____/ /__
>             _\ \/ _ \/ _ `/ __/  '_/
>            /___/ .__/\_,_/_/ /_/\_\   version 2.4.7
>               /_/
>
>         Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java
>         1.8.0_292)
>         Type in expressions to have them evaluated.
>         Type :help for more information.
>
>         scala> :paste
>         // Entering paste mode (ctrl-D to finish)
>
>         val df = Seq(
>             ("John", "Smith", "London"),
>             ("David", "Jones", "India"),
>             ("Michael", "Johnson", "Indonesia"),
>             ("Chris", "Lee", "Brazil"),
>             ("Mike", "Brown", "Russia")
>           ).toDF("first_name", "last_name", "country")
>
>
>          df.write
>           .format("jdbc")
>           .option("url",
>         "jdbc:hive2://localhost:10000/foundation;AuthMech=2;UseNativeQuery=0")
>           .option("dbtable", "test.test")
>           .option("user", "admin")
>           .option("password", "admin")
>           .option("driver", "com.cloudera.hive.jdbc41.HS2Driver")
>           .mode("overwrite")
>           .save
>
>
>         // Exiting paste mode, now interpreting.
>
>         java.sql.SQLException: [Cloudera][HiveJDBCDriver](500051)
>         ERROR processing query/statement. Error Code: 40000, SQL
>         state: TStatus(statusCode:ERROR_STATUS,
>         infoMessages:[*org.apache.hive.service.cli.HiveSQLException:Error
>         while compiling statement: FAILED: ParseException line 1:39
>         cannot recognize input near '"first_name"' 'TEXT' ',' in
>         column name or primary key or foreign key:28:27,
>         org.apache.hive.service.cli.operation.Operation:toSQLException:Operation.java:329,
>         org.apache.hive.service.cli.operation.SQLOperation:prepare:SQLOperation.java:207,
>         org.apache.hive.service.cli.operation.SQLOperation:runInternal:SQLOperation.java:290,
>         org.apache.hive.service.cli.operation.Operation:run:Operation.java:260,
>         org.apache.hive.service.cli.session.HiveSessionImpl:executeStatementInternal:HiveSessionImpl.java:504,
>         org.apache.hive.service.cli.session.HiveSessionImpl:executeStatementAsync:HiveSessionImpl.java:490,
>         sun.reflect.GeneratedMethodAccessor13:invoke::-1,
>         sun.reflect.DelegatingMethodAccessorImpl:invoke:DelegatingMethodAccessorImpl.java:43,
>         java.lang.reflect.Method:invoke:Method.java:498,
>         org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:78,
>         org.apache.hive.service.cli.session.HiveSessionProxy:access$000:HiveSessionProxy.java:36,
>         org.apache.hive.service.cli.session.HiveSessionProxy$1:run:HiveSessionProxy.java:63,
>         java.security.AccessController:doPrivileged:AccessController.java:-2,
>         javax.security.auth.Subject:doAs:Subject.java:422,
>         org.apache.hadoop.security.UserGroupInformation:doAs:UserGroupInformation.java:1875,
>         org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:59,
>         com.sun.proxy.$Proxy35:executeStatementAsync::-1,
>         org.apache.hive.service.cli.CLIService:executeStatementAsync:CLIService.java:295,
>         org.apache.hive.service.cli.thrift.ThriftCLIService:ExecuteStatement:ThriftCLIService.java:507,
>         org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement:getResult:TCLIService.java:1437,
>         org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement:getResult:TCLIService.java:1422,
>         org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39,
>         org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39,
>         org.apache.hive.service.auth.TSetIpAddressProcessor:process:TSetIpAddressProcessor.java:56,
>         org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:286,
>         java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1149,
>         java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:624,
>         java.lang.Thread:run:Thread.java:748,
>         *org.apache.hadoop.hive.ql.parse.ParseException:line 1:39
>         cannot recognize input near '"first_name"' 'TEXT' ',' in
>         column name or primary key or foreign key:33:6,
>         org.apache.hadoop.hive.ql.parse.ParseDriver:parse:ParseDriver.java:221,
>         org.apache.hadoop.hive.ql.parse.ParseUtils:parse:ParseUtils.java:75,
>         org.apache.hadoop.hive.ql.parse.ParseUtils:parse:ParseUtils.java:68,
>         org.apache.hadoop.hive.ql.Driver:compile:Driver.java:564,
>         org.apache.hadoop.hive.ql.Driver:compileInternal:Driver.java:1425,
>         org.apache.hadoop.hive.ql.Driver:compileAndRespond:Driver.java:1398,
>         org.apache.hive.service.cli.operation.SQLOperation:prepare:SQLOperation.java:205],
>         sqlState:42000, errorCode:40000, errorMessage:Error while
>         compiling statement: FAILED: ParseException line 1:39 cannot
>         recognize input near '"first_name"' 'TEXT' ',' in column name
>         or primary key or foreign key), Query: CREATE TABLE
>         test.test("first_name" TEXT , "last_name" TEXT , "country" TEXT ).
>           at
>         com.cloudera.hiveserver2.hivecommon.api.HS2Client.executeStatementInternal(Unknown
>         Source)
>           at
>         com.cloudera.hiveserver2.hivecommon.api.HS2Client.executeStatement(Unknown
>         Source)
>           at
>         com.cloudera.hiveserver2.hivecommon.dataengine.HiveJDBCNativeQueryExecutor.executeHelper(Unknown
>         Source)
>           at
>         com.cloudera.hiveserver2.hivecommon.dataengine.HiveJDBCNativeQueryExecutor.execute(Unknown
>         Source)
>           at
>         com.cloudera.hiveserver2.jdbc.common.SStatement.executeNoParams(Unknown
>         Source)
>           at
>         com.cloudera.hiveserver2.jdbc.common.SStatement.executeAnyUpdate(Unknown
>         Source)
>           at
>         com.cloudera.hiveserver2.jdbc.common.SStatement.executeUpdate(Unknown
>         Source)
>           at
>         org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.createTable(JdbcUtils.scala:863)
>           at
>         org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:81)
>           at
>         org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
>           at
>         org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
>           at
>         org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
>           at
>         org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
>           at
>         org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
>           at
>         org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
>           at
>         org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
>           at
>         org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>           at
>         org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
>           at
>         org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
>           at
>         org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83)
>           at
>         org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:81)
>           at
>         org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:696)
>           at
>         org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:696)
>           at
>         org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80)
>           at
>         org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
>           at
>         org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
>           at
>         org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:696)
>           at
>         org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:305)
>           at
>         org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:291)
>           ... 48 elided
>         Caused by:
>         com.cloudera.hiveserver2.support.exceptions.GeneralException:
>         [Cloudera][HiveJDBCDriver](500051) ERROR processing
>         query/statement. Error Code: 40000, SQL state:
>         TStatus(statusCode:ERROR_STATUS,
>         infoMessages:[*org.apache.hive.service.cli.HiveSQLException:Error
>         while compiling statement: FAILED: ParseException line 1:39
>         cannot recognize input near '"first_name"' 'TEXT' ',' in
>         column name or primary key or foreign key:28:27,
>         org.apache.hive.service.cli.operation.Operation:toSQLException:Operation.java:329,
>         org.apache.hive.service.cli.operation.SQLOperation:prepare:SQLOperation.java:207,
>         org.apache.hive.service.cli.operation.SQLOperation:runInternal:SQLOperation.java:290,
>         org.apache.hive.service.cli.operation.Operation:run:Operation.java:260,
>         org.apache.hive.service.cli.session.HiveSessionImpl:executeStatementInternal:HiveSessionImpl.java:504,
>         org.apache.hive.service.cli.session.HiveSessionImpl:executeStatementAsync:HiveSessionImpl.java:490,
>         sun.reflect.GeneratedMethodAccessor13:invoke::-1,
>         sun.reflect.DelegatingMethodAccessorImpl:invoke:DelegatingMethodAccessorImpl.java:43,
>         java.lang.reflect.Method:invoke:Method.java:498,
>         org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:78,
>         org.apache.hive.service.cli.session.HiveSessionProxy:access$000:HiveSessionProxy.java:36,
>         org.apache.hive.service.cli.session.HiveSessionProxy$1:run:HiveSessionProxy.java:63,
>         java.security.AccessController:doPrivileged:AccessController.java:-2,
>         javax.security.auth.Subject:doAs:Subject.java:422,
>         org.apache.hadoop.security.UserGroupInformation:doAs:UserGroupInformation.java:1875,
>         org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:59,
>         com.sun.proxy.$Proxy35:executeStatementAsync::-1,
>         org.apache.hive.service.cli.CLIService:executeStatementAsync:CLIService.java:295,
>         org.apache.hive.service.cli.thrift.ThriftCLIService:ExecuteStatement:ThriftCLIService.java:507,
>         org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement:getResult:TCLIService.java:1437,
>         org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement:getResult:TCLIService.java:1422,
>         org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39,
>         org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39,
>         org.apache.hive.service.auth.TSetIpAddressProcessor:process:TSetIpAddressProcessor.java:56,
>         org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:286,
>         java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1149,
>         java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:624,
>         java.lang.Thread:run:Thread.java:748,
>         *org.apache.hadoop.hive.ql.parse.ParseException:line 1:39
>         cannot recognize input near '"first_name"' 'TEXT' ',' in
>         column name or primary key or foreign key:33:6,
>         org.apache.hadoop.hive.ql.parse.ParseDriver:parse:ParseDriver.java:221,
>         org.apache.hadoop.hive.ql.parse.ParseUtils:parse:ParseUtils.java:75,
>         org.apache.hadoop.hive.ql.parse.ParseUtils:parse:ParseUtils.java:68,
>         org.apache.hadoop.hive.ql.Driver:compile:Driver.java:564,
>         org.apache.hadoop.hive.ql.Driver:compileInternal:Driver.java:1425,
>         org.apache.hadoop.hive.ql.Driver:compileAndRespond:Driver.java:1398,
>         org.apache.hive.service.cli.operation.SQLOperation:prepare:SQLOperation.java:205],
>         sqlState:42000, errorCode:40000, errorMessage:Error while
>         compiling statement: FAILED: ParseException line 1:39 cannot
>         recognize input near '"first_name"' 'TEXT' ',' in column name
>         or primary key or foreign key), Query: CREATE TABLE
>         profile_test.person_test ("first_name" TEXT , "last_name" TEXT
>         , "country" TEXT ).
>           ... 77 more
>
>
>         Found a similar issue in Jira:
>
>         https://issues.apache.org/jira/browse/SPARK-31614
>
>         There are no comments on it and the resolution is Incomplete.
>         Is there any way in Spark to write data into Hive in JDBC
>         mode?
>
>         Thanks for any help.
>
>
>         Thanks,
>         Badrinath.
>

