spark-user mailing list archives

From "fightfate@163.com" <fightf...@163.com>
Subject Re: saveAsTable fails to save RDD in Spark SQL 1.3.0
Date Wed, 18 Mar 2015 01:33:04 GMT
Looks like a permissions issue. Can you check that your current user
has the authority to operate (r/w/x) on /user/hive/warehouse?

Thanks,
Sun.



fightfate@163.com
 
From: smoradi
Date: 2015-03-18 09:24
To: user
Subject: saveAsTable fails to save RDD in Spark SQL 1.3.0
Hi,
Basically my goal is to make Spark SQL RDDs available to Tableau
through the Simba ODBC driver.
I'm running standalone Spark 1.3.0 on Ubuntu 14.04. I got the source code and
compiled it with Maven.
Hive is also set up and connected to MySQL, all on the same machine. The
hive-site.xml file has been copied to spark/conf. Here is the content of the
hive-site.xml:
 
<configuration>
      <property>
            <name>javax.jdo.option.ConnectionURL</name>
           
<value>jdbc:mysql://localhost:3306/metastore_db?createDatabaseIfNotExist=true</value>
            <description>metadata is stored in a MySQL server</description>
      </property>
      <property>
            <name>hive.metastore.schema.verification</name>
            <value>false</value>
      </property>
      <property>
            <name>javax.jdo.option.ConnectionDriverName</name>
            <value>com.mysql.jdbc.Driver</value>
            <description>MySQL JDBC driver class</description>
      </property>
      <property>
            <name>javax.jdo.option.ConnectionUserName</name>
            <value>hiveuser</value>
            <description>user name for connecting to mysql server
</description>
      </property>
      <property>
            <name>javax.jdo.option.ConnectionPassword</name>
            <value>hivepassword</value>
            <description>password for connecting to mysql server
</description>
      </property>
</configuration>
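A side note on the config above: it never sets hive.metastore.warehouse.dir, so the warehouse path falls back to the default /user/hive/warehouse, which gets resolved against whatever default filesystem Spark sees on its classpath. If the intent is HDFS, a fully qualified value pins it down; the host and port below are placeholders, not values taken from this thread:

```xml
<property>
      <name>hive.metastore.warehouse.dir</name>
      <value>hdfs://namenode-host:9000/user/hive/warehouse</value>
      <description>fully qualified warehouse location (host/port are placeholders)</description>
</property>
```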
 
Both Hive and MySQL work just fine. I can create a table with Hive and find
it in MySQL.
The Thrift server is also configured and connected to the Spark master.
Everything works just fine and I can monitor all the workers and running
applications through spark master UI.
I have a very simple Python script to convert a JSON file to an RDD, like
this:
 
import json
 
def transform(data):
    ts  = data[:25].strip()
    jss = data[41:].strip()
    jsj = json.loads(jss)
    jsj['ts'] = ts
    return json.dumps(jsj)
 
from pyspark.sql import HiveContext
sqlContext = HiveContext(sc)
rdd  = sc.textFile("myfile")
tbl = sqlContext.jsonRDD(rdd.map(transform))
tbl.saveAsTable("neworder")
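 
As a side check, the slicing in transform assumes each input line carries a 25-character timestamp prefix with a JSON payload starting at offset 41. The function can be exercised without Spark at all; the sample line below is made up for illustration:

```python
import json

def transform(data):
    ts = data[:25].strip()    # first 25 characters hold the timestamp
    jss = data[41:].strip()   # JSON payload starts at offset 41
    jsj = json.loads(jss)
    jsj['ts'] = ts            # fold the timestamp into the record
    return json.dumps(jsj)

# Hypothetical line: 25-char timestamp, 16 filler chars, then JSON.
line = "2015-03-17T17:22:17.00000" + " " * 16 + '{"order_id": 42}'
print(transform(line))  # {"order_id": 42, "ts": "2015-03-17T17:22:17.00000"}
```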
 
The saveAsTable call fails with this:
15/03/17 17:22:17 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/spark/python/pyspark/sql/dataframe.py", line 191, in saveAsTable
    self._jdf.saveAsTable(tableName, source, jmode, joptions)
  File "/opt/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
  File "/opt/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o31.saveAsTable.
: java.io.IOException: Failed to rename DeprecatedRawLocalFileStatus{path=file:/user/hive/warehouse/neworder/_temporary/0/task_201503171618_0008_r_000001/part-r-00002.parquet; isDirectory=false; length=5591; replication=1; blocksize=33554432; modification_time=1426634300000; access_time=0; owner=; group=; permission=rw-rw-rw-; isSymlink=false} to file:/user/hive/warehouse/neworder/part-r-00002.parquet
    at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:346)
    at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:362)
    at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:310)
    at parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:43)
    at org.apache.spark.sql.parquet.ParquetRelation2.insert(newParquet.scala:649)
    at org.apache.spark.sql.parquet.DefaultSource.createRelation(newParquet.scala:126)
    at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:308)
    at org.apache.spark.sql.hive.execution.CreateMetastoreDataSourceAsSelect.run(commands.scala:217)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:55)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:55)
    at org.apache.spark.sql.execution.ExecutedCommand.execute(commands.scala:65)
    at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:1088)
    at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:1088)
    at org.apache.spark.sql.DataFrame.saveAsTable(DataFrame.scala:1048)
    at org.apache.spark.sql.DataFrame.saveAsTable(DataFrame.scala:1018)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
    at py4j.Gateway.invoke(Gateway.java:259)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:207)
    at java.lang.Thread.run(Thread.java:745)
 
 
/user/hive/warehouse is a Hadoop HDFS location. The source file is also on
HDFS.
 
Any help is appreciated.
 
 
 
 
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/saveAsTable-fails-to-save-RDD-in-Spark-SQL-1-3-0-tp22108.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
 
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org
 