spark-user mailing list archives

From "Addanki, Santosh Kumar" <santosh.kumar.adda...@sap.com>
Subject saveAsParquetFile and DirectFileOutputCommitter Class not found Error
Date Sun, 07 Dec 2014 19:28:25 GMT
Hi,

When we try to call saveAsParquetFile on a SchemaRDD, we get the following error:


Py4JJavaError: An error occurred while calling o384.saveAsParquetFile.
: java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/lib/output/DirectFileOutputCommitter
        at org.apache.spark.sql.parquet.InsertIntoParquetTable.execute(ParquetTableOperations.scala:240)
        at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:425)
        at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:425)
        at org.apache.spark.sql.SchemaRDDLike$class.saveAsParquetFile(SchemaRDDLike.scala:76)
        at org.apache.spark.sql.api.java.JavaSchemaRDD.saveAsParquetFile(JavaSchemaRDD.scala:42)
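
For context, the call that triggers this on our side looks roughly like the sketch below (Spark 1.x PySpark API; the path, column names, and data are placeholders, not our actual job):

    from pyspark import SparkContext
    from pyspark.sql import SQLContext, Row

    sc = SparkContext(appName="parquet-repro")
    sqlContext = SQLContext(sc)

    # Build a small SchemaRDD and try to write it out as Parquet.
    rows = sc.parallelize([Row(id=1, name="a"), Row(id=2, name="b")])
    schemaRDD = sqlContext.inferSchema(rows)

    # This is the call that fails with the NoClassDefFoundError above.
    schemaRDD.saveAsParquetFile("maprfs:///tmp/parquet_repro")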



https://issues.apache.org/jira/browse/SPARK-3595 seems to have addressed this issue of respecting
the configured OutputCommitter, but when I pull from master and build, I still hit the same error.

I am on a MapR distribution, and the org/apache/hadoop/mapreduce/lib/output package in my Hadoop
jars does not contain a DirectFileOutputCommitter class.
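
In case it helps, this is roughly how I am inspecting (and, as a temporary workaround attempt,
overriding) the committer class that the cluster's Hadoop configuration hands to Spark. The
property mapred.output.committer.class is the one SPARK-3595 talks about for saveAsHadoopFile;
whether it is what the Parquet path picks up here, and whether overriding it is a safe fix, is
just my assumption:

    # Inspect the committer class coming from the cluster's Hadoop config
    # (on MapR this may be set cluster-wide in mapred-site.xml).
    hadoop_conf = sc._jsc.hadoopConfiguration()
    print(hadoop_conf.get("mapred.output.committer.class"))

    # Tentative workaround (assumption, not a confirmed fix): point the job
    # back at the stock FileOutputCommitter before writing the Parquet file.
    hadoop_conf.set("mapred.output.committer.class",
                    "org.apache.hadoop.mapred.FileOutputCommitter")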

Best Regards,
Santosh
