spark-user mailing list archives

From Ethan Aubin <ethan.au...@gmail.com>
Subject Pyspark SQL 1.6.0 write problem
Date Thu, 25 Aug 2016 15:00:28 GMT
Hi, I'm having problems writing DataFrames with pyspark 1.6.0. If I create
a small dataframe like:

    sqlContext.createDataFrame(pandas.DataFrame.from_dict([{'x': 1}])).write.orc('test-orc')

Only the _SUCCESS file is written to the output directory. The executor log
shows the task's output being saved under test-orc/_temporary/ but never
moved into the final directory.
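A quick way to check for the symptom from the driver side (a sketch, assuming the usual Spark output layout of part-* files next to _SUCCESS; the directory name and helper are illustrative, not from the original post):

```python
import os
import tempfile

def has_part_files(output_dir):
    # A completed Spark write leaves part-* files alongside _SUCCESS;
    # the failure described above leaves only _SUCCESS (plus _temporary/).
    return any(name.startswith("part-") for name in os.listdir(output_dir))

# Simulate the broken output directory described above.
d = tempfile.mkdtemp()
open(os.path.join(d, "_SUCCESS"), "w").close()
print(has_part_files(d))  # False: only _SUCCESS was written
```

If this returns False after a write that reported success, the data never left the _temporary staging area.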

Writing with parquet rather than orc, I get the same output (a _SUCCESS
file, no part files), but there's also an exception:

java.lang.NullPointerException
    at org.apache.parquet.hadoop.ParquetFileWriter.mergeFooters(ParquetFileWriter.java:456)

which matches "Writing empty Dataframes doesn't save any _metadata files"
https://issues.apache.org/jira/browse/SPARK-15393

If I do the equivalent in Scala, things work as expected. Any suggestions
as to what could be happening? Much appreciated --Ethan
