spark-dev mailing list archives

From Chang Ya-Hsuan <sumti...@gmail.com>
Subject Failed to generate predicate Error when using dropna
Date Tue, 08 Dec 2015 09:25:53 GMT
spark version: spark-1.5.2-bin-hadoop2.6
python version: 2.7.9
os: ubuntu 14.04

Code to reproduce the error:

# write.py

import pyspark
sc = pyspark.SparkContext()
sqlc = pyspark.SQLContext(sc)
df = sqlc.range(10)
df1 = df.withColumn('a', df['id'] * 2)
df1.write.partitionBy('id').parquet('./data')


# read.py

import pyspark
sc = pyspark.SparkContext()
sqlc = pyspark.SQLContext(sc)
df2 = sqlc.read.parquet('./data')
df2.dropna().count()


$ spark-submit write.py
$ spark-submit read.py

# error message

15/12/08 17:20:34 ERROR Filter: Failed to generate predicate, fallback to interpreted
org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding attribute, tree: a#0L
...

If I write the data without partitionBy, the error does not occur.
Any suggestions?
Thanks!
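
For anyone trying to narrow this down: partitionBy('id') writes one
subdirectory per value (id=0, id=1, ...) under ./data, and on read the id
column is recovered from those directory names rather than from the Parquet
files themselves. Below is a minimal, stdlib-only sketch for checking what
write.py actually produced on disk. The script name and both function names
are hypothetical (they are not part of Spark or of the report above), and the
sketch only looks at the top level of the output directory.

```python
# inspect_partitions.py (hypothetical helper, not part of the original report)

import os


def parse_partition_dir(name):
    """Split a Hive-style directory name such as 'id=3' into ('id', '3').

    Returns None for names that are not of the form key=value.
    """
    key, sep, value = name.partition('=')
    if not sep or not key:
        return None
    return key, value


def list_partitions(root):
    """Return sorted (column, value) pairs for partition directories directly under root."""
    found = []
    for entry in os.listdir(root):
        parsed = parse_partition_dir(entry)
        # Skip plain files (e.g. _SUCCESS) and anything that is not key=value.
        if parsed is not None and os.path.isdir(os.path.join(root, entry)):
            found.append(parsed)
    return sorted(found)


if __name__ == '__main__':
    # './data' is the output path used by write.py above.
    if os.path.isdir('./data'):
        for column, value in list_partitions('./data'):
            print('%s=%s' % (column, value))
```

Running it after write.py should list one entry per distinct id. Note that
the values come back as strings and are sorted as strings, since they are
taken from the directory names.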

-- 
-- 張雅軒
