spark-user mailing list archives

From gtinside <gtins...@gmail.com>
Subject Integer column in schema RDD from parquet being considered as string
Date Wed, 04 Mar 2015 20:34:18 GMT
Hi,

I am converting a jsonRDD to Parquet by saving it as a parquet file
(saveAsParquetFile):
cacheContext.jsonFile("file:///u1/sample.json").saveAsParquetFile("sample.parquet")

I am reading the parquet file back and registering it as a table:
val parquet = cacheContext.parquetFile("sample_trades.parquet")
parquet.registerTempTable("sample")

When I print the schema, I see:
root
 |-- SAMPLE: struct (nullable = true)
 |    |-- CODE: integer (nullable = true)
 |    |-- DESC: string (nullable = true)

When I query:

cacheContext.sql("select SAMPLE.DESC from sample where SAMPLE.CODE=1").map(t=>t).collect.foreach(println)

I get the error:

java.lang.IllegalArgumentException: Column [CODE] was not found in schema!

but if I put the value of SAMPLE.CODE in single quotes (forcing it to be compared as a string), it works. For example:

cacheContext.sql("select SAMPLE.DESC from sample where SAMPLE.CODE='1'").map(t=>t).collect.foreach(println)

works.

What am I missing here? I understand Catalyst will do optimization, so the data
type shouldn't matter that much, but something is off here.
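One possibility worth checking (an assumption on my part, not confirmed in this thread): this error message is raised by the Parquet reader when a pushed-down filter references a column path it cannot resolve, and filter pushdown has historically had trouble with predicates on nested struct fields like SAMPLE.CODE. With a string literal the predicate may simply not be pushed down, which would explain why the quoted version works. A minimal sketch of the workaround, using the real Spark SQL option spark.sql.parquet.filterPushdown to keep the filter out of the Parquet scan (cacheContext is the SQLContext from the snippets above):

```scala
// Sketch, assuming a Spark 1.x SQLContext named cacheContext.
// Disabling Parquet filter pushdown means the SAMPLE.CODE=1 predicate
// is evaluated by Spark after the scan, instead of being handed to the
// Parquet reader (which is where the IllegalArgumentException originates).
cacheContext.setConf("spark.sql.parquet.filterPushdown", "false")

cacheContext.sql("select SAMPLE.DESC from sample where SAMPLE.CODE = 1")
  .collect()
  .foreach(println)
```

If that makes the integer predicate work, the issue is in the pushdown path rather than in the schema itself.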

Regards,
Gaurav

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Integer-column-in-schema-RDD-from-parquet-being-considered-as-string-tp21917.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org

