spark-user mailing list archives

From gtinside <>
Subject Integer column in schema RDD from parquet being considered as string
Date Wed, 04 Mar 2015 20:34:18 GMT
Hi,

I am converting a jsonRDD to Parquet by saving it as a Parquet file.

I am reading the Parquet file and registering it as a table:
val parquet = cacheContext.parquetFile("sample_trades.parquet")

When I do a printSchema, I see:
 |-- SAMPLE: struct (nullable = true)
 |    |-- CODE: integer (nullable = true)
 |    |-- DESC: string (nullable = true)

When I query:
cacheContext.sql("select SAMPLE.DESC from sample where
SAMPLE.CODE=1").map(t=>t).collect.foreach(println)
I get the error:
java.lang.IllegalArgumentException: Column [CODE] was not found in schema!

But if I put SAMPLE.CODE in single quotes (forcing it to a string), it works.
For example:
cacheContext.sql("select SAMPLE.DESC from sample where
SAMPLE.CODE='1'").map(t=>t).collect.foreach(println) works.

What am I missing here? I understand Catalyst will do optimization, so the
data type shouldn't matter that much, but something is off here.
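A hedged guess, not a confirmed diagnosis: the error message comes from the Parquet reader rather than from Catalyst, which suggests the integer predicate on the nested SAMPLE.CODE field is being pushed down into Parquet, where nested-column filters are not resolved. If that is the cause, disabling Parquet filter pushdown should make Spark evaluate the comparison itself. The sketch below assumes cacheContext is the SQLContext from the snippet above and that the table was registered as "sample"; the config key spark.sql.parquet.filterPushdown is the standard Spark SQL setting.

```scala
// Workaround sketch (assumption: the failure is Parquet filter pushdown
// choking on the nested SAMPLE.CODE column). Turning pushdown off makes
// Spark apply the integer filter after reading, instead of handing the
// predicate to the Parquet reader.
cacheContext.setConf("spark.sql.parquet.filterPushdown", "false")

val parquet = cacheContext.parquetFile("sample_trades.parquet")
parquet.registerTempTable("sample")

// The original integer comparison, unchanged:
cacheContext.sql(
  "select SAMPLE.DESC from sample where SAMPLE.CODE = 1"
).collect().foreach(println)
```

The string-literal form likely "works" only because the string comparison is not pushed down; it may silently compare different types, so the pushdown setting is the safer thing to test first.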

