spark-user mailing list archives

From matthes <>
Subject Can't access nested types with sql
Date Fri, 23 Jan 2015 15:02:03 GMT
I am trying to work with nested Parquet data. Reading and writing the Parquet
file works now, but when I try to query a nested field with SQLContext
I get an exception:

RuntimeException: "Can't access nested field in type

I generate the Parquet file by parsing the data into the following case classes:

case class areas(area : String, dates : Seq[Int])
case class dataset(userid : Long, source : Int, days : Seq[Int], areas : Seq[areas])

Automatically generated schema:
 |-- userid: long (nullable = false)
 |-- source: integer (nullable = false)
 |-- days: array (nullable = true)
 |    |-- element: integer (containsNull = false)
 |-- areas: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- area: string (nullable = true)
 |    |    |-- dates: array (nullable = true)
 |    |    |    |-- element: integer (containsNull = false)
After writing the Parquet file I load the data again, create a
SQLContext, and try to execute a SQL command as follows:

val result = sqlContext.sql("SELECT areas.area FROM testtable where userid > 500000")
  .map(t => t(0)).collect().foreach(println) // throws the exception

If I execute this command instead:

val result = sqlContext.sql("SELECT areas[0].area FROM testtable where userid > 500000")

I get only the values at the first position of the array, but I need every
value, and I cannot get that to work.
I saw the function t.getAs[...], but nothing I tried with it worked.
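
To illustrate the difference between the two queries, here is a minimal sketch using plain Scala collections only (no Spark or Parquet involved). The sample rows, the object name NestedAreasSketch, and the helper names firstAreaOnly and allAreas are made up for illustration; this shows the desired result, not how Spark SQL should produce it.

```scala
// Plain Scala sketch, no Spark involved; all sample values are invented.
case class areas(area: String, dates: Seq[Int])
case class dataset(userid: Long, source: Int, days: Seq[Int], areas: Seq[areas])

object NestedAreasSketch {
  // Hypothetical sample rows standing in for the contents of testtable.
  val rows: Seq[dataset] = Seq(
    dataset(500001L, 1, Seq(1, 2), Seq(areas("north", Seq(1)), areas("south", Seq(2)))),
    dataset(42L, 2, Seq(3), Seq(areas("east", Seq(3))))
  )

  // Counterpart of "areas[0].area": only the first element of each array.
  def firstAreaOnly(data: Seq[dataset]): Seq[Option[String]] =
    data.filter(_.userid > 500000L).map(_.areas.headOption.map(_.area))

  // What "areas.area" is meant to return: every area value of each matching row.
  def allAreas(data: Seq[dataset]): Seq[Seq[String]] =
    data.filter(_.userid > 500000L).map(_.areas.map(_.area))

  def main(args: Array[String]): Unit = {
    firstAreaOnly(rows).foreach(println)
    allAreas(rows).foreach(println)
  }
}
```

For the sample rows, firstAreaOnly yields only "north" for the matching user, while allAreas yields both "north" and "south".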

I hope somebody can tell me how to access a nested field so that I get all
values of the nested array. Or is this not supported?

I use spark-sql_2.10(v1.2.0), spark-core_2.10(v1.2.0) and parquet 1.6.0rc4.
