spark-user mailing list archives

From matthes <mdiekst...@sensenetworks.com>
Subject Can't access nested types with sql
Date Fri, 23 Jan 2015 15:02:03 GMT
I am trying to work with nested Parquet data. Reading and writing the Parquet
file works now, but when I try to query a nested field with SQLContext
I get an exception:

RuntimeException: "Can't access nested field in type
ArrayType(StructType(List(StructField(..."

I generate the Parquet file by parsing the data into the following case class
structure:

case class areas(area: String, dates: Seq[Int])
case class dataset(userid: Long, source: Int, days: Seq[Int], areas: Seq[areas])

Automatically generated schema:
root
 |-- userid: long (nullable = false)
 |-- source: integer (nullable = false)
 |-- days: array (nullable = true)
 |    |-- element: integer (containsNull = false)
 |-- areas: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- area: string (nullable = true)
 |    |    |-- dates: array (nullable = true)
 |    |    |    |-- element: integer (containsNull = false)
 
After writing the Parquet file I load the data again, create a SQLContext,
and try to execute a SQL command as follows:

parquetFile.registerTempTable("testtable")
val result = sqlContext.sql("SELECT areas.area FROM testtable where userid > 500000")
result.map(t => t(0)).collect().foreach(println) // throws the exception

If I execute this command instead:

val result = sqlContext.sql("SELECT areas[0].area FROM testtable where userid > 500000")

I get only the value at the first position of each array, but I need every
value, and I can't find a way to get that.
I saw the function t.getAs[...], but nothing I tried with it worked.
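
To make the intended result concrete: what I want is one area string per element of the areas array, for every record that passes the userid filter. Written with plain Scala collections (no Spark involved), that would look roughly like the sketch below. The sample records and the names NestedAreas, sample and areaNames are made up for illustration.

```scala
// Same case classes as in the message above.
case class areas(area: String, dates: Seq[Int])
case class dataset(userid: Long, source: Int, days: Seq[Int], areas: Seq[areas])

object NestedAreas {
  // Hypothetical sample records, invented for this illustration.
  val sample: Seq[dataset] = Seq(
    dataset(500001L, 1, Seq(1, 2), Seq(areas("north", Seq(1)), areas("south", Seq(2, 3)))),
    dataset(42L, 2, Seq(3), Seq(areas("east", Seq(4))))
  )

  // Keep records above the userid threshold, then emit the area name of
  // every element of the nested array (not only the element at index 0).
  def areaNames(records: Seq[dataset], minUserId: Long): Seq[String] =
    records.filter(_.userid > minUserId).flatMap(_.areas.map(_.area))

  def main(args: Array[String]): Unit =
    areaNames(sample, 500000L).foreach(println)
}
```

The flatMap over the nested sequence is the behaviour I am looking for from the SQL query.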

I hope somebody can tell me how to access a nested field so that I read all
values of the nested array. Or is this not supported?
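
One direction that might work is the HiveQL dialect: HiveContext accepts LATERAL VIEW explode(...), which emits one row per array element, so the struct field can then be selected from each exploded element. The following is only a sketch under assumptions: it requires the spark-hive_2.10 artifact on the classpath, it has not been verified against 1.2.0 here, and the sample record, the object name ExplodeAreas and the aliases exploded and ar are invented for illustration. A small in-memory RDD stands in for the Parquet file.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Case classes from the message; kept at the top level so the schema can be inferred.
case class areas(area: String, dates: Seq[Int])
case class dataset(userid: Long, source: Int, days: Seq[Int], areas: Seq[areas])

object ExplodeAreas {
  // explode(areas) yields one row per array element; "exploded" and "ar" are arbitrary aliases.
  val query: String =
    "SELECT ar.area FROM testtable LATERAL VIEW explode(areas) exploded AS ar WHERE userid > 500000"

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("explode-areas").setMaster("local[2]"))
    val hiveContext = new HiveContext(sc)
    import hiveContext.createSchemaRDD // Spark 1.2 implicit: RDD of case classes -> SchemaRDD

    // Hypothetical sample data standing in for the Parquet file.
    val rows = sc.parallelize(Seq(
      dataset(500001L, 1, Seq(1, 2), Seq(areas("north", Seq(1)), areas("south", Seq(2, 3))))))
    rows.registerTempTable("testtable")

    // Same row handling as in the message, now with one row per array element.
    hiveContext.sql(query).map(t => t(0)).collect().foreach(println)
    sc.stop()
  }
}
```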

I use spark-sql_2.10 (v1.2.0), spark-core_2.10 (v1.2.0) and Parquet 1.6.0rc4.



