spark-user mailing list archives

From Michael Armbrust <mich...@databricks.com>
Subject Re: Can't access nested types with sql
Date Sat, 24 Jan 2015 19:39:17 GMT
You need to use LATERAL VIEW with explode() to get one row per array element:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LateralView
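
For the schema in the question, the query might look roughly like the sketch below. The aliases areasTable and a, the object name ExplodeSketch, and the sample rows are illustrative and not from the thread. As far as I know, Spark 1.2 parses LATERAL VIEW only through the HiveQL parser, so running the query through a HiveContext (hiveContext.sql(query)) rather than a plain SQLContext is an assumption worth checking. The second half is a plain Scala collections analogue of what explode does, so it runs without Spark.

```scala
// Same case classes as in the question.
case class areas(area: String, dates: Seq[Int])
case class dataset(userid: Long, source: Int, days: Seq[Int], areas: Seq[areas])

object ExplodeSketch {
  // HiveQL form: one output row per element of the areas array.
  // The aliases areasTable and a are illustrative names.
  val query: String =
    """SELECT userid, a.area
      |FROM testtable
      |LATERAL VIEW explode(areas) areasTable AS a
      |WHERE userid > 500000""".stripMargin

  // Plain-collections analogue of the query above (no Spark needed):
  // filter on userid, then emit one (userid, area) pair per array element.
  def explodeAreas(rows: Seq[dataset]): Seq[(Long, String)] =
    for {
      row <- rows if row.userid > 500000
      a   <- row.areas
    } yield (row.userid, a.area)

  def main(args: Array[String]): Unit = {
    // Made-up sample data for illustration only.
    val sample = Seq(
      dataset(500001L, 1, Seq(1, 2), Seq(areas("north", Seq(1)), areas("south", Seq(2)))),
      dataset(42L, 1, Seq(3), Seq(areas("east", Seq(3))))
    )
    println(query)
    explodeAreas(sample).foreach(println)
  }
}
```

With the sample data above, only the first row passes the userid filter, and it yields two result rows, one for each entry in its areas array. Rows whose array is empty produce no output, which matches explode without the OUTER keyword.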

On Fri, Jan 23, 2015 at 7:02 AM, matthes <mdiekstall@sensenetworks.com> wrote:

> I am trying to work with nested Parquet data. Reading and writing the
> Parquet file works now, but when I try to query a nested field with
> SQLContext I get an exception:
>
> RuntimeException: "Can't access nested field in type
> ArrayType(StructType(List(StructField(..."
>
> I generate the Parquet file by parsing the data into the following
> case class structure:
>
> case class areas(area: String, dates: Seq[Int])
> case class dataset(userid: Long, source: Int, days: Seq[Int], areas: Seq[areas])
>
> Automatically generated schema:
> root
>  |-- userid: long (nullable = false)
>  |-- source: integer (nullable = false)
>  |-- days: array (nullable = true)
>  |    |-- element: integer (containsNull = false)
>  |-- areas: array (nullable = true)
>  |    |-- element: struct (containsNull = true)
>  |    |    |-- area: string (nullable = true)
>  |    |    |-- dates: array (nullable = true)
>  |    |    |    |-- element: integer (containsNull = false)
>
> After writing the Parquet file, I load the data again, create a
> SQLContext, and try to execute a SQL command as follows:
>
> parquetFile.registerTempTable("testtable")
> val result = sqlContext.sql("SELECT areas.area FROM testtable where userid > 500000")
> result.map(t => t(0)).collect().foreach(println) // throws the exception
>
> If I execute this command instead:
>
> val result = sqlContext.sql("SELECT areas[0].area FROM testtable where userid > 500000")
>
> I get only the values at the first position of the array, but I need every
> value, and that doesn't work.
> I saw the function t.getAs[...], but nothing I tried worked.
>
> I hope somebody can help me access a nested field so that I can read all
> values of the nested array. Or is this not supported?
>
> I use spark-sql_2.10(v1.2.0), spark-core_2.10(v1.2.0) and parquet 1.6.0rc4.
