spark-user mailing list archives

From Peyman Mohajerian <mohaj...@gmail.com>
Subject Re: AnalysisException exception while parsing XML
Date Wed, 31 Aug 2016 23:41:48 GMT
Once you get to an 'Array' type, you have to use explode; you cannot continue
the same dot-path traversal.
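
A minimal sketch of that workaround, assuming the schema quoted below (the column aliases `clerks`, `clerk`, and `duty` are illustrative, not from the original thread):

```scala
// Spark 1.6 sketch: flatten each array level with explode before
// selecting the leaf field, since dot-path traversal stops working
// once an array element type is reached.
import org.apache.spark.sql.functions.explode

// one row per subordinate_clerk array element
val clerks = df.select(explode(df("manager.subordinates.subordinate_clerk")).as("clerk"))
// one row per duty array element of each clerk
val duties = clerks.select(explode(clerks("clerk.duties.duty")).as("duty"))
duties.select("duty.name").show()
```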

On Wed, Aug 31, 2016 at 2:19 PM, <srikanth.jella@gmail.com> wrote:

> Hello Experts,
>
> I am using the Spark XML package to parse XML. The exception below is
> thrown when trying to *parse a tag that exists at array-within-array depth*,
> i.e. in this case subordinate_clerk.xxxx .duty.name
>
> The issue is reproducible with the sample XML below:
>
> <emplist>
>   <emp>
>    <manager>
>     <id>1</id>
>     <name>mgr1</name>
>     <dateOfJoin>2005-07-31</dateOfJoin>
>     <subordinates>
>       <subordinate_clerk>
>         <cid>2</cid>
>         <cname>clerk2</cname>
>         <dateOfJoin>2005-07-31</dateOfJoin>
>       </subordinate_clerk>
>       <subordinate_clerk>
>         <cid>3</cid>
>         <cname>clerk3</cname>
>         <dateOfJoin>2005-07-31</dateOfJoin>
>       </subordinate_clerk>
>     </subordinates>
>    </manager>
>   </emp>
>   <emp>
>    <manager>
>     <id>11</id>
>     <name>mgr11</name>
>     <subordinates>
>       <subordinate_clerk>
>         <cid>12</cid>
>         <cname>clerk12</cname>
>         <duties>
>           <duty>
>             <name>first duty</name>
>           </duty>
>           <duty>
>             <name>second duty</name>
>           </duty>
>         </duties>
>       </subordinate_clerk>
>     </subordinates>
>    </manager>
>   </emp>
> </emplist>
>
> scala> df.select( "manager.subordinates.subordinate_clerk.duties.duty.name").show
>
> Exception is:
>
> org.apache.spark.sql.AnalysisException: cannot resolve 'manager.subordinates.subordinate_clerk.duties.duty[name]' due to data type mismatch: argument 2 requires integral type, however, 'name' is of string type.;
>        at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:65)
>        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:57)
>        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:335)
>        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:335)
>        at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
>        at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:334)
>        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:332)
>        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:332)
>        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:281)
>        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>        at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
>        at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
>        at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
>        at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
>        at scala.collection.AbstractIterator.to(Iterator.scala:1157)
>        at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
>        at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
>        at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
>        at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
>        at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:321)
>        at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:332)
> ... more
>
> scala> df.printSchema
>
> root
>  |-- manager: struct (nullable = true)
>  |    |-- dateOfJoin: string (nullable = true)
>  |    |-- id: long (nullable = true)
>  |    |-- name: string (nullable = true)
>  |    |-- subordinates: struct (nullable = true)
>  |    |    |-- subordinate_clerk: array (nullable = true)
>  |    |    |    |-- element: struct (containsNull = true)
>  |    |    |    |    |-- cid: long (nullable = true)
>  |    |    |    |    |-- cname: string (nullable = true)
>  |    |    |    |    |-- dateOfJoin: string (nullable = true)
>  |    |    |    |    |-- duties: struct (nullable = true)
>  |    |    |    |    |    |-- duty: array (nullable = true)
>  |    |    |    |    |    |    |-- element: struct (containsNull = true)
>  |    |    |    |    |    |    |    |-- name: string (nullable = true)
>
> Versions info:
>
> Spark - 1.6.0
>
> Scala - 2.10.5
>
> Spark XML - com.databricks:spark-xml_2.10:0.3.3
>
> Please let me know if there is a solution or workaround for this.
>
> Thanks,
>
> Sreekanth
