spark-user mailing list archives

From Peter Figliozzi <pete.figlio...@gmail.com>
Subject Re: Parsing XML
Date Tue, 04 Oct 2016 23:17:00 GMT
df.col(...) looks up a column by name, so df.col("//FulfillmentOption[1]/text()")
searches your df for a column with that literal name; it does not execute an
XPath over an XML document as you wish.  Try constructing a UDF which applies
your XPath query, and pass that as the second argument to withColumn.
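
A minimal sketch of what the body of such a UDF could look like, using only
the JDK's javax.xml.xpath classes. The class name XPathText and the sample
XML are made up for illustration, and it assumes the value being queried is
a raw XML fragment held as a string:

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper: evaluates an XPath expression against an XML string.
class XPathText {

    // Returns the string value of the expression, or null for null input.
    // An expression that matches nothing yields the empty string.
    static String eval(String xml, String expression) {
        if (xml == null) {
            return null;
        }
        try {
            // A new XPath object per call, since XPath instances are not thread-safe.
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("XPath evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Sample document invented for this sketch.
        String xml = "<Item><FulfillmentOption>pickup</FulfillmentOption>"
                   + "<FulfillmentOption>delivery</FulfillmentOption></Item>";
        System.out.println(eval(xml, "//FulfillmentOption[1]/text()")); // prints "pickup"
    }
}
```

To use it from Spark, one option is to wrap the call in a UDF1<String, String>,
register it with spark.udf().register(name, udf, DataTypes.StringType), and
pass functions.callUDF(name, df.col(...)) as the second argument to withColumn.
This only helps if the df has a column that still holds the raw XML as a
string; if the XML parser has already turned FulfillmentOption into a
struct or array column, there is no XML text left for an XPath to run against.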

On Tue, Oct 4, 2016 at 4:35 PM, Jean Georges Perrin <jgp@jgp.net> wrote:

> Spark 2.0.0
> XML parser 0.4.0
> Java
>
> Hi,
>
> I am trying to create a new column in my data frame, based on the value of a
> sub-element. I have done that several times with JSON, but have not had much
> success with XML.
>
> (I know a world with less format would be easier :) )
>
> Here is the code:
> df.withColumn("FulfillmentOption1", df.col("//FulfillmentOption[1]/text()"));
>
> And here is the error:
> Exception in thread "main" org.apache.spark.sql.AnalysisException: Cannot
> resolve column name "//FulfillmentOption[1]/text()" among (x, xx, xxx,
> xxxx, a, b, FulfillmentOption, c, d, e, f, g);
>     at org.apache.spark.sql.Dataset$$anonfun$resolve$1.apply(Dataset.scala:220)
>     at org.apache.spark.sql.Dataset$$anonfun$resolve$1.apply(Dataset.scala:220)
>     ...
>
> The XPath is valid...
>
> Thanks!
>
> jg
