spark-user mailing list archives

From VG <vlin...@gmail.com>
Subject Re: spark-xml - xml parsing when rows only have attributes
Date Fri, 17 Jun 2016 13:11:05 GMT
Great, thanks for pointing this out.



On Fri, Jun 17, 2016 at 6:21 PM, Ted Yu <yuzhihong@gmail.com> wrote:

> Please see https://github.com/databricks/spark-xml/issues/92
>
> On Fri, Jun 17, 2016 at 5:19 AM, VG <vlinked@gmail.com> wrote:
>
>> I am using spark-xml for loading data and creating a data frame.
>>
>> If the XML element has sub-elements and values, it works fine. For
>> example, this loads correctly:
>>
>> <a val="1">
>>      <b>test</b>
>> </a>
>>
>> However, if the element is bare, with just attributes, it does not
>> work:
>>
>> <a val="1" />
>>
>> Any suggestions to fix this?
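As an aside on where the data lives in such rows: a bare, self-closing element still carries its values in attribute nodes, which any XML parser can read. A minimal plain-JDK sketch (independent of spark-xml; the class name is made up for illustration):

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class AttributeOnlyRow {
    // Returns the value of the "val" attribute of the root element.
    static String readVal(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getDocumentElement().getAttribute("val");
    }

    public static void main(String[] args) throws Exception {
        // A bare, self-closing element: all of its data lives in attributes.
        System.out.println(readVal("<a val=\"1\" />"));  // prints "1"
    }
}
```

Per the GitHub issue referenced in this thread, newer spark-xml releases map such attributes to DataFrame columns; if I recall spark-xml's default `attributePrefix` of `_` correctly, `val` would surface as a column named `_val`.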
>>
>>
>>
>>
>>
>>
>> On Fri, Jun 17, 2016 at 4:28 PM, Siva A <siva9940261121@gmail.com> wrote:
>>
>>> Use spark-xml version 0.3.3:
>>> <dependency>
>>> <groupId>com.databricks</groupId>
>>> <artifactId>spark-xml_2.10</artifactId>
>>> <version>0.3.3</version>
>>> </dependency>
>>>
>>> On Fri, Jun 17, 2016 at 4:25 PM, VG <vlinked@gmail.com> wrote:
>>>
>>>> Hi Siva
>>>>
>>>> This is what I have for jars. Did you manage to run with these or
>>>> different versions?
>>>>
>>>>
>>>> <dependency>
>>>> <groupId>org.apache.spark</groupId>
>>>> <artifactId>spark-core_2.10</artifactId>
>>>> <version>1.6.1</version>
>>>> </dependency>
>>>> <dependency>
>>>> <groupId>org.apache.spark</groupId>
>>>> <artifactId>spark-sql_2.10</artifactId>
>>>> <version>1.6.1</version>
>>>> </dependency>
>>>> <dependency>
>>>> <groupId>com.databricks</groupId>
>>>> <artifactId>spark-xml_2.10</artifactId>
>>>> <version>0.2.0</version>
>>>> </dependency>
>>>> <dependency>
>>>> <groupId>org.scala-lang</groupId>
>>>> <artifactId>scala-library</artifactId>
>>>> <version>2.10.6</version>
>>>> </dependency>
>>>>
>>>> Thanks
>>>> VG
>>>>
>>>>
>>>> On Fri, Jun 17, 2016 at 4:16 PM, Siva A <siva9940261121@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Marco,
>>>>>
>>>>> I ran it in an IDE (IntelliJ) as well, and it works fine.
>>>>> VG, make sure the right jar is on the classpath.
>>>>>
>>>>> --Siva
>>>>>
>>>>> On Fri, Jun 17, 2016 at 4:11 PM, Marco Mistroni <mmistroni@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> And is your Eclipse classpath correct?
>>>>>> I suggest, as Siva did before, building your jar and running it via
>>>>>> spark-submit, specifying the --packages option.
>>>>>> It's as simple as running this command:
>>>>>>
>>>>>> spark-submit --packages
>>>>>> com.databricks:spark-xml_<scalaversion>:<packageversion>
>>>>>> --class <Name of your class containing main> <path to your jar>
>>>>>>
>>>>>> Indeed, if you have only these lines to run, why don't you try them
>>>>>> in spark-shell?
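Filling in the placeholders with the versions discussed in this thread (Scala 2.10, spark-xml 0.3.3) and the main class visible in the stack traces below, the command might look like the following; the jar path is hypothetical:

```shell
spark-submit \
  --packages com.databricks:spark-xml_2.10:0.3.3 \
  --class org.ariba.spark.PostsProcessing \
  target/posts-processing-1.0.jar
```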
>>>>>>
>>>>>> hth
>>>>>>
>>>>>> On Fri, Jun 17, 2016 at 11:32 AM, VG <vlinked@gmail.com> wrote:
>>>>>>
>>>>>>> Nope, Eclipse.
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jun 17, 2016 at 3:58 PM, Siva A <siva9940261121@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> If you are running from an IDE, are you using IntelliJ?
>>>>>>>>
>>>>>>>> On Fri, Jun 17, 2016 at 3:20 PM, Siva A <siva9940261121@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Can you try packaging it as a jar and running it with spark-submit?
>>>>>>>>>
>>>>>>>>> Siva
>>>>>>>>>
>>>>>>>>> On Fri, Jun 17, 2016 at 3:17 PM, VG <vlinked@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> I am trying to run from the IDE and everything else is working
>>>>>>>>>> fine.
>>>>>>>>>> I added the spark-xml jar and now I end up with this dependency
>>>>>>>>>> error:
>>>>>>>>>>
>>>>>>>>>> 16/06/17 15:15:57 INFO BlockManagerMaster: Registered BlockManager
>>>>>>>>>> Exception in thread "main" *java.lang.NoClassDefFoundError:
>>>>>>>>>> scala/collection/GenTraversableOnce$class*
>>>>>>>>>> at
>>>>>>>>>> org.apache.spark.sql.execution.datasources.CaseInsensitiveMap.<init>(ddl.scala:150)
>>>>>>>>>> at
>>>>>>>>>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:154)
>>>>>>>>>> at
>>>>>>>>>> org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
>>>>>>>>>> at
>>>>>>>>>> org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:109)
>>>>>>>>>> at org.ariba.spark.PostsProcessing.main(PostsProcessing.java:19)
>>>>>>>>>> Caused by:* java.lang.ClassNotFoundException:
>>>>>>>>>> scala.collection.GenTraversableOnce$class*
>>>>>>>>>> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>>>>>>>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>>>>>>>>> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>>>>>>>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>>>>>>>>> ... 5 more
>>>>>>>>>> 16/06/17 15:15:58 INFO SparkContext: Invoking stop() from
>>>>>>>>>> shutdown hook
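The `NoClassDefFoundError: scala/collection/GenTraversableOnce$class` above usually indicates mixed Scala binary versions on the classpath. A mutually consistent set, with every `_2.10` artifact matched to a 2.10.x `scala-library` and spark-xml bumped to 0.3.3 as suggested earlier in the thread, would be:

```xml
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>1.6.1</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.10</artifactId>
  <version>1.6.1</version>
</dependency>
<dependency>
  <groupId>com.databricks</groupId>
  <artifactId>spark-xml_2.10</artifactId>
  <version>0.3.3</version>
</dependency>
<dependency>
  <groupId>org.scala-lang</groupId>
  <artifactId>scala-library</artifactId>
  <version>2.10.6</version>
</dependency>
```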
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, Jun 17, 2016 at 2:59 PM, Marco Mistroni <
>>>>>>>>>> mmistroni@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> So you are using spark-submit  or spark-shell?
>>>>>>>>>>>
>>>>>>>>>>> You will need to launch either one by passing the --packages
>>>>>>>>>>> option (like in the example below for spark-csv). You will
>>>>>>>>>>> need to know your Scala version:
>>>>>>>>>>>
>>>>>>>>>>> --packages com.databricks:spark-xml_<scala.version>:<package
>>>>>>>>>>> version>
>>>>>>>>>>>
>>>>>>>>>>> hth
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jun 17, 2016 at 10:20 AM, VG <vlinked@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Apologies for that.
>>>>>>>>>>>> I am trying to use spark-xml to load data from an XML file.
>>>>>>>>>>>>
>>>>>>>>>>>> here is the exception
>>>>>>>>>>>>
>>>>>>>>>>>> 16/06/17 14:49:04 INFO BlockManagerMaster: Registered
>>>>>>>>>>>> BlockManager
>>>>>>>>>>>> Exception in thread "main" java.lang.ClassNotFoundException:
>>>>>>>>>>>> Failed to find data source: org.apache.spark.xml. Please find
>>>>>>>>>>>> packages at http://spark-packages.org
>>>>>>>>>>>> at
>>>>>>>>>>>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:77)
>>>>>>>>>>>> at
>>>>>>>>>>>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:102)
>>>>>>>>>>>> at
>>>>>>>>>>>> org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
>>>>>>>>>>>> at
>>>>>>>>>>>> org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:109)
>>>>>>>>>>>> at org.ariba.spark.PostsProcessing.main(PostsProcessing.java:19)
>>>>>>>>>>>> Caused by: java.lang.ClassNotFoundException:
>>>>>>>>>>>> org.apache.spark.xml.DefaultSource
>>>>>>>>>>>> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>>>>>>>>>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>>>>>>>>>>> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>>>>>>>>>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>>>>>>>>>>> at
>>>>>>>>>>>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:62)
>>>>>>>>>>>> at
>>>>>>>>>>>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:62)
>>>>>>>>>>>> at scala.util.Try$.apply(Try.scala:192)
>>>>>>>>>>>> at
>>>>>>>>>>>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4.apply(ResolvedDataSource.scala:62)
>>>>>>>>>>>> at
>>>>>>>>>>>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4.apply(ResolvedDataSource.scala:62)
>>>>>>>>>>>> at scala.util.Try.orElse(Try.scala:84)
>>>>>>>>>>>> at
>>>>>>>>>>>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:62)
>>>>>>>>>>>> ... 4 more
>>>>>>>>>>>>
>>>>>>>>>>>> Code
>>>>>>>>>>>>         SQLContext sqlContext = new SQLContext(sc);
>>>>>>>>>>>>         DataFrame df = sqlContext.read()
>>>>>>>>>>>>             .format("org.apache.spark.xml")
>>>>>>>>>>>>             .option("rowTag", "row")
>>>>>>>>>>>>             .load("A.xml");
>>>>>>>>>>>>
>>>>>>>>>>>> Any suggestions, please?
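For what it's worth, the `ClassNotFoundException` above comes from the format string itself: there is no `org.apache.spark.xml` data source. spark-xml registers under its own package name, so the read would be (a fragment of the same code, with only the format string changed):

```java
DataFrame df = sqlContext.read()
    .format("com.databricks.spark.xml")   // not "org.apache.spark.xml"
    .option("rowTag", "row")
    .load("A.xml");
```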
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Jun 17, 2016 at 2:42 PM, Marco Mistroni <
>>>>>>>>>>>> mmistroni@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Too little info.
>>>>>>>>>>>>> It'll help if you can post the exception and show your sbt
>>>>>>>>>>>>> file (if you are using sbt), and provide minimal details on
>>>>>>>>>>>>> what you are doing.
>>>>>>>>>>>>> kr
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Jun 17, 2016 at 10:08 AM, VG <vlinked@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Failed to find data source: com.databricks.spark.xml
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Any suggestions to resolve this?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
