spark-user mailing list archives

From Nathan Kronenfeld <nkronenf...@oculusinfo.com>
Subject Re: reading large XML files
Date Tue, 20 May 2014 19:47:10 GMT
Thanks, that sounds perfect



On Tue, May 20, 2014 at 1:38 PM, Xiangrui Meng <mengxr@gmail.com> wrote:

> You can search for XMLInputFormat on Google. There are some
> implementations that allow you to specify the <tag> to split on, e.g.:
>
> https://github.com/lintool/Cloud9/blob/master/src/dist/edu/umd/cloud9/collection/XMLInputFormat.java
>
> On Tue, May 20, 2014 at 10:31 AM, Nathan Kronenfeld
> <nkronenfeld@oculusinfo.com> wrote:
> > Unfortunately, I don't have a bunch of moderately big xml files; I have
> > one really big file - big enough that reading it into memory as a single
> > string is not feasible.
> >
> >
> > On Tue, May 20, 2014 at 1:24 PM, Xiangrui Meng <mengxr@gmail.com> wrote:
> >>
> >> Try sc.wholeTextFiles(). It reads each file whole, as a single
> >> (filename, content) record. -Xiangrui
> >>
> >> On Tue, May 20, 2014 at 8:25 AM, Nathan Kronenfeld
> >> <nkronenfeld@oculusinfo.com> wrote:
> >> > We are trying to read some large GraphML files to use in spark.
> >> >
> >> > Is there an easy way to read XML-based files like this that accounts
> >> > for partition boundaries and the like?
> >> >
> >> >              Thanks,
> >> >              Nathan
> >> >
> >> >
> >> > --
> >> > Nathan Kronenfeld
> >> > Senior Visualization Developer
> >> > Oculus Info Inc
> >> > 2 Berkeley Street, Suite 600,
> >> > Toronto, Ontario M5A 4J5
> >> > Phone:  +1-416-203-3003 x 238
> >> > Email:  nkronenfeld@oculusinfo.com



-- 
Nathan Kronenfeld
Senior Visualization Developer
Oculus Info Inc
2 Berkeley Street, Suite 600,
Toronto, Ontario M5A 4J5
Phone:  +1-416-203-3003 x 238
Email:  nkronenfeld@oculusinfo.com
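
For readers of this thread, the tag-splitting idea suggested above can be
sketched in a few lines. This is only a minimal illustration, not the Cloud9
XMLInputFormat code: the function name iter_tagged_records, the chunk size,
and the sample GraphML snippet are all hypothetical. It streams the input in
fixed-size chunks and yields each span between a start tag and an end tag, so
the whole file never has to sit in memory as one string. It uses plain byte
matching, so it assumes records with the same tag are not nested and are not
self-closing.

```python
import io


def iter_tagged_records(stream, start_tag, end_tag, chunk_size=64 * 1024):
    """Yield each start_tag...end_tag span from a binary stream.

    Hypothetical helper: reads the stream in chunks and keeps only the
    unfinished tail in memory between reads.
    """
    buf = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buf += chunk
        while True:
            start = buf.find(start_tag)
            if start < 0:
                # No start tag yet: keep just enough bytes to catch a
                # start tag that straddles the chunk boundary.
                keep = len(start_tag) - 1
                buf = buf[-keep:] if keep > 0 else b""
                break
            end = buf.find(end_tag, start + len(start_tag))
            if end < 0:
                # Record is incomplete: drop what precedes it, read more.
                buf = buf[start:]
                break
            end += len(end_tag)
            yield buf[start:end]
            buf = buf[end:]


# Made-up sample input; a real file would be opened in binary mode instead.
sample = (
    b'<graphml><graph>'
    b'<node id="n0"><data>a</data></node>\n'
    b'<node id="n1"><data>b</data></node>'
    b'<edge source="n0" target="n1"/>'
    b'</graph></graphml>'
)
records = list(
    iter_tagged_records(io.BytesIO(sample), b"<node ", b"</node>", chunk_size=7)
)
for record in records:
    print(record.decode("utf-8"))
```

A Hadoop input format built on the same idea additionally has to start each
split by scanning forward to the next start tag and to read past the end of
its split to finish the last record; the sketch above leaves that part out.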
