spark-user mailing list archives

From Holden Karau <hol...@pigscanfly.ca>
Subject Re: pySpark - convert log/txt files into sequenceFile
Date Tue, 28 Oct 2014 16:56:00 GMT
Hi Csaba,

It sounds like the API you are looking for is sc.wholeTextFiles :)

Cheers,

Holden :)

On Tuesday, October 28, 2014, Csaba Ragany <ragesz@gmail.com> wrote:

> Dear Spark Community,
>
> Is it possible to convert text files (.log or .txt files) into
> sequencefiles in Python?
>
> Using PySpark I can create a parallelized file with
> rdd=sc.parallelize([('key1', 1.0)]) and I can save it as a sequencefile
> with rdd.saveAsSequenceFile(). But how can I put the whole content of my
> text files into the 'value' of 'key1' ?
>
> I want a sequencefile where the keys are the filenames of the text files
> and the values are their content.
>
> Thank you for any help!
> Csaba
>


-- 
Cell : 425-233-8271
