spark-user mailing list archives

From Shay Seng <s...@1618labs.com>
Subject Re: Save RDDs as CSV
Date Thu, 31 Oct 2013 03:05:59 GMT
Well that almost works... when I call
myrdd.saveAsTextFile("hdfs://..../my.csv")

Instead of getting a single my.csv file, as I expected, my.csv is a directory
containing a bunch of part files, each of which is CSV.
Is there some way to have those files concatenated automatically?
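
For reference, one way to get a single file is to concatenate the part files
after the job finishes. For output that lives on HDFS, Hadoop's
"hadoop fs -getmerge <src> <localdst>" shell command does this. The sketch
below shows the same idea in plain Python, assuming the output directory has
already been copied to a local filesystem; the function name and its
parameters are hypothetical and not part of Spark or Hadoop.

```python
import glob
import os
import shutil


def merge_part_files(output_dir, merged_path):
    """Concatenate the part-* files in output_dir into one file.

    Assumes output_dir is a local copy of a saveAsTextFile() output
    directory. Files are appended in name order (part-00000, part-00001, ...).
    Returns the number of part files merged.
    """
    # Only names starting with "part-" match, so marker files such as
    # _SUCCESS and hidden checksum files are left out.
    part_paths = sorted(glob.glob(os.path.join(output_dir, "part-*")))
    with open(merged_path, "wb") as merged:
        for part_path in part_paths:
            with open(part_path, "rb") as part:
                shutil.copyfileobj(part, merged)
    return len(part_paths)
```

Another option is to reduce the RDD to a single partition with coalesce(1)
before saving. That still produces a directory, but one holding a single part
file, and all of the data is then written by one task.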




On Wed, Oct 30, 2013 at 7:13 PM, Josh Rosen <rosenville@gmail.com> wrote:

> saveAsTextFile() is implemented in terms of Hadoop's TextOutputFormat,
> which writes one record per line:
> https://github.com/apache/incubator-spark/blob/v0.8.0-incubating/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L816
>
> You could map() each entry in your RDD into a comma-separated string, then
> write those strings using saveAsTextFile().
>
>
>
>
> On Wed, Oct 30, 2013 at 7:10 PM, Andre Schumacher <
> schumach@icsi.berkeley.edu> wrote:
>
>>
>> Hi,
>>
>> Can you use saveAsTextFile? See
>>
>>
>> http://spark.incubator.apache.org/docs/latest/api/core/index.html#org.apache.spark.rdd.RDD
>>
>> I'm not sure what the default field separator is (probably Tab), but if
>> you don't mind that, it may work. No need to collect it to the master.
>>
>> Andre
>>
>> On 10/30/2013 06:34 PM, Shay Seng wrote:
>> > What's the recommended way to save an RDD as a CSV on, say, HDFS?
>> > Do I have to collect the RDD and save it from the master, or is there
>> > some way I can write out the CSV file in parallel to HDFS?
>> >
>> >
>> > tks
>> > shay
>> >
>>
>>
>
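
The map() suggestion quoted above can be sketched as follows. This is a
minimal example in Python, assuming each RDD element is a tuple or list of
fields; to_csv_line is a hypothetical helper, not a Spark function. It uses
the standard csv module so that fields containing commas or quotes are
escaped instead of corrupting the row.

```python
import csv
import io


def to_csv_line(fields):
    """Format one record (a sequence of fields) as a single CSV line.

    No trailing newline is added, because saveAsTextFile() already writes
    one record per line.
    """
    buf = io.StringIO()
    csv.writer(buf, lineterminator="").writerow(fields)
    return buf.getvalue()


# Assumed usage with an existing RDD named myrdd (not defined here):
#   myrdd.map(to_csv_line).saveAsTextFile("hdfs://..../my.csv")
```

Because the formatting runs inside map(), each partition is converted and
written in parallel, and nothing has to be collected to the master.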
