spark-user mailing list archives

From Chirag Dewan <chirag.de...@ericsson.com>
Subject RE: Output files of saveAsText are getting stuck in temporary directory
Date Tue, 08 Sep 2015 03:20:01 GMT
Hi,

Any idea about this? I am still facing this issue.

thanks,

Chirag

-----Original Message-----
From: Chirag Dewan [mailto:chirag.dewan@ericsson.com] 
Sent: Friday, September 04, 2015 3:26 PM
To: Sean Owen
Cc: user@spark.apache.org
Subject: RE: Output files of saveAsText are getting stuck in temporary directory

Yes. The driver has stopped successfully. All shutdown steps completed without any errors
in the logs.

I am using spark 1.4.1 with Cassandra 2.0.14.

Chirag

-----Original Message-----
From: Sean Owen [mailto:sowen@cloudera.com]
Sent: Friday, September 04, 2015 3:23 PM
To: Chirag Dewan
Cc: user@spark.apache.org
Subject: Re: Output files of saveAsText are getting stuck in temporary directory

That means the save has not finished yet. Are you sure it did? Spark writes to _temporary
while the save is in progress.
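
(One quick way to verify this is to look for the _SUCCESS marker that the Hadoop output
committer writes into the output directory once the job commits. The sketch below assumes
a local-filesystem output path like the one in your job; the path is illustrative, not
part of any API.)

```java
import java.io.File;

public class CheckSaveCommitted {

    // Returns true once the output committer has finished: on commit it moves
    // the part files out of _temporary and writes an empty _SUCCESS marker.
    static boolean isCommitted(File outputDir) {
        return new File(outputDir, "_SUCCESS").exists()
                && !new File(outputDir, "_temporary").exists();
    }

    public static void main(String[] args) {
        // Hypothetical path; substitute the directory passed to saveAsTextFile.
        File outputDir = new File("/home/echidew/cassandra/test-100.txt");
        System.out.println(isCommitted(outputDir)
                ? "save committed"
                : "still in progress (or the job failed before commit)");
    }
}
```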

On Fri, Sep 4, 2015 at 10:10 AM, Chirag Dewan <chirag.dewan@ericsson.com> wrote:
> Hi,
>
>
>
> I have a 2-node Spark cluster and I am trying to read data from a
> Cassandra cluster and save it as a CSV file. Here is my code:
>
>
>
> JavaRDD<String> mapPair = cachedRdd.map(new Function<CassandraRow, String>() {
>
>     private static final long serialVersionUID = 1L;
>
>     @Override
>     public String call(CassandraRow v1) throws Exception {
>         StringBuilder sb = new StringBuilder();
>         sb.append(v1.getString(0));
>         sb.append(",");
>         sb.append(v1.getBytes(1));
>         sb.append(",");
>         sb.append(v1.getString(2));
>         sb.append(",");
>         sb.append(v1.getString(3));
>         sb.append(",");
>         sb.append(v1.getString(4));
>         sb.append(",");
>         sb.append(v1.getString(5));
>         return sb.toString();
>     }
> });
>
> JavaRDD<String> cachedRdd1 = mapPair.cache();
>
> JavaRDD<String> coalescedRdd = cachedRdd1.coalesce(1);
>
> coalescedRdd.saveAsTextFile("file:///home/echidew/cassandra/test-100.txt");
>
> context.stop();
>
>
>
> The problem is that the part-00000 file, containing all the records, is
> created in the _temporary/task-UUID folder. As I have read and understood,
> this file should be moved to my output path and the temporary directory
> deleted. Do I need to change anything in my code or environment? What
> could be the reason for this?
>
>
>
> Any help appreciated.
>
>
>
> P.S.: I am posting only the relevant code. Sorry for the formatting.
>
>
>
> Thanks,
>
>
>
> Chirag
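
(As an aside on the quoted code: the StringBuilder loop can be written more compactly
with java.util.StringJoiner. This is only an illustration with plain strings; the
getBytes(1) column would first need its own string conversion, and the sketch does not
quote or escape embedded commas, so data that may contain commas needs a real CSV writer.)

```java
import java.util.StringJoiner;

public class CsvLine {

    // Joins the column values with commas, exactly as the StringBuilder
    // version in the quoted code does for its six columns.
    static String buildCsvLine(String... fields) {
        StringJoiner joiner = new StringJoiner(",");
        for (String field : fields) {
            joiner.add(field);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // Hypothetical column values standing in for v1.getString(0..5).
        System.out.println(buildCsvLine("id-1", "0xCAFE", "a", "b", "c", "d"));
    }
}
```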

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org
