spark-user mailing list archives

From Sun Rui <sunrise_...@163.com>
Subject Re: Tar File: On Spark
Date Thu, 19 May 2016 06:56:55 GMT
1. Create a temp dir on HDFS, say “/tmp”.
2. Write a script that creates one file in the temp dir for each tar file. Each file has only
one line:
<absolute path of the tar file>
3. Write a Spark application. It looks like:
  val rdd = sc.textFile("<HDFS path of the temp dir>")
  // foreach is an action; a bare map is lazy and would never run the command
  rdd.foreach { line =>
       // construct an untar command using the path information in “line” and launch it
  }
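
The steps above can be sketched a bit more concretely as follows. This is only an illustration under several assumptions, not a tested job: the `hadoop` and `tar` commands are assumed to be on the PATH of every executor, the tar files are assumed to be flat (no subdirectories) and to end in ".tar", and the names `UntarOnSpark`, `tarStem`, `targetPath`, `untar`, `listDir` and `outRoot` are made up for this example. It streams each tar file out of HDFS, unpacks it into a local scratch directory, and copies each member back to HDFS using the f1/tar1_f1.txt layout from the original question.

```scala
import java.nio.file.Files
import scala.sys.process._
import org.apache.spark.{SparkConf, SparkContext}

object UntarOnSpark {
  // "/data/tar1.tar" -> "tar1" (assumes the archive name ends in ".tar")
  def tarStem(tarPath: String): String =
    tarPath.substring(tarPath.lastIndexOf('/') + 1).stripSuffix(".tar")

  // Member "f1.txt" of tar1 -> "<outRoot>/f1/tar1_f1.txt"
  def targetPath(outRoot: String, tarPath: String, member: String): String = {
    val folder = member.lastIndexOf('.') match {
      case -1 => member
      case i  => member.substring(0, i)
    }
    s"$outRoot/$folder/${tarStem(tarPath)}_$member"
  }

  // Runs on an executor. Returns 0 on success, otherwise a non-zero exit code.
  def untar(tarPath: String, outRoot: String): Int = {
    val scratch = Files.createTempDirectory("untar").toFile
    // Stream the archive out of HDFS and unpack it into the local scratch dir.
    val rc = (Seq("hadoop", "fs", "-cat", tarPath) #|
              Seq("tar", "-xf", "-", "-C", scratch.getPath)).!
    if (rc != 0) rc
    else scratch.listFiles().filter(_.isFile).map { f =>
      val dst = targetPath(outRoot, tarPath, f.getName)
      val dstDir = dst.substring(0, dst.lastIndexOf('/'))
      // Create the per-member folder, then copy the extracted file into it.
      (Seq("hadoop", "fs", "-mkdir", "-p", dstDir) #&&
       Seq("hadoop", "fs", "-put", "-f", f.getPath, dst)).!
    }.sum
  }

  def main(args: Array[String]): Unit = {
    // listDir: HDFS temp dir holding the one-line files; outRoot: HDFS output root.
    val Array(listDir, outRoot) = args
    val sc = new SparkContext(new SparkConf().setAppName("untar-on-spark"))
    // collect() is an action, so the untar commands actually run.
    val failed = sc.textFile(listDir)
      .map(_.trim).filter(_.nonEmpty)
      .map(path => (path, untar(path, outRoot)))
      .filter(_._2 != 0)
      .collect()
    failed.foreach { case (path, rc) => println(s"failed: $path (exit $rc)") }
    sc.stop()
  }
}
```

Keeping one small file per tar path means sc.textFile produces at least one partition per tar file, so the archives can be unpacked in parallel across executors.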

> On May 19, 2016, at 14:42, ayan guha <guha.ayan@gmail.com> wrote:
> 
> Hi
> 
> I have a few tar files in HDFS in a single folder. Each tar file has multiple files in it.
> 
> tar1:
>       - f1.txt
>       - f2.txt
> tar2:
>       - f1.txt
>       - f2.txt
> 
> (each tar file will have exactly the same number of files, with the same names)
> 
> I am trying to find a way (Spark or Pig) to extract them to their own folders.
> 
> f1:
>       - tar1_f1.txt
>       - tar2_f1.txt
> f2:
>       - tar1_f2.txt
>       - tar2_f2.txt
> 
> Any help? 
> 
> 
> 
> -- 
> Best Regards,
> Ayan Guha




