spark-user mailing list archives

From Նարեկ Գալստեան <ngalsty...@gmail.com>
Subject Re: get directory names that are affected by sc.textFile("path/to/dir/*/*/*.js")
Date Tue, 27 Oct 2015 15:47:10 GMT
Well, I do not really need to do it while another job is editing them.
I just need to get the names of the folders when I read them through
textFile("path/to/dir/*/*/*.js").

Using native Hadoop libraries, can I do something like
fs.copy("/my/path/*/*", "new/path/")?



Narek Galstyan

Նարեկ Գալստյան

On 27 October 2015 at 19:13, Deenar Toraskar <deenar.toraskar@gmail.com>
wrote:

> This won't work, as you can never guarantee which files were read by Spark
> if some other process is writing files to the same location. It would be
> far less work to move the files matching your pattern to a staging location
> and then load them using sc.textFile. You should find HDFS file system calls
> that are equivalent to the normal file system ones if command-line tools
> like distcp or mv don't meet your needs.
> On 27 Oct 2015 1:49 p.m., "Նարեկ Գալստեան" <ngalstyan4@gmail.com>
wrote:
>
>> Dear Spark users,
>>
>> I am reading a set of JSON files to compile them to the Parquet data format.
>> I would like to mark the folders in some way after having read their
>> contents so that I do not read them again (e.g. I can change the name of
>> the folder).
>>
>> I use the .textFile("path/to/dir/*/*/*.js") technique to automatically
>> detect the files.
>> I cannot, however, use the same notation to rename them.
>>
>> Could you suggest how I can get the names of these folders so that I can
>> rename them using native Hadoop libraries?
>>
>> I am using Apache Spark 1.4.1
>>
>> I look forward to hearing your suggestions!
>>
>> yours,
>>
>> Narek
>>
>> Նարեկ Գալստյան
>>
>
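
A minimal sketch of the staging flow Deenar describes above, assuming sc is
the SparkContext as in the shell; the staging path is a placeholder, and
file-name collisions across source folders are ignored for brevity:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    val hadoopConf = new Configuration()
    val fs = FileSystem.get(hadoopConf)
    val staging = new Path("path/to/staging")
    fs.mkdirs(staging)

    // Move every file the glob currently matches into the staging area;
    // rename() is a metadata operation on HDFS, so no data is copied.
    fs.globStatus(new Path("path/to/dir/*/*/*.js")).foreach { status =>
      fs.rename(status.getPath, new Path(staging, status.getPath.getName))
    }

    // Read exactly the files that were moved, and nothing written since.
    val lines = sc.textFile(staging.toString + "/*.js")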
