spark-user mailing list archives

From Նարեկ Գալստեան <>
Subject Re: get directory names that are affected by sc.textFile("path/to/dir/*/*/*.js")
Date Tue, 27 Oct 2015 15:47:10 GMT
Well, I do not really need to do it while another job is editing them; I just need to get the names of the folders as I read through them.

Using native Hadoop libraries, can I do something like that?
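Yes — in Hadoop's Java/Scala API, `FileSystem.globStatus(new Path(pattern))` expands the same glob that `sc.textFile` uses and returns the matched `FileStatus` entries, whose `getPath` values can then be reduced to their parent directories. As a minimal local-filesystem sketch of that idea (Python's `glob` standing in for `globStatus`; the layout below is hypothetical, mirroring `path/to/dir/*/*/*.js`):

```python
import glob
import os
import tempfile

def matched_dirs(pattern):
    """Expand a glob pattern and return the distinct parent
    directories of the files it matches, sorted."""
    return sorted({os.path.dirname(p) for p in glob.glob(pattern)})

# Hypothetical layout mirroring path/to/dir/*/*/*.js
root = tempfile.mkdtemp()
for sub in (os.path.join("2015", "10"), os.path.join("2015", "11")):
    d = os.path.join(root, sub)
    os.makedirs(d)
    with open(os.path.join(d, "part.js"), "w") as f:
        f.write("{}")

dirs = matched_dirs(os.path.join(root, "*", "*", "*.js"))
print(len(dirs))  # two distinct folders matched
```

On HDFS the equivalent would be `FileSystem.get(sc.hadoopConfiguration).globStatus(...)` followed by `getPath().getParent()` on each result, and `FileSystem.rename` to do the actual renaming.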

Narek Galstyan

Նարեկ Գալստյան

On 27 October 2015 at 19:13, Deenar Toraskar <> wrote:

> This won't work, as you can never guarantee which files were read by Spark
> if some other process is writing files to the same location. It would be
> far less work to move the files matching your pattern to a staging location
> and then load them from there using sc.textFile. You should find HDFS file
> system calls that are equivalent to the normal file system ones if
> command-line tools like distcp or mv don't meet your needs.
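The staging approach described above can be sketched for a local file system (on HDFS the move step would be `FileSystem.rename` or distcp; all paths here are hypothetical):

```python
import glob
import os
import shutil
import tempfile

# Sketch: move files matching the pattern into a staging directory
# first, then read only from staging, so a concurrent writer cannot
# change the set of files between listing and reading.
root = tempfile.mkdtemp()
src = os.path.join(root, "incoming", "2015", "10")
staging = os.path.join(root, "staging")
os.makedirs(src)
os.makedirs(staging)
with open(os.path.join(src, "part.js"), "w") as f:
    f.write("{}")

moved = []
for path in glob.glob(os.path.join(root, "incoming", "*", "*", "*.js")):
    # Note: flattening into one staging dir assumes basenames are
    # unique across the matched folders.
    dest = os.path.join(staging, os.path.basename(path))
    shutil.move(path, dest)
    moved.append(dest)

# sc.textFile(staging + "/*") would then read a stable file set.
print(len(moved))
```

Because the move happens before the load, the files Spark reads are exactly the files that were staged, regardless of what arrives in the source location afterwards.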
> On 27 Oct 2015 1:49 p.m., "Նարեկ Գալստեան" <> wrote:
>> Dear Spark users,
>> I am reading a set of json files to compile them to Parquet data format.
>> I am willing to mark the folders in some way after having read their
>> contents so that I do not read it again(e.g. I can changed the name of the
>> folder).
>> I use .textFile("path/to*/dir/*/*/*.js") *technique to* automatically
>> *detect
>> the files.
>> I cannot however, use the same notation* to rename them.*
>> Could you suggest how I can *get the names of these folders* so that I can
>> rename them using native hadoop libraries.
>> I am using Apache Spark 1.4.1
>> I look forward to hearing suggestions!!
>> yours,
>> Narek
>> Նարեկ Գալստյան
