spark-user mailing list archives

From Marcelo Vanzin <van...@cloudera.com>
Subject Re: Spark to utilize HDFS's mmap caching
Date Tue, 13 May 2014 17:14:31 GMT
On Mon, May 12, 2014 at 12:14 PM, Matei Zaharia <matei.zaharia@gmail.com> wrote:
> That API is something the HDFS administrator uses outside of any application to tell
> HDFS to cache certain files or directories. But once you’ve done that, any existing
> HDFS client accesses them directly from the cache.

Ah, yeah, sure. What I meant is that Spark itself will not, AFAIK, use
that facility for adding files to the cache or anything like that. But
yes, it does benefit from things already cached.
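For reference, pinning data into HDFS's centralized cache happens through the `hdfs cacheadmin` CLI, outside of Spark. A minimal sketch (the pool name and path below are just examples, and these commands assume a running HDFS 2.3+ cluster):

```shell
# Create a cache pool (done once by the HDFS administrator).
hdfs cacheadmin -addPool spark-pool

# Ask HDFS to keep the blocks under a directory cached in memory.
hdfs cacheadmin -addDirective -path /data/warm-table -pool spark-pool

# Verify which cache directives are active.
hdfs cacheadmin -listDirectives
```

After that, any HDFS client reading that path, including a Spark job calling sc.textFile("/data/warm-table"), is served from the cached replicas transparently; Spark itself never invokes these caching APIs.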


> On May 12, 2014, at 11:10 AM, Marcelo Vanzin <vanzin@cloudera.com> wrote:
>
>> Is that true? I believe that API Chanwit is talking about requires
>> explicitly asking for files to be cached in HDFS.
>>
>> Spark automatically benefits from the kernel's page cache (i.e. if
>> some block is in the kernel's page cache, it will be read more
>> quickly). But the explicit HDFS cache is a different thing; Spark
>> applications that want to use it would have to explicitly call the
>> respective HDFS APIs.
>>
>> On Sun, May 11, 2014 at 11:04 PM, Matei Zaharia <matei.zaharia@gmail.com> wrote:
>>> Yes, Spark goes through the standard HDFS client and will automatically benefit
>>> from this.
>>>
>>> Matei
>>>
>>> On May 8, 2014, at 4:43 AM, Chanwit Kaewkasi <chanwit@gmail.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> Can Spark (0.9.x) utilize the caching feature in HDFS 2.3 via
>>>> sc.textFile() and other HDFS-related APIs?
>>>>
>>>> http://hadoop.apache.org/docs/r2.3.0/hadoop-project-dist/hadoop-hdfs/CentralizedCacheManagement.html
>>>>
>>>> Best regards,
>>>>
>>>> -chanwit
>>>>
>>>> --
>>>> Chanwit Kaewkasi
>>>> linkedin.com/in/chanwit
>>>
>>
>>
>>
>> --
>> Marcelo
>



-- 
Marcelo
