lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: mmap confusion in lucene
Date Mon, 14 Jul 2014 10:13:43 GMT
This is very easy to explain:

In the first part you copy the whole memory-mapped region into an on-heap byte array. You allocate
this byte array in full and then copy the whole file into it (actually this is a standard libc
copy call). To do this copy, the underlying OS needs to page in the whole file, because it
"sees" that you want to read the whole file anyway (because of the size of the copy
operation).

The other example reads the data byte by byte in a Java for-loop. The operating system has
no idea how to optimize that, so whenever you cross a page boundary it pages in another
buffer. Because of the kernel's internal page reclamation, the pages brought in this way are
freed much sooner. This is OS specific.
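
To make the two access patterns concrete, here is a minimal sketch. It is not taken from the
original test program: the class name MmapAccessPatterns and its method names are made up for
illustration, and it only shows the two ways of reading the mapping. It does not measure page
cache usage, which depends on the OS.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class MmapAccessPatterns {

    // Bulk copy: a single get(byte[]) call transfers the whole mapping to the Java heap.
    static byte[] bulkCopy(Path path) throws IOException {
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            byte[] heap = new byte[buf.remaining()];
            buf.get(heap);
            return heap;
        }
    }

    // Byte-wise scan: touches every byte in a loop and keeps only a running sum.
    static long byteWiseSum(Path path) throws IOException {
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            long sum = 0;
            while (buf.hasRemaining()) {
                sum += buf.get() & 0xFF; // treat each byte as unsigned
            }
            return sum;
        }
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get(args[0]);
        System.out.println("bulk copy bytes: " + bulkCopy(path).length);
        System.out.println("byte-wise sum:   " + byteWiseSum(path));
    }
}
```

Both methods read every byte of the file. Any difference in how much of it stays in the FS
cache afterwards comes from how the kernel reacts to the access pattern, not from the Java code
itself.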

In general, copying a random access file to the Java heap with mmap is just the wrong use case.
Lucene never does this! The idea behind mmap is to *not copy* the data and to work on the mmapped
region directly (using random access). The OS cache logic will then use statistics about which
pages were actually used and keep them longer in the FS cache than those that were used once and
then not touched again for a very long time.
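
As an illustration of working on the mapped region directly, here is a minimal sketch that
reads values at arbitrary offsets with absolute gets and never copies the file to the heap.
The record layout (4-byte big-endian ints), the class name MmapRandomAccess and its method names
are assumptions made up for this example. This is not how Lucene's own directory
implementation is written.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class MmapRandomAccess {

    // Hypothetical layout for illustration: the file is a sequence of 4-byte big-endian ints.
    static final int RECORD_SIZE = Integer.BYTES;

    static MappedByteBuffer mapReadOnly(Path path) throws IOException {
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    // Absolute get: reads straight from the mapping without moving the buffer position.
    static int readRecord(MappedByteBuffer buf, int index) {
        return buf.getInt(index * RECORD_SIZE);
    }

    public static void main(String[] args) throws IOException {
        MappedByteBuffer buf = mapReadOnly(Paths.get(args[0]));
        int records = buf.capacity() / RECORD_SIZE;
        // Visit a few scattered records; only the pages holding them need to be resident.
        for (int i = 0; i < records; i += Math.max(1, records / 4)) {
            System.out.println("record " + i + " = " + readRecord(buf, i));
        }
    }
}
```

Note that a single mapping made this way is limited to 2 GB, because ByteBuffer offsets are
ints. That is enough for the 800 MB file discussed here.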

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: wangzhijiang999 [mailto:wangzhijiang999@aliyun.com]
> Sent: Monday, July 14, 2014 11:58 AM
> To: java-user
> Subject: mmap confusion in lucene
> 
> Hi everybody,
>
> I ran into something that confused me while testing the mmap feature in
> Lucene. I read an 800M file through mmap like this:
> 
> RandomAccessFile raf = new RandomAccessFile(new File(path), "r");
> FileChannel rafc = raf.getChannel();
> ByteBuffer buff = rafc.map(FileChannel.MapMode.READ_ONLY, 0, rafc.size());
> int len = buff.limit();
> byte[] b = new byte[len];
> for (int i = 0; i < len; i++) {
>     b[i] = buff.get();
> }
>
> After the program finished, about 800M of the Linux cache was consumed.
> 
> 
> RandomAccessFile raf = new RandomAccessFile(new File(path), "r");
> FileChannel rafc = raf.getChannel();
> ByteBuffer buff = rafc.map(FileChannel.MapMode.READ_ONLY, 0, rafc.size());
> int len = buff.limit();
> for (int i = 0; i < len; i++) {
>     Byte b = buff.get();
> }
>
> But this way, only about 4M of the Linux cache was consumed.
> 
> 
> RandomAccessFile raf = new RandomAccessFile(new File(path), "r");
> FileChannel rafc = raf.getChannel();
> ByteBuffer buff = rafc.map(FileChannel.MapMode.READ_ONLY, 0, rafc.size());
> int len = buff.limit();
> byte[] b = new byte[len];
> for (int i = 0; i < len; i++) {
>     b[i] = buff.get();
>     b[i] = 0;
> }
>
> This way, too, only about 4M of the Linux cache was consumed.
> 
> All three tests should read the whole content of the file, but in the last
> two the Linux system cached only 4M.
> Could somebody explain this? Thanks in advance.
> 
> Zhijiang Wang
> 



