lucene-java-user mailing list archives

From "308181687" <308181...@qq.com>
Subject Re: Reply: mmap confusion in lucene
Date Tue, 15 Jul 2014 05:04:09 GMT
Hi Zhijiang,
  It seems that the JVM is smart enough to skip code whose result is never used. Try the following code:


                RandomAccessFile raf = new RandomAccessFile(new File("/root/xx.txt"), "r");
                FileChannel rafc = raf.getChannel();
                ByteBuffer buff = rafc.map(FileChannel.MapMode.READ_ONLY, 0, rafc.size());
                int len = buff.limit();
                byte b = 0;
                for (int i = 0; i < len; i++) {
                        b += buff.get();  // accumulate each byte into b
                }



The java process will then consume the expected 800M of shared memory. But if you change the line
"b += buff.get()" to "b = buff.get()", the java process will not consume nearly as much shared memory.
I guess that the JVM is smart enough to skip directly to the last position of the ByteBuffer.
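
If that guess is right, one way to rule out this kind of elimination in a test is to make the result of every read observable, for example by summing the bytes and printing the total. The following is only a minimal sketch: the class name MmapChecksum, the method sumBytes, and the command-line argument are assumptions for illustration, not part of the test above.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical helper class, named here only for illustration.
class MmapChecksum {

    /**
     * Maps the file read-only and sums every byte as an unsigned value.
     * Because the sum is returned to the caller, no read can be dropped as unused.
     * A single mapping is limited to Integer.MAX_VALUE bytes.
     */
    public static long sumBytes(Path path) throws IOException {
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            long sum = 0;
            while (buf.hasRemaining()) {
                sum += buf.get() & 0xFF; // relative read, advances the position by one
            }
            return sum;
        }
    }

    public static void main(String[] args) throws IOException {
        // The path is taken from the command line; any large file will do.
        System.out.println("checksum = " + sumBytes(Paths.get(args[0])));
    }
}
```

Printing the checksum keeps the loop's result alive, so the cache usage seen afterwards reflects reads that really happened.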


Thanks & Best Regards!






------------------ Original ------------------
From:  "java-user@lucene.apache.org wan";<wangzhijiang999@aliyun.com>;
Date:  Tue, Jul 15, 2014 10:44 AM
To:  "java-user"<java-user@lucene.apache.org>; 

Subject:  Reply: mmap confusion in lucene



Hi Uwe,
        Thank you for your continued help.
My first test is clear to me now: the OS caches the whole file because the data is copied to
the Java heap, and it does not free the pages, so I see 800M used by the cache in the end.
But in my last two tests the OS had freed all the previously cached pages, so I see only 4M
of cache used in the end.
 
Maybe I do not understand the internal kernel mechanism very well. As I understand it, the kernel
will swap out a page when memory is scarce or when the cached page has not been used for a long
time. The first condition is not met in my tests, because the OS still has 30G of memory
available. As for the second condition: in the first test the bytes are copied to the Java heap,
yet the OS still keeps the cache after the program exits. In the last test, the OS released the
pages even while the program was still running. Could you give me some further explanation of
this? I would appreciate it very much.
 
Zhijiang Wang


------------------------------------------------------------------
From: Uwe Schindler <uwe@thetaphi.de>
Sent: Monday, July 14, 2014 18:13
To: java-user <java-user@lucene.apache.org>; wangzhijiang999 <wangzhijiang999@aliyun.com>
Subject: RE: mmap confusion in lucene

This is very easy to explain:

In the first part you copy the whole memory mapped stuff into an on-heap byte array. You
allocate this byte array in total and you then do a copy (actually this is a standard libc copy
call) of the whole file. To do this copy, the underlying OS will need to swap in the whole file,
because it "sees" that you want to read the whole file anyway (because of the size of the copy
operation).

The other example reads the stuff byte by byte in a Java for-loop. The operating system has no
idea how to optimize that, so whenever you cross page boundaries it will swap in another buffer.
Because of internal kernel-page-garbage collection, the pages swapped in are freed much faster.
This is OS specific.

In general, copying a random access file to the Java heap with mmap is just the wrong use case.
Lucene never does this! The idea behind mmap is to *not copy* the data and work on the mmapped
region directly (using random access). The OS cache logic will then use statistics about which
pages were actually used and keep them longer in the FS cache than those used one time and then
no longer used for a very long time.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: wangzhijiang999 [mailto:wangzhijiang999@aliyun.com]
> Sent: Monday, July 14, 2014 11:58 AM
> To: java-user
> Subject: mmap confusion in lucene
>
> Hi everybody,
> I found a problem that confused me when I tested the mmap feature in lucene.
> I tested reading an 800M file with the mmap method like below:
>
>     RandomAccessFile raf = new RandomAccessFile(new File(path), "r");
>     FileChannel rafc = raf.getChannel();
>     ByteBuffer buff = rafc.map(FileChannel.MapMode.READ_ONLY, 0, rafc.size());
>     int len = buff.limit();
>     byte[] b = new byte[len];
>     for (int i = 0; i < len; i++) { b[i] = buff.get(); }
>
> After the program finished, about 800M of the linux cache was consumed.
>
>     RandomAccessFile raf = new RandomAccessFile(new File(path), "r");
>     FileChannel rafc = raf.getChannel();
>     ByteBuffer buff = rafc.map(FileChannel.MapMode.READ_ONLY, 0, rafc.size());
>     int len = buff.limit();
>     for (int i = 0; i < len; i++) { Byte b = buff.get(); }
>
> But in this way, just 4M of the linux cache was consumed.
>
>     RandomAccessFile raf = new RandomAccessFile(new File(path), "r");
>     FileChannel rafc = raf.getChannel();
>     ByteBuffer buff = rafc.map(FileChannel.MapMode.READ_ONLY, 0, rafc.size());
>     int len = buff.limit();
>     byte[] b = new byte[len];
>     for (int i = 0; i < len; i++) { b[i] = buff.get(); b[i] = 0; }
>
> In this way, 4M of the linux cache was consumed as well.
>
> The whole content of the file should be read in all three tests above, but in
> the last two tests the linux system only cached 4M.
> Would somebody give me an explanation for this? Thanks in advance.
>
> Zhijiang Wang
> ---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
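
To illustrate Uwe's point that mmap is meant for working on the mapped region directly rather than copying it to the heap, here is a minimal sketch of a random-access read using the absolute getInt(int) method of the buffer. The class name MmapRandomAccess and the method readIntAt are assumptions made up for this example; this is not how Lucene's own mmap-based directory is implemented.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper class, named here only for illustration.
class MmapRandomAccess {

    /**
     * Reads one 4-byte big-endian int at the given absolute offset.
     * Nothing is copied into an on-heap array; only the page(s) that hold
     * these four bytes need to be brought in by the OS.
     */
    public static int readIntAt(Path path, int offset) throws IOException {
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            // Absolute read: does not change the buffer position.
            // A new ByteBuffer uses big-endian byte order by default.
            return buf.getInt(offset);
        }
    }
}
```

In a real application the mapping would be created once and reused for many lookups; it is opened per call here only to keep the sketch short.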