lucene-java-user mailing list archives

From "wangzhijiang999" <wangzhijiang...@aliyun.com>
Subject Re: Re: Re: mmap confusion in lucene
Date Wed, 16 Jul 2014 09:38:47 GMT
Hi Uwe,

Where can I find a detailed introduction to how mmap works in Java and in the OS?
I did not find anything useful in the JDK source code.

For example: byte b = curBuf.get(); System.out.println(b);

When the get method runs, the JVM does not invoke the FS to read the file from disk. When the
println method runs, the data is actually used, and only then does the JVM really invoke the FS
to read the data. Is my understanding right?

Thank you!

Zhijiang Wang
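
A minimal Java sketch of the question above, under stated assumptions. With a memory-mapped buffer, a page is fetched from disk when the program touches it, which happens inside get() itself and not when the value is printed later. If the result of get() is never used, the JIT compiler may remove the call entirely, as Uwe's quoted reply notes. Feeding every byte into a running sum keeps the reads observable. The class name MmapReadSketch and the method sumMapped are hypothetical names chosen for illustration, and the sketch assumes the file is smaller than 2 GB so that it fits in a single mapping.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration class, not part of Lucene or the JDK.
final class MmapReadSketch {

    /**
     * Maps the file read-only and sums every byte as an unsigned value.
     * Because the sum is returned, each get() result is actually used,
     * so the JIT cannot drop the reads as dead code.
     * Assumes the file is smaller than 2 GB (single mapping).
     */
    static long sumMapped(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            long sum = 0;
            while (buf.hasRemaining()) {
                sum += buf.get() & 0xFF; // the page is touched here, inside get()
            }
            return sum;
        }
    }

    public static void main(String[] args) throws IOException {
        // Small temporary file so the sketch is self-contained.
        Path file = Files.createTempFile("mmap-sketch", ".bin");
        file.toFile().deleteOnExit();
        Files.write(file, new byte[] {1, 2, 3, (byte) 0xFF});
        System.out.println(sumMapped(file));
    }
}
```

Running the same loop over a large file while watching dstat -md, as done in the quoted tests, should show the page cache growing as the loop proceeds, whether or not the sum is printed at the end.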
------------------------------------------------------------------
From: Uwe Schindler <uwe@thetaphi.de>
Sent: Tuesday, July 15, 2014 17:29
To: java-user <java-user@lucene.apache.org>; wangzhijiang999 <wangzhijiang999@aliyun.com>
Subject: RE: Re: Re: mmap confusion in lucene

Yes, the JVM is removing the get() call, because it knows that it has no side effect: the
position() pointer is not used afterwards and the result of the get() call is also not used.
It is still partly mapped because the optimization only starts to kick in after 10,000 method
calls (the default threshold in the JVM).

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: wangzhijiang999 [mailto:wangzhijiang999@aliyun.com]
> Sent: Tuesday, July 15, 2014 11:10 AM
> To: java-user
> Subject: Re: Re: mmap confusion in lucene
>
> Hi 308181687,
>
> I also tested it this way. If I print every byte, the OS cache ends up consuming
> the size of the file, about 800M.
>
> for (int j = 0; j < len; ++j) { System.out.println(buff.get()); }
>
> If I just call buff.get() in the loop, the OS cache ends up consuming only 8M.
>
> for (int j = 0; j < len; ++j) { byte b = buff.get(); }
>
> buff.get() reads the byte at the buffer's current position and then increments
> the position. But if you do not use the value returned by buff.get(), the FS does
> not read the disk. I monitored disk reads and the cache with the dstat -md command
> and confirmed that disk reads do not increase in the second test.
>
> As you said, the JVM is so smart that if you do not use the data, it will not read
> it from disk. My previous understanding was that as long as you call the get method
> to fetch data, it reads from disk no matter whether you actually use the data or
> not. I will continue researching this to find the real reason.
>
> ------------------------------------------------------------------
> From: 308181687 <308181687@qq.com>
> Sent: Tuesday, July 15, 2014 13:04
> To: java-user <java-user@lucene.apache.org>
> Subject: Re: Re: mmap confusion in lucene
>
> Hi, Zhijiang
>
> It seems that the JVM is smart enough to ignore the unused code. Try the following
> code:
>
> RandomAccessFile raf = new RandomAccessFile(new File("/root/xx.txt"), "r");
> FileChannel rafc = raf.getChannel();
> ByteBuffer buff = rafc.map(FileChannel.MapMode.READ_ONLY, 0, rafc.size());
> int len = buff.limit();
> byte b = 0;
> for (int i = 0; i < len; i++) { b += buff.get(); }
>
> The java process will consume the expected 800M of shared memory. But if you change
> the line "b += buff.get()" to "b = buff.get()", the java process will not consume
> so much shared memory. I guess that the JVM is smart enough to skip directly to the
> last position of the ByteBuffer.
>
> Thanks & Best Regards!
>
> ------------------ Original ------------------
> From: "java-user@lucene.apache.org wan";<wangzhijiang999@aliyun.com>;
> Date: Tue, Jul 15, 2014 10:44 AM
> To: "java-user"<java-user@lucene.apache.org>;
> Subject: Re: mmap confusion in lucene
>
> Hi Uwe,
>
> Thank you for always helping. My first test is clear to me now: the OS caches the
> whole file because the data is copied to the Java heap and the pages are not freed,
> so I see 800M used by the cache in the end.
>
> But in my last two tests, the OS freed all the previously cached pages, so I see
> only 4M of cache used in the end.
>
> Maybe I am not very clear on the internal kernel mechanism. As I understand it, the
> kernel swaps out a page when memory is limited or when the cached page has not been
> used for a long time. The first condition is not satisfied in my tests, because the
> OS still has 30G of memory available. As for the second condition, although the
> bytes are copied to the Java heap in the first test, the OS still keeps the cache
> after the program exits. In the last test, the OS released the pages even while the
> program was running.
>
> Could you give me some further explanation of this? I would appreciate it very much.
>
> Zhijiang Wang
>
> ------------------------------------------------------------------
> From: Uwe Schindler <uwe@thetaphi.de>
> Sent: Monday, July 14, 2014 18:13
> To: java-user <java-user@lucene.apache.org>; wangzhijiang999 <wangzhijiang999@aliyun.com>
> Subject: RE: mmap confusion in lucene
>
> This is very easy to explain:
>
> In the first part you copy the whole memory mapped stuff into an on-heap byte array.
> You allocate this byte array in total and you then do a copy (actually this is a
> standard libc copy call) of the whole file. To do this copy, the underlying OS will
> need to swap in the whole file, because it "sees" that you want to read the whole
> file anyway (because of the size of the copy operation).
>
> The other example reads the stuff byte by byte in a Java for-loop. The operating
> system has no idea how to optimize that, so whenever you cross page boundaries it
> will swap in another buffer. Because of internal kernel-page-garbage collection, the
> pages swapped in are freed much faster. This is OS specific.
>
> In general copying a random access file to java heap with mmap is just the wrong use
> case. Lucene never does this! The idea behind mmap is to *not copy* the data and
> work on the mmapped region directly (using random access). The OS cache logic will
> then use statistics about which pages were actually used and keep them longer in FS
> cache than those used one time and then no longer used for a very long time.
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
> > -----Original Message-----
> > From: wangzhijiang999 [mailto:wangzhijiang999@aliyun.com]
> > Sent: Monday, July 14, 2014 11:58 AM
> > To: java-user
> > Subject: mmap confusion in lucene
> >
> > Hi everybody,
> >
> > I found a problem that confused me when I tested the mmap feature in lucene. I
> > read an 800M file with mmap as below:
> >
> > RandomAccessFile raf = new RandomAccessFile(new File(path), "r");
> > FileChannel rafc = raf.getChannel();
> > ByteBuffer buff = rafc.map(FileChannel.MapMode.READ_ONLY, 0, rafc.size());
> > int len = buff.limit();
> > byte[] b = new byte[len];
> > for (int i = 0; i < len; i++) { b[i] = buff.get(); }
> >
> > After the program finished, the Linux cache had grown by about 800M.
> >
> > RandomAccessFile raf = new RandomAccessFile(new File(path), "r");
> > FileChannel rafc = raf.getChannel();
> > ByteBuffer buff = rafc.map(FileChannel.MapMode.READ_ONLY, 0, rafc.size());
> > int len = buff.limit();
> > for (int i = 0; i < len; i++) { Byte b = buff.get(); }
> >
> > But this way, the Linux cache grows by just 4M.
> >
> > RandomAccessFile raf = new RandomAccessFile(new File(path), "r");
> > FileChannel rafc = raf.getChannel();
> > ByteBuffer buff = rafc.map(FileChannel.MapMode.READ_ONLY, 0, rafc.size());
> > int len = buff.limit();
> > byte[] b = new byte[len];
> > for (int i = 0; i < len; i++) { b[i] = buff.get(); b[i] = 0; }
> >
> > This way, the Linux cache also grows by only 4M.
> >
> > The whole content of the file should be read in all three tests above, but in the
> > last two tests the Linux system cached only 4M. Could somebody explain this?
> > Thanks in advance.
> >
> > Zhijiang Wang
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
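
Uwe's point that mmap is meant for working on the mapped region directly, rather than copying the file to the heap, can be sketched roughly as follows. This is only an illustration under stated assumptions, not how Lucene implements its directory classes: the class MmapRandomAccessSketch and the method readAt are hypothetical names, and the file is assumed to be smaller than 2 GB. Absolute get(int) reads leave the buffer position alone and touch only the pages that contain the requested offsets.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration class, not part of Lucene or the JDK.
final class MmapRandomAccessSketch {

    /**
     * Reads one byte at each requested offset using absolute get(int).
     * Only the pages containing those offsets need to be resident;
     * the file as a whole is never copied to the Java heap.
     * Assumes the file is smaller than 2 GB (single mapping).
     */
    static byte[] readAt(Path file, int... offsets) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            byte[] out = new byte[offsets.length];
            for (int i = 0; i < offsets.length; i++) {
                out[i] = buf.get(offsets[i]); // absolute read, position is unchanged
            }
            return out;
        }
    }

    public static void main(String[] args) throws IOException {
        // Small temporary file so the sketch is self-contained.
        Path file = Files.createTempFile("mmap-random", ".bin");
        file.toFile().deleteOnExit();
        byte[] data = new byte[10_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) (i % 251);
        }
        Files.write(file, data);
        byte[] picked = readAt(file, 0, 4096, 9999);
        System.out.println(picked[0] + " " + picked[1] + " " + picked[2]);
    }
}
```

The returned array holds only the requested bytes, so heap usage is proportional to the number of lookups rather than to the file size.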