nutch-dev mailing list archives

From Doug Cutting <cutt...@nutch.org>
Subject Re: Event queues vs threads
Date Thu, 01 Sep 2005 18:26:22 GMT
Kelvin Tan wrote:
> Interesting. I haven't tried it myself. Do you have any code/benchmarks for this?

I never committed it anywhere.  I initially tried to write Nutch's IPC 
mechanism with nio and it was slow and buggy.  One problem was that I 
needed to switch channels to blocking mode in order to read 
arbitrarily large objects, then switch them back to non-blocking mode 
in order to select() on them.  But a channel can't change this state 
while it is registered, and you can't remove it from the selector 
without going through the scheduler.  So the benefit of skipping the 
scheduler wasn't there.  If I had been willing to fragment objects 
into fixed-size chunks then it might have worked, but that's a lot of 
work.  It's a strange limitation, since with native sockets one can 
select and then perform arbitrary stream i/o, not limited to a single 
buffer.
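
For anyone who wants to see the restriction first-hand, here is a minimal 
sketch (not the code I tried for Nutch's IPC).  It assumes a current JDK 
and a loopback listener; the class and method names are made up for 
illustration.  It checks three things: a channel still in blocking mode 
cannot be registered with a Selector, a registered channel cannot be put 
back into blocking mode, and blocking mode is available again once the 
key has been cancelled and a selection operation has run.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.IllegalBlockingModeException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;

// Hypothetical demo class; only the java.nio calls are real API.
public class BlockingModeDemo {

    // Opens a listener on an ephemeral loopback port.
    private static ServerSocketChannel openServer() throws IOException {
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", 0));
        return server;
    }

    // True if register() rejects a channel that is still in blocking mode.
    public static boolean registerWhileBlockingFails() throws IOException {
        try (Selector selector = Selector.open();
             ServerSocketChannel server = openServer()) {
            try {
                // New channels start out in blocking mode.
                server.register(selector, SelectionKey.OP_ACCEPT);
                return false;
            } catch (IllegalBlockingModeException expected) {
                return true;
            }
        }
    }

    // True if a registered channel refuses to go back to blocking mode.
    public static boolean blockingWhileRegisteredFails() throws IOException {
        try (Selector selector = Selector.open();
             ServerSocketChannel server = openServer()) {
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);
            try {
                server.configureBlocking(true);
                return false;
            } catch (IllegalBlockingModeException expected) {
                return true;
            }
        }
    }

    // True if blocking mode works after cancelling the key and selecting.
    public static boolean blockingAfterDeregisterWorks() throws IOException {
        try (Selector selector = Selector.open();
             ServerSocketChannel server = openServer()) {
            server.configureBlocking(false);
            SelectionKey key = server.register(selector, SelectionKey.OP_ACCEPT);
            key.cancel();
            // Cancelled keys are flushed during a selection operation.
            selector.selectNow();
            server.configureBlocking(true);
            return server.isBlocking();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("register while blocking fails: "
                + registerWhileBlockingFails());
        System.out.println("blocking while registered fails: "
                + blockingWhileRegisteredFails());
        System.out.println("blocking after deregister works: "
                + blockingAfterDeregisterWorks());
    }
}
```

The third method is the round trip that each large read would need: 
cancel the key, run a selection operation, switch modes, do the stream 
i/o, then switch back and register again.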

Also, there's an nio version of Lucene's Directory that's a bit slower 
than the non-nio version, but this is not using select() or anything.

> Are you aware of others facing the same problem? 

How much non-blocking nio code do you find in real Java code?  I have 
not seen a lot.

I did find that Sun has implemented a high-performance HTTP server 
using nio.  This is documented at:

http://blogs.sun.com/roller/resources/fp/grizzly.pdf

From what I can tell, the primary benefit is in the number of 
simultaneous clients, not in throughput.  Does a crawler require 
thousands of simultaneous connections?  If so, then it looks like 
careful use of nio could offer some real benefits.

Doug
