lucene-solr-user mailing list archives

From "sangraal aiken" <>
Subject Re: Doc add limit
Date Mon, 31 Jul 2006 20:22:05 GMT
Chris, my response is below each of your paragraphs...

> I don't have the means to try out this code right now ... but i can't see
> any obvious problems with it (there may be somewhere that you are opening
> a stream or reader and not closing it, but i didn't see one) ... i notice
> you are running this client on the same machine as Solr (hence the
> localhost URLs) did you by any chance try running the client on a separate
> machine to see if the number of updates before it hangs changes?

When I run the client locally and the Solr server on a slower and separate
development box, the maximum number of updates drops to 3,219. So it's
almost as if it's related to some sort of timeout problem because the
maximum number of updates drops considerably on a slower machine, but it's
weird how consistent the number is. 6,144 locally, 5,000 something when I
run it on the external server, and 3,219 when the client is separate from
the server.

> my money is still on a filehandle resource limit somewhere ... if you are
> running on a system that has "lsof" (on some Unix/Linux installations you
> need sudo/su root permissions to run it) you can use "lsof -p ####" to
> look up what files/network connections are open for a given process.  You
> can try running that on both the client pid and the Solr server pid once
> it's hung -- You'll probably see a lot of Jar files in use for both, but
> if you see more than a few XML files open by the client, or more than
> 1 TCP connection open by either the client or the server, there's your
> culprit.

The only output I get from 'lsof -p' that pertains to TCP connections is
the following. I'm not too sure how to interpret it, though:
java    4104 sangraal  261u  IPv6 0x5b060f0       0t0      TCP *:8009
java    4104 sangraal  262u  IPv6 0x55d59e8       0t0      TCP
java    4104 sangraal  263u  IPv6 0x53cc0e0       0t0      TCP [::]:http-alt->[::]:51039 (ESTABLISHED)
java    4104 sangraal  264u  IPv6 0x5b059d0       0t0      TCP [::]:51045->[::]:http-alt (ESTABLISHED)
java    4104 sangraal  265u  IPv6 0x53cc9c8       0t0      TCP [::]:http-alt->[::]:51045 (ESTABLISHED)
java    4104 sangraal   11u  IPv6 0x5b04f20       0t0      TCP *:http-alt
java    4104 sangraal   12u  IPv6 0x5b06d68       0t0      TCP localhost:51037->localhost:51036 (TIME_WAIT)
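
One way to read a listing like the one above is to tally the TCP rows by the state shown in parentheses at the end of each row. What follows is only a minimal sketch, assuming each wrapped row has been joined back onto a single line. LsofTcpSummary, countByState and the NO_STATE label are names made up for illustration; rows that show no parenthesized state are counted under NO_STATE rather than interpreted.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: tallies the TCP rows of "lsof -p <pid>" output by state.
final class LsofTcpSummary {
    // Matches a trailing state such as "(ESTABLISHED)" or "(TIME_WAIT)".
    private static final Pattern STATE = Pattern.compile("\\(([A-Z_0-9]+)\\)\\s*$");

    static Map<String, Integer> countByState(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            // Keep only rows that have a standalone "TCP" column.
            if (!Arrays.asList(line.trim().split("\\s+")).contains("TCP")) {
                continue;
            }
            Matcher m = STATE.matcher(line);
            // Assumption: rows without a parenthesized state get a placeholder label.
            String state = m.find() ? m.group(1) : "NO_STATE";
            counts.merge(state, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "java 4104 sangraal 261u IPv6 0x5b060f0 0t0 TCP *:8009",
            "java 4104 sangraal 264u IPv6 0x5b059d0 0t0 TCP [::]:51045->[::]:http-alt (ESTABLISHED)",
            "java 4104 sangraal  12u IPv6 0x5b06d68 0t0 TCP localhost:51037->localhost:51036 (TIME_WAIT)");
        System.out.println(countByState(sample));
    }
}
```

Comparing the tallies taken before the run and again once the client hangs would show whether the number of ESTABLISHED or TIME_WAIT rows keeps growing.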

> I'm not sure what Windows equivalent of lsof may exist.
>
> Wait ... i just had another thought....
> You are using InputStreamReader to deal with the InputStreams of your
> remote XML files -- but you aren't specifying a charset, so it's using
> your system default which may be different from the charset of the
> original XML files you are pulling from the URL -- which (i *think*)
> means that your InputStreamReader may in some cases fail to read all of
> the bytes of the stream, which might leave some dangling filehandles (i'm
> just guessing on that part ... i'm not actually sure what happens in that
> case).
> What if you simplify your code (for the purposes of testing) and just put
> the post-transform version ganja-full.xml in a big ass String variable in
> your java app and just call GanjaUpdate.doUpdate(bigAssString) over and
> over again ... does that cause the same problem?

In the code, I read the XML with a StringReader and then pass it to
GanjaUpdate as a string anyway.  I've output the String object and verified
that it is in fact all there.
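
For the charset suggestion quoted above, a minimal sketch of reading a stream with an explicit charset and always closing it could look like the following. It assumes the remote XML is UTF-8 and uses current Java syntax (try-with-resources). StreamText and readAll are names made up for illustration and are not taken from the client code discussed in this thread.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: reads a whole stream into a String with an explicit charset.
final class StreamText {
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        // try-with-resources closes the reader (and the wrapped stream)
        // even if read() throws, so no handle is left open.
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the source XML is UTF-8; the sample document is invented.
        byte[] xml = "<add><doc><field name=\"id\">caf\u00e9</field></doc></add>"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(readAll(new ByteArrayInputStream(xml), StandardCharsets.UTF_8));
    }
}
```

Passing the charset explicitly removes the dependence on the platform default, and closing the reader in one place makes it easier to rule out leaked streams on the client side.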

