lucene-solr-user mailing list archives

From Shawn Heisey
Subject Re: Solr 4.2.1 limit on number of rows or number of hits per shard?
Date Thu, 25 Jul 2013 18:26:28 GMT
On 7/25/2013 11:39 AM, Tom Burton-West wrote:
> Hello,
> I am running solr 4.2.1 on 3 shards and have about 365 million documents in
> the index total.
> I sent a query asking for 1 million rows at a time, but I keep getting an
> error claiming that there is an invalid version or data not in javabin
> format (see below)
> If I lower the number of rows requested to 100,000, I have no problems.
> Does Solr have a limit on number of rows that can be requested or is this
> a bug?

That particular javabin error (expected 2, but 60) usually means that 
the response it got was something other than javabin, typically HTML or 
XML.  Byte 60 is the ASCII code for '<', which is how an HTML or XML 
body starts.
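
To illustrate how to read that error, here is a minimal Python sketch that classifies a response body by its first byte. The function name and the sample payloads are made up for illustration, and the only thing taken from the error message is that a javabin response is expected to start with the value 2. It is not Solr's actual parsing code.

```python
JAVABIN_VERSION = 2  # the "expected 2" from the error message


def describe_first_byte(payload: bytes) -> str:
    """Guess what kind of response body this is from its first byte.

    Hypothetical helper for illustration only.
    """
    if not payload:
        return "empty response"
    first = payload[0]
    if first == JAVABIN_VERSION:
        return "looks like javabin"
    if first == ord("<"):  # ord("<") is 60, the "but 60" in the error
        return "looks like HTML or XML"
    return f"unknown format (first byte {first})"


# A made-up error page body, standing in for whatever the shard sent back.
print(describe_first_byte(b"<html><body>error page</body></html>"))
```

If the body starts with '<', the interesting part is usually the text of that HTML or XML page, since it tends to carry the real error from the other server.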

I was going to say that you should hopefully get a more meaningful error 
message from the server log, but it appears that what you included *IS* 
the server log, so I'm really confused.  The error message you're 
getting is typically something you see on the *client* side.

After some testing on my server, I suspect that what's happening here is 
that the initial shard query (the one with fl=uniqueKeyField,score) is 
working, but then when Solr makes the HUGE subsequent requests for the 
actual documents it is interested in, the list is too big to fit in the 
server-side POST buffer, which defaults to 2MB.  Those queries need to 
be big enough to include an "ids" parameter that is a comma-separated 
list of values from your uniqueKey.  In my case, each of those values 
could be 32 characters, so the id list could be up to 33MB for a million 
of them.  Most of them are significantly shorter, so a 32MB buffer would 
be big enough.
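
The arithmetic behind those numbers can be sketched in a few lines of Python. The helper name is made up, and the sketch assumes one byte per character and a single-byte comma between ids. If the commas were percent-encoded as %2C in the POST body (an assumption, I have not checked what Solr sends), each separator would take 3 bytes instead.

```python
def ids_param_bytes(num_ids: int, id_len: int, sep_len: int = 1) -> int:
    """Bytes in a separator-joined id list, not counting the 'ids=' prefix.

    Hypothetical helper: assumes every id has the same length and every
    character is one byte.
    """
    if num_ids <= 0:
        return 0
    return num_ids * id_len + (num_ids - 1) * sep_len


DEFAULT_LIMIT = 2 * 1024 * 1024  # the 2MB default mentioned above
RAISED_LIMIT = 32768 * 1024      # a limit of 32768 KB

# Worst case from the message: a million ids of 32 characters each.
worst_case = ids_param_bytes(1_000_000, 32)
print(worst_case)                  # 32999999
print(worst_case > DEFAULT_LIMIT)  # True
print(worst_case <= RAISED_LIMIT)  # True
```

So a million 32-character ids come to roughly 33 million bytes, far past a 2MB limit but just under 32768 KB, as long as the separators stay at one byte each.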

Either multipartUploadLimitInKB doesn't work properly, or there may be 
some hard limits built into the servlet container, because I set 
multipartUploadLimitInKB in the requestDispatcher config to 32768 and it 
still didn't work.  I wonder, perhaps there is a client-side POST buffer 
limit as well as the servlet container limit, which comes into play 
because the Solr server is acting as a client for the distributed requests?

