lucene-solr-user mailing list archives

From Mark Miller <markrmil...@gmail.com>
Subject Re: SolrCloud MatchAllDocsQuery returning different number of docs each request
Date Thu, 02 Aug 2012 19:35:01 GMT
Can you do me a favor and try not using the batch add for a run?

Just add one doc at a time (solrServer.add(doc) rather than solrServer.add(collection)).

I just fixed one issue with it this morning on trunk - it may be the cause of this oddity.

I'm also working on some performance issues around that method (getting good performance
without starting thousands of threads).

Until I get all that straightened out (hopefully very soon), I think you will have better
luck not using the bulk, collection add method.
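
For reference, here is a minimal sketch of the one-document-at-a-time approach. It is only an illustration under stated assumptions: DocSender and addOneByOne are hypothetical names, not part of SolrJ, and the sender stands in for a call such as solrServer.add(doc). The demo in main uses a plain list in place of a Solr client so the sketch runs without SolrJ on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SingleDocIndexer {

    /** Hypothetical stand-in for a client call such as solrServer.add(doc). */
    public interface DocSender<D> {
        void add(D doc) throws Exception;
    }

    /**
     * Sends each document in its own add call instead of one bulk call.
     * Returns the number of documents sent. On the first failure it throws
     * an IllegalStateException naming the index of the failing document.
     */
    public static <D> int addOneByOne(Iterable<D> docs, DocSender<D> sender) {
        int sent = 0;
        for (D doc : docs) {
            try {
                sender.add(doc); // one request per document
            } catch (Exception e) {
                throw new IllegalStateException("add failed at document index " + sent, e);
            }
            sent++;
        }
        return sent;
    }

    public static void main(String[] args) {
        // A list plays the role of the server here (assumption for the demo).
        List<String> received = new ArrayList<>();
        int sent = addOneByOne(Arrays.asList("doc1", "doc2", "doc3"), received::add);
        System.out.println(sent + " docs sent: " + received);
    }
}
```

With SolrJ available, the sender would presumably be a lambda along the lines of doc -> solrServer.add(doc); that wiring is an assumption and has not been tested against any particular SolrJ version.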

On Aug 2, 2012, at 2:16 PM, Timothy Potter <thelabdude@gmail.com> wrote:

> Thanks Mark.
> 
> I'm actually using SolrJ 3.4.0, so using CommonsHttpSolrServer:
> 
> Collection<SolrInputDocument> batch = ...
> ... build up batch ...
> solrServer.add( batch );
> 
> Basically, I have a custom Pig StoreFunc that sends docs to Solr from
> our Hadoop analytics nodes. The reason I'm not using SolrJ 4.0.0-ALPHA
> is that I couldn't get it to run in my Hadoop environment. There's
> some classpath conflict with the Apache HttpClient. SolrJ 4 depends on
> 4.1.3 but when I run it in my env, I get the following:
> 
> Caused by: java.lang.NoSuchMethodError:
> org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager: method
> <init>()V not found
> 	at org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:94)
> 	at org.apache.solr.client.solrj.impl.CloudSolrServer.<init>(CloudSolrServer.java:70)
> 	... 16 more
> 
> I spent hours trying to resolve the classpath issue and finally had to
> bail and just used the 3.4 SolrJ client as I'm just at the evaluation
> stage at this point. So it sounds like this could be the cause of my
> problems.
> 
> One other thing ... I do have the _version_ field defined in my
> schema.xml but am not setting it on the client side when indexing.
> Should I be doing that?
> 
> Cheers,
> Tim
> 
> 
> On Thu, Aug 2, 2012 at 11:27 AM, Mark Miller <markrmiller@gmail.com> wrote:
>> 
>> On Aug 2, 2012, at 11:08 AM, Timothy Potter <thelabdude@gmail.com> wrote:
>> 
>>> Just starting to get into SolrCloud using 4.0.0-ALPHA and am very
>>> impressed so far ...
>>> 
>>> I have a 12-shard index with ~104M docs with each shard having
>>> 1-replica (so 24 Solr servers running)
>>> 
>>> Using the Query form on the Admin panel, I issue the MatchAllDocsQuery
>>> (*:*) and each time I send the request the value for numFound in the
>>> result is different. It's always close but not exactly the same as I
>>> would expect? Can anyone shed some light on this issue? I also tried a
>>> real query, such as "#olympics lochte" and same thing - different
>>> numFound each time. The first page of actual docs returned is the same
>>> so maybe I should just ignore the numFound issue?
>>> 
>>> Note that while experiencing this behavior, I am not adding any docs
>>> to the index and all docs have been committed with waitFlush=true and
>>> waitSearcher=true on the commit. Also, not doing soft commits at this
>>> point. In addition, after having committed all 104M docs, I hit the
>>> optimize button on the panel so I have only 1 segment. In other words,
>>> the index is not being updated and has been optimized at this point.
>> 
>> 
>> How are you adding docs? E.g., what client and what method in particular (what is your
>> line of code that actually adds the doc).
>> 
>> You can find the numFound result for each node by passing the param distrib=false.
>> What does this tell you? Are your replicas in sync with the leader? What does the count
>> for each shard add up to?
>> 
>> I would not ignore the issue - something must be off. It may somehow be user error,
>> it may be a bug that has been fixed since the alpha, or it may be something new.
>> 
>> Are you sure every shard you are issuing the query *from* is active and live according
>> to ZooKeeper? E.g., when you look at the cloud admin view and look at the cluster
>> visualization, are all the nodes green?
>> 
>> - Mark Miller
>> lucidimagination.com

- Mark Miller
lucidimagination.com
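
The per-node check suggested in the quoted message (passing distrib=false and comparing the counts) could be sketched roughly as follows. This is an illustration only: ShardCountCheck and its methods are hypothetical helpers, the host, port and core name are placeholders, the counts in main are made up, and the first count listed for each shard is simply assumed to be the leader's. Fetching the URLs and parsing numFound out of the responses is left out.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ShardCountCheck {

    /** Builds a count-only query URL that asks one core to answer locally. */
    public static String localCountUrl(String coreBaseUrl, String query) {
        try {
            return coreBaseUrl + "/select?q=" + URLEncoder.encode(query, "UTF-8")
                    + "&rows=0&distrib=false";
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is required on every JVM
        }
    }

    /** Returns the shards whose cores did not all report the same numFound. */
    public static List<String> outOfSync(Map<String, long[]> countsByShard) {
        List<String> bad = new ArrayList<>();
        for (Map.Entry<String, long[]> e : countsByShard.entrySet()) {
            long[] counts = e.getValue();
            for (long c : counts) {
                if (c != counts[0]) {
                    bad.add(e.getKey());
                    break;
                }
            }
        }
        return bad;
    }

    /** Sums the first count of each shard (assumed here to be the leader's). */
    public static long leaderTotal(Map<String, long[]> countsByShard) {
        long total = 0;
        for (long[] counts : countsByShard.values()) {
            total += counts[0]; // assumes at least one count per shard
        }
        return total;
    }

    public static void main(String[] args) {
        // Placeholder URL; substitute each node's real core URL.
        System.out.println(localCountUrl("http://localhost:8983/solr/collection1", "*:*"));

        Map<String, long[]> counts = new LinkedHashMap<>();
        counts.put("shard1", new long[] {100, 100}); // made-up numbers
        counts.put("shard2", new long[] {120, 118}); // replica differs from leader
        System.out.println("out of sync: " + outOfSync(counts));
        System.out.println("leader total: " + leaderTotal(counts));
    }
}
```

If the per-core counts disagree within a shard, a varying numFound would be consistent with the distributed query being served by a different replica on each request.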











