lucene-solr-user mailing list archives

From Timothy Potter <thelabd...@gmail.com>
Subject Re: SolrCloud MatchAllDocsQuery returning different number of docs each request
Date Thu, 02 Aug 2012 18:18:26 GMT
Sorry, I didn't answer your other questions about shards being
in-sync. Yes - all are green and happy according to the Cloud admin
panel.

Tim

On Thu, Aug 2, 2012 at 12:16 PM, Timothy Potter <thelabdude@gmail.com> wrote:
> Thanks Mark.
>
> I'm actually using SolrJ 3.4.0, so using CommonsHttpSolrServer:
>
> Collection<SolrInputDocument> batch = ...
> ... build up batch ...
> solrServer.add( batch );
>
> Basically, I have a custom Pig StoreFunc that sends docs to Solr from
> our Hadoop analytics nodes. The reason I'm not using SolrJ 4.0.0-ALPHA
> is that I couldn't get it to run in my Hadoop environment. There's
> some classpath conflict with the Apache HttpClient. SolrJ 4 depends on
> HttpClient 4.1.3, but when I run it in my env, I get the following:
>
> Caused by: java.lang.NoSuchMethodError:
> org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager: method
> <init>()V not found
>         at org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:94)
>         at org.apache.solr.client.solrj.impl.CloudSolrServer.<init>(CloudSolrServer.java:70)
>         ... 16 more
>
> I spent hours trying to resolve the classpath issue and finally had to
> bail and just used the 3.4 SolrJ client as I'm just at the evaluation
> stage at this point. So it sounds like this could be the cause of my
> problems.
>
> One other thing ... I do have the _version_ field defined in my
> schema.xml but am not setting it on the client side when indexing.
> Should I be doing that?
>
> Cheers,
> Tim
>
>
> On Thu, Aug 2, 2012 at 11:27 AM, Mark Miller <markrmiller@gmail.com> wrote:
>>
>> On Aug 2, 2012, at 11:08 AM, Timothy Potter <thelabdude@gmail.com> wrote:
>>
>>> Just starting to get into SolrCloud using 4.0.0-ALPHA and am very
>>> impressed so far ...
>>>
>>> I have a 12-shard index with ~104M docs with each shard having
>>> 1 replica (so 24 Solr servers running).
>>>
>>> Using the Query form on the Admin panel, I issue the MatchAllDocsQuery
>>> (*:*) and each time I send the request the value for numFound in the
>>> result is different. It's always close, but not exactly the same, which
>>> is not what I would expect. Can anyone shed some light on this issue? I
>>> also tried a real query, such as "#olympics lochte", and got the same
>>> thing - different
>>> numFound each time. The first page of actual docs returned is the same
>>> so maybe I should just ignore the numFound issue?
>>>
>>> Note that while experiencing this behavior, I am not adding any docs
>>> to the index and all docs have been committed with waitFlush=true and
>>> waitSearcher=true on the commit. Also, not doing soft commits at this
>>> point. In addition, after having committed all 104M docs, I hit the
>>> optimize button on the panel so I have only 1 segment. In other words,
>>> the index is not being updated and has been optimized at this point.
>>
>>
>> How are you adding docs? Eg what client and what method in particular
>> (what is your line of code that actually adds the doc).
>>
>> You can find the numFound result for each node by passing the param
>> distrib=false. What does this tell you? Are your replicas in sync with
>> the leader? What does the count for each shard add up to?
>>
>> I would not ignore the issue - something must be off. It may somehow be
>> user error, it may be a bug that has been fixed since the alpha, or it
>> may be something new.
>>
>> Are you sure every shard you are issuing the query *from* is active and
>> live according to ZooKeeper? Eg when you look at the cloud admin view and
>> look at the cluster visualization, are all the nodes green?
>>
>> - Mark Miller
>> lucidimagination.com

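The distrib=false check suggested in the quoted reply can be sketched
roughly as follows. This is only an illustration, not code from the thread:
the class ShardCountCheck, its method names, the base URL, and the sample
counts are all hypothetical. In practice the counts would come from querying
each core with distrib=false and reading numFound. If two cores of the same
shard report different counts, a distributed query may be served by either
one, so the summed numFound can differ from request to request.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: compares per-core numFound values gathered with distrib=false.
final class ShardCountCheck {

    // Builds a match-all query URL that is answered by this core only.
    // coreBaseUrl is an assumed form such as http://host:8983/solr/collection1
    static String localCountUrl(String coreBaseUrl) {
        return coreBaseUrl + "/select?q=*:*&rows=0&distrib=false";
    }

    // Returns the shards whose cores (leader and replicas) disagree on numFound.
    static List<String> outOfSyncShards(Map<String, List<Long>> countsByShard) {
        List<String> mismatched = new ArrayList<>();
        for (Map.Entry<String, List<Long>> entry : countsByShard.entrySet()) {
            List<Long> counts = entry.getValue();
            for (long count : counts) {
                if (count != counts.get(0)) {
                    mismatched.add(entry.getKey());
                    break;
                }
            }
        }
        return mismatched;
    }

    // Returns {lowest, highest} total a distributed query could report,
    // assuming it reads exactly one core per shard and every shard has
    // at least one count recorded.
    static long[] totalRange(Map<String, List<Long>> countsByShard) {
        long min = 0;
        long max = 0;
        for (List<Long> counts : countsByShard.values()) {
            min += Collections.min(counts);
            max += Collections.max(counts);
        }
        return new long[] {min, max};
    }

    public static void main(String[] args) {
        // Made-up counts for two shards, each with a leader and one replica.
        Map<String, List<Long>> counts = new LinkedHashMap<>();
        counts.put("shard1", Arrays.asList(100L, 100L));
        counts.put("shard2", Arrays.asList(250L, 248L));

        System.out.println(localCountUrl("http://localhost:8983/solr/collection1"));
        System.out.println("Out of sync: " + outOfSyncShards(counts));
        long[] range = totalRange(counts);
        System.out.println("numFound could range from " + range[0] + " to " + range[1]);
    }
}
```

If the lowest and highest totals are equal, the replicas agree and the
varying numFound would have to come from somewhere else.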