lucene-solr-user mailing list archives

From Timothy Potter <thelabd...@gmail.com>
Subject Re: SolrCloud MatchAllDocsQuery returning different number of docs each request
Date Thu, 02 Aug 2012 18:16:00 GMT
Thanks Mark.

I'm actually using SolrJ 3.4.0, so I'm adding docs with CommonsHttpSolrServer:

Collection<SolrInputDocument> batch = ...
// ... build up batch ...
solrServer.add( batch );
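
The snippet above only shows the final `add` call. Below is a minimal, stdlib-only sketch of the buffering a StoreFunc like this might do around it, assuming documents arrive one at a time and are sent in fixed-size batches. `DocBatcher`, the batch size, and the `Consumer` sink are hypothetical: in real code the sink would wrap `solrServer.add( batch )` and the elements would be `SolrInputDocument`s.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical helper: buffers documents and hands them to a sink in
 * fixed-size batches. In a real StoreFunc the sink would call
 * solrServer.add(batch); here it is a plain Consumer so the sketch runs
 * without SolrJ on the classpath.
 */
public class DocBatcher<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink;
    private List<T> buffer = new ArrayList<>();

    public DocBatcher(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be > 0");
        this.batchSize = batchSize;
        this.sink = sink;
    }

    /** Adds one document; sends a full batch to the sink once the buffer fills. */
    public void add(T doc) {
        buffer.add(doc);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Sends whatever is buffered (e.g. when the task finishes) and starts a new buffer. */
    public void flush() {
        if (buffer.isEmpty()) return;
        List<T> batch = buffer;
        buffer = new ArrayList<>();
        sink.accept(batch);
    }

    public static void main(String[] args) {
        List<Integer> sizes = new ArrayList<>();
        DocBatcher<String> b = new DocBatcher<>(2, batch -> sizes.add(batch.size()));
        for (String id : new String[] {"a", "b", "c"}) {
            b.add(id);
        }
        b.flush(); // send the final partial batch
        System.out.println(sizes); // prints [2, 1]
    }
}
```

The explicit `flush()` at the end matters: without it, the last partial batch would never reach Solr.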

Basically, I have a custom Pig StoreFunc that sends docs to Solr from
our Hadoop analytics nodes. The reason I'm not using SolrJ 4.0.0-ALPHA
is that I couldn't get it to run in my Hadoop environment: there's a
classpath conflict with Apache HttpClient. SolrJ 4 depends on HttpClient
4.1.3, but when I run it in my environment I get the following:

Caused by: java.lang.NoSuchMethodError:
org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager: method
<init>()V not found
	at org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:94)
	at org.apache.solr.client.solrj.impl.CloudSolrServer.<init>(CloudSolrServer.java:70)
	... 16 more
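
A `NoSuchMethodError` means the class was found at runtime but the loaded version lacks the member SolrJ was compiled against (here, the no-arg `ThreadSafeClientConnManager` constructor, `<init>()V`), which most likely means a different HttpClient jar is winning on the Hadoop classpath. A quick stdlib-only way to confirm which jar that is: ask the JVM where it loaded the class from. `WhichJar` is a hypothetical helper; run it with the same classpath as the failing task.

```java
import java.security.CodeSource;

/** Hypothetical diagnostic: reports which jar a class comes from. */
public class WhichJar {
    /** Location the class was loaded from, or null if it is not on the classpath. */
    static String locate(String className) {
        try {
            // initialize=false: just find the class, don't run its static initializers
            Class<?> c = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            return src == null ? "(bootstrap / unknown)" : src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    /** True if the class is present and has a public no-argument constructor. */
    static boolean hasNoArgConstructor(String className) {
        try {
            Class.forName(className, false, WhichJar.class.getClassLoader()).getConstructor();
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String cls = "org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager";
        String where = locate(cls);
        System.out.println(cls + " loaded from: " + (where == null ? "not on classpath" : where));
        System.out.println("public no-arg constructor present: " + hasNoArgConstructor(cls));
    }
}
```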

I spent hours trying to resolve the classpath issue and finally had to
bail and use the 3.4 SolrJ client, since I'm only at the evaluation
stage. So it sounds like this could be the cause of my problems.

One other thing ... I do have the _version_ field defined in my
schema.xml but am not setting it on the client side when indexing.
Should I be doing that?

Cheers,
Tim


On Thu, Aug 2, 2012 at 11:27 AM, Mark Miller <markrmiller@gmail.com> wrote:
>
> On Aug 2, 2012, at 11:08 AM, Timothy Potter <thelabdude@gmail.com> wrote:
>
>> Just starting to get into SolrCloud using 4.0.0-ALPHA and am very
>> impressed so far ...
>>
>> I have a 12-shard index with ~104M docs, with each shard having
>> 1 replica (so 24 Solr servers running).
>>
>> Using the Query form on the Admin panel, I issue the MatchAllDocsQuery
>> (*:*), and each time I send the request the value for numFound in the
>> result is different. It's always close, but not identical from request
>> to request as I would expect. Can anyone shed some light on this issue?
>> I also tried a real query, such as "#olympics lochte", and got the same
>> thing: a different numFound each time. The first page of actual docs
>> returned is the same, so maybe I should just ignore the numFound issue?
>>
>> Note that while experiencing this behavior, I am not adding any docs
>> to the index and all docs have been committed with waitFlush=true and
>> waitSearcher=true on the commit. Also, not doing soft commits at this
>> point. In addition, after having committed all 104M docs, I hit the
>> optimize button on the panel so I have only 1 segment. In other words,
>> the index is not being updated and has been optimized at this point.
>
>
> How are you adding docs? E.g., what client and what method in particular (what is your line
> of code that actually adds the doc)?
>
> You can find the numFound result for each node by passing the param distrib=false. What
> does this tell you? Are your replicas in sync with the leader? What does the count for each
> shard add up to?
>
> I would not ignore the issue; something must be off. It may somehow be user error, it
> may be a bug that has been fixed since the alpha, or it may be something new.
>
> Are you sure every shard you are issuing the query *from* is active and live according
> to ZooKeeper? E.g., when you look at the cloud admin view and look at the cluster visualization,
> are all the nodes green?
>
> - Mark Miller
> lucidimagination.com
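
Mark's per-node check can be scripted once the counts are collected. The sketch below is stdlib-only Java and assumes you have already fetched numFound from every core with a non-distributed `*:*` query (the `distrib=false` parameter he mentions) and paired each shard's leader count with its replica count. The core URL layout, the `ShardCountCheck` name, and the pairing are hypothetical. One plausible explanation for a fluctuating total is that a replica disagrees with its leader, so the distributed count depends on which copy of each shard answers the request.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for comparing per-core counts gathered with distrib=false. */
public class ShardCountCheck {
    /** Non-distributed match-all count query for one core (":" encoded as %3A). */
    static String perCoreUrl(String coreBaseUrl) {
        return coreBaseUrl + "/select?q=*%3A*&rows=0&distrib=false&wt=json";
    }

    /** Sum of leader counts; counts maps shard name -> {leaderNumFound, replicaNumFound}. */
    static long totalFromLeaders(Map<String, long[]> counts) {
        long total = 0;
        for (long[] c : counts.values()) {
            total += c[0];
        }
        return total;
    }

    /** Shards whose replica count differs from the leader's count. */
    static List<String> outOfSyncShards(Map<String, long[]> counts) {
        List<String> bad = new ArrayList<>();
        for (Map.Entry<String, long[]> e : counts.entrySet()) {
            if (e.getValue()[0] != e.getValue()[1]) {
                bad.add(e.getKey());
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Made-up numbers, only to show the shape of the check.
        Map<String, long[]> counts = new LinkedHashMap<>();
        counts.put("shard1", new long[] {100, 100});
        counts.put("shard2", new long[] {250, 240});
        System.out.println(perCoreUrl("http://host:8983/solr/collection1"));
        System.out.println("leader total: " + totalFromLeaders(counts)); // leader total: 350
        System.out.println("out of sync: " + outOfSyncShards(counts));   // out of sync: [shard2]
    }
}
```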

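The "are all the nodes green?" question boils down to a simple rule: a replica should only be relied on if its state is active and the node hosting it is currently live. As a stdlib-only sketch, assuming you have already read each replica's state and node name (e.g. from the cluster state kept in ZooKeeper) along with the set of live node names, the filtering looks like this. `LiveCheck`, `ReplicaInfo`, and the node names in `main` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical check: which replicas are not safe to serve queries. */
public class LiveCheck {
    /** Minimal assumed view of one replica as read from the cluster state. */
    static final class ReplicaInfo {
        final String coreName;
        final String state;    // e.g. "active", "recovering", "down"
        final String nodeName; // node hosting this replica

        ReplicaInfo(String coreName, String state, String nodeName) {
            this.coreName = coreName;
            this.state = state;
            this.nodeName = nodeName;
        }
    }

    /** Cores that are not active, or that live on a node missing from the live set. */
    static List<String> unhealthy(List<ReplicaInfo> replicas, Set<String> liveNodes) {
        List<String> bad = new ArrayList<>();
        for (ReplicaInfo r : replicas) {
            if (!"active".equals(r.state) || !liveNodes.contains(r.nodeName)) {
                bad.add(r.coreName);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        Set<String> live = new HashSet<>(Arrays.asList("nodeA:8983_solr", "nodeB:8983_solr"));
        List<ReplicaInfo> replicas = Arrays.asList(
                new ReplicaInfo("shard1_leader", "active", "nodeA:8983_solr"),
                new ReplicaInfo("shard1_replica", "recovering", "nodeB:8983_solr"),
                new ReplicaInfo("shard2_leader", "active", "nodeC:8983_solr"));
        System.out.println(unhealthy(replicas, live)); // prints [shard1_replica, shard2_leader]
    }
}
```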