lucene-solr-user mailing list archives

From "Burton-West, Tom" <tburt...@umich.edu>
Subject RE: ArrayIndexOutOfBoundsException with facet query
Date Mon, 11 Apr 2011 16:51:17 GMT
Thanks Mike,

At first I thought this couldn't be related to the 2.1 billion terms issue, since the only
place we have tons of terms is the OCR field, and this is not the OCR field. But then I
remembered that what matters is the total number of terms across all fields. We've had no
problems with regular searches against the index or with other facet queries, only with
this facet. Is TermInfoAndOrd only used for faceting?
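Why the total across all fields matters: within one segment, the term dictionary is a single sorted sequence (roughly, ordered by field name and then by term text), and a term's ordinal is its running position in that whole sequence. The sketch below is illustrative only; the class name, the field name `ocr` for the OCR field, and the per-field counts are hypothetical. It computes the first ordinal a field's terms would receive, showing how a small field such as `topicStr` can start beyond Integer.MAX_VALUE simply because a huge field sorts before it.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch (hypothetical names and counts): where each field's terms
// start in one segment's term dictionary, assuming terms are ordered by field
// name first and then by term text.
class FieldOrdOffsets {

    // Returns the ordinal of the first term of `field`: the number of terms that
    // belong to fields sorting before it. TreeMap keeps keys in natural String
    // order, and headMap(field) holds exactly the keys strictly less than field.
    static long firstOrd(TreeMap<String, Long> termsPerField, String field) {
        long ord = 0;
        for (long count : termsPerField.headMap(field).values()) {
            ord += count;
        }
        return ord;
    }

    public static void main(String[] args) {
        TreeMap<String, Long> counts = new TreeMap<>();
        counts.put("ocr", 2_500_000_000L);   // hypothetical: the huge OCR field
        counts.put("title", 10_000_000L);    // hypothetical
        counts.put("topicStr", 498_975L);    // nTerms from the UnInvertedField log line in this thread
        long start = firstOrd(counts, "topicStr");
        System.out.println("topicStr starts at ord " + start
                + (start > Integer.MAX_VALUE ? " (beyond int range)" : ""));
    }
}
```

With these made-up numbers, every `topicStr` ordinal exceeds 2^31 - 1 even though the field itself has only about half a million terms.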

I'll go ahead and build the patch and let you know.


Tom

p.s. Here is the field definition:
<field name="topicStr" type="string" indexed="true" stored="false" multiValued="true"/>
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>


-----Original Message-----
From: Michael McCandless [mailto:lucene@mikemccandless.com] 
Sent: Monday, April 11, 2011 8:40 AM
To: solr-user@lucene.apache.org
Cc: Burton-West, Tom
Subject: Re: ArrayIndexOutOfBoundsException with facet query

Tom,

I think I see where this may be -- it looks like another > 2B terms
bug in Lucene (we are using an int instead of a long in the
TermInfoAndOrd class inside TermInfosReader.java), only present in
3.1.
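A minimal sketch of the failure mode described above, not Lucene code: `enumerator.position` is already a long (the old code had to cast it), and narrowing it to int wraps any ordinal above 2^31 - 1 to a negative number, which then becomes a negative index into the term index array. The class name, the ordinal of 3,000,000,000 and the interval of 128 are assumptions chosen for illustration; the small array merely stands in for the index that `seekEnum` reads. The negative index in the reported exception (-1931149) is the kind of value this wraparound produces.

```java
// Illustrative only: mimics the narrow-then-divide pattern described above,
// not the actual TermInfosReader implementation.
class TruncatedOrdDemo {

    // Buggy variant: the long ordinal is narrowed to int first (as the old
    // TermInfoAndOrd did), so ordinals above Integer.MAX_VALUE wrap negative.
    static int buggyIndexPos(long termOrd, int totalIndexInterval) {
        int truncated = (int) termOrd;
        return truncated / totalIndexInterval;
    }

    // Fixed variant: divide in long arithmetic, then narrow the much smaller quotient.
    static int fixedIndexPos(long termOrd, int totalIndexInterval) {
        return (int) (termOrd / totalIndexInterval);
    }

    public static void main(String[] args) {
        long termOrd = 3_000_000_000L;   // hypothetical ordinal past 2^31 - 1
        int totalIndexInterval = 128;    // assumed interval, for illustration
        System.out.println("buggy indexPos = " + buggyIndexPos(termOrd, totalIndexInterval));
        System.out.println("fixed indexPos = " + fixedIndexPos(termOrd, totalIndexInterval));

        // A negative position used as an array index fails just as seekEnum did.
        long[] indexPointers = new long[16];
        try {
            long pointer = indexPointers[buggyIndexPos(termOrd, totalIndexInterval)];
            System.out.println("unexpected: " + pointer);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("caught ArrayIndexOutOfBoundsException: " + e.getMessage());
        }
    }
}
```

Below 2^31 the cast is lossless and both variants agree, which is why the bug only shows up once a segment holds more than about 2.1 billion terms.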

I'm also mad that Test2BTerms fails to catch this!!  I will go fix
that test and confirm it sees this bug.

Can you build from source?  If so, try this patch:

Index: lucene/src/java/org/apache/lucene/index/TermInfosReader.java
===================================================================
--- lucene/src/java/org/apache/lucene/index/TermInfosReader.java	(revision 1089906)
+++ lucene/src/java/org/apache/lucene/index/TermInfosReader.java	(working copy)
@@ -46,8 +46,8 @@

   // Just adds term's ord to TermInfo
   private final static class TermInfoAndOrd extends TermInfo {
-    final int termOrd;
-    public TermInfoAndOrd(TermInfo ti, int termOrd) {
+    final long termOrd;
+    public TermInfoAndOrd(TermInfo ti, long termOrd) {
       super(ti);
       this.termOrd = termOrd;
     }
@@ -245,7 +245,7 @@
             // wipe out the cache when they iterate over a large numbers
             // of terms in order
             if (tiOrd == null) {
-              termsCache.put(cacheKey, new TermInfoAndOrd(ti, (int) enumerator.position));
+              termsCache.put(cacheKey, new TermInfoAndOrd(ti, enumerator.position));
             } else {
               assert sameTermInfo(ti, tiOrd, enumerator);
               assert (int) enumerator.position == tiOrd.termOrd;
@@ -262,7 +262,7 @@
     // random-access: must seek
     final int indexPos;
     if (tiOrd != null) {
-      indexPos = tiOrd.termOrd / totalIndexInterval;
+      indexPos = (int) (tiOrd.termOrd / totalIndexInterval);
     } else {
       // Must do binary search:
       indexPos = getIndexOffset(term);
@@ -274,7 +274,7 @@
     if (enumerator.term() != null && term.compareTo(enumerator.term()) == 0) {
       ti = enumerator.termInfo();
       if (tiOrd == null) {
-        termsCache.put(cacheKey, new TermInfoAndOrd(ti, (int) enumerator.position));
+        termsCache.put(cacheKey, new TermInfoAndOrd(ti, enumerator.position));
       } else {
         assert sameTermInfo(ti, tiOrd, enumerator);
         assert (int) enumerator.position == tiOrd.termOrd;
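The essence of the patch is to keep the ordinal as a long end to end and narrow only the result of the division, which is small enough to fit in an int. The sketch below shows that pattern in isolation; `OrdCacheSketch`, its `HashMap` keyed by term text, the -1 "not cached" return value, and the use of `Math.toIntExact` (which throws `ArithmeticException` instead of silently wrapping if the quotient itself ever overflowed) are all assumptions for illustration, not what the patch or TermInfosReader actually does.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for the term-ordinal cache in TermInfosReader.
class OrdCacheSketch {

    // Mirrors the patched TermInfoAndOrd: the ordinal is stored as a long.
    static final class TermAndOrd {
        final String term;
        final long termOrd;

        TermAndOrd(String term, long termOrd) {
            this.term = term;
            this.termOrd = termOrd;
        }
    }

    private final Map<String, TermAndOrd> cache = new HashMap<>();
    private final int totalIndexInterval;

    OrdCacheSketch(int totalIndexInterval) {
        this.totalIndexInterval = totalIndexInterval;
    }

    // Record a term's position without narrowing it (no (int) cast).
    void put(String term, long position) {
        cache.put(term, new TermAndOrd(term, position));
    }

    // Index slot to seek from: divide in long arithmetic, then narrow the quotient.
    // Returns -1 when the term is not cached (a real reader would binary-search instead).
    int indexPos(String term) {
        TermAndOrd cached = cache.get(term);
        if (cached == null) {
            return -1;
        }
        return Math.toIntExact(cached.termOrd / totalIndexInterval);
    }

    public static void main(String[] args) {
        OrdCacheSketch cache = new OrdCacheSketch(128);       // assumed interval
        cache.put("example", 3_000_000_000L);                 // hypothetical ordinal beyond Integer.MAX_VALUE
        System.out.println(cache.indexPos("example"));
    }
}
```

Preferring `Math.toIntExact` over a bare `(int)` cast is a defensive choice: if an assumption about value ranges is ever wrong again, it fails loudly at the cast instead of surfacing later as a confusing negative array index.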

Mike

http://blog.mikemccandless.com

On Fri, Apr 8, 2011 at 4:53 PM, Burton-West, Tom <tburtonw@umich.edu> wrote:
> The query below results in an array out of bounds exception:
> select/?q=solr&version=2.2&start=0&rows=0&facet=true&facet.field=topicStr
>
> Here is the exception:
>  Exception during facet.field of topicStr:java.lang.ArrayIndexOutOfBoundsException: -1931149
>        at org.apache.lucene.index.TermInfosReader.seekEnum(TermInfosReader.java:201)
>
> We are using a dev version of Solr/Lucene:
>
> Solr Specification Version: 3.0.0.2010.11.19.16.00.54
> Solr Implementation Version: 3.1-SNAPSHOT 1036094 - root - 2010-11-19 16:00:54
> Lucene Specification Version: 3.1-SNAPSHOT
> Lucene Implementation Version: 3.1-SNAPSHOT 1036094 - 2010-11-19 16:01:10
>
> Just before the exception we see this entry in our Tomcat logs:
>
> Apr 8, 2011 2:01:58 PM org.apache.solr.request.UnInvertedField uninvert
> INFO: UnInverted multi-valued field {field=topicStr,memSize=7675174,tindexSize=289102,time=2577,phase1=2537,nTerms=498975,bigTerms=0,termInstances=1368694,uses=0}
> Apr 8, 2011 2:01:58 PM org.apache.solr.core.SolrCore execute
>
> Is this a known bug? Can anyone provide a clue as to how we can determine what the problem is?
>
> Tom Burton-West
>
>
> Appended below is the exception stack trace:
>
> SEVERE: Exception during facet.field of topicStr:java.lang.ArrayIndexOutOfBoundsException: -1931149
>        at org.apache.lucene.index.TermInfosReader.seekEnum(TermInfosReader.java:201)
>        at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:271)
>        at org.apache.lucene.index.TermInfosReader.terms(TermInfosReader.java:338)
>        at org.apache.lucene.index.SegmentReader.terms(SegmentReader.java:928)
>        at org.apache.lucene.index.DirectoryReader$MultiTermEnum.<init>(DirectoryReader.java:1055)
>        at org.apache.lucene.index.DirectoryReader.terms(DirectoryReader.java:659)
>        at org.apache.solr.search.SolrIndexReader.terms(SolrIndexReader.java:302)
>        at org.apache.solr.request.NumberedTermEnum.skipTo(UnInvertedField.java:1018)
>        at org.apache.solr.request.UnInvertedField.getTermText(UnInvertedField.java:838)
>        at org.apache.solr.request.UnInvertedField.getCounts(UnInvertedField.java:617)
>        at org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:279)
>        at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:312)
>        at org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:174)
>        at org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:72)
>        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
>        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1354)
>
>
