lucene-dev mailing list archives

From "Umut Erogul (JIRA)" <>
Subject [jira] [Created] (SOLR-7867) implicit sharded, facet grouping problem with multivalued string field starting with digits
Date Tue, 04 Aug 2015 09:32:04 GMT
Umut Erogul created SOLR-7867:

             Summary: implicit sharded, facet grouping problem with multivalued string field starting with digits
                 Key: SOLR-7867
             Project: Solr
          Issue Type: Bug
          Components: faceting, SolrCloud
    Affects Versions: 5.2
         Environment: 3.13.0-48-generic #80-Ubuntu SMP x86_64 GNU/Linux
            Reporter: Umut Erogul

relevant parts of schema.xml:
<field name="keyword_ss" type="string" indexed="true" stored="true" docValues="true" multiValued="true"/>
<field name="author_s" type="string" indexed="true" stored="true" docValues="true"/>

every document has valid author_s and keyword_ss fields.

on a single-node, single-collection solr-4.9.0 server, facet group queries such as the following succeed:
q: *:* fq: keyword_ss:3m
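
For reference, a minimal sketch of how such a facet group request could be assembled. The report only gives q and fq; that the facet field is keyword_ss follows from the stack trace, but grouping on author_s, the host and port, and the function names are assumptions made for illustration. The collection name "document" is taken from the log line in this report.

```python
from urllib.parse import urlencode


def build_group_facet_params(keyword, facet_field="keyword_ss", group_field="author_s"):
    """Return Solr request parameters for a grouped facet query.

    group_field="author_s" is an assumption; the report does not say
    which field the results are grouped on.
    """
    # Quote the value so keywords containing spaces ("8 saniye") stay one term.
    escaped = keyword.replace("\\", "\\\\").replace('"', '\\"')
    return [
        ("q", "*:*"),
        ("fq", f'{facet_field}:"{escaped}"'),
        ("facet", "true"),
        ("facet.field", facet_field),
        ("group", "true"),
        ("group.field", group_field),
        ("group.facet", "true"),  # grouped facet counts
        ("wt", "json"),
    ]


def build_group_facet_url(base_url, keyword):
    """Join a (hypothetical) collection URL with the encoded parameters."""
    return f"{base_url}/select?{urlencode(build_group_facet_params(keyword))}"


# Hypothetical host and port; "document" is the collection seen in the log.
url = build_group_facet_url("http://localhost:8983/solr/document", "3m")
```

The resulting URL can be requested against either server to compare the 4.9.0 and 5.2.0 behaviour with identical parameters.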

when querying a solr-5.2.0 server in an implicitly sharded environment, with this router.field:
<!-- router.field -->
<field name="shard_name" type="string" indexed="true" stored="true" required="true"/>
and example shard names: affinity1, affinity2, affinity3, affinity4

the same query on the same documents fails with:
ERROR - 2015-08-04 08:15:15.222; [document affinity3 core_node32 document_affinity3_replica2] org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: Exception during facet.field: keyword_ss
        at org.apache.solr.request.SimpleFacets$
        at org.apache.solr.request.SimpleFacets$
        at org.apache.solr.request.SimpleFacets$2.execute(
        at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(
        at org.eclipse.jetty.util.thread.QueuedThreadPool$
Caused by: java.lang.ArrayIndexOutOfBoundsException
        at org.apache.lucene.codecs.lucene50.Lucene50DocValuesProducer$CompressedBinaryDocValues$CompressedBinaryTermsEnum.readTerm(
        at org.apache.lucene.codecs.lucene50.Lucene50DocValuesProducer$CompressedBinaryDocValues$
        at org.apache.solr.request.SimpleFacets.getGroupedCounts(
        at org.apache.solr.request.SimpleFacets.getTermCounts(
        at org.apache.solr.request.SimpleFacets.getTermCounts(
        at org.apache.solr.request.SimpleFacets$
        ... 33 more

all the failing queries involve strings starting with digits ("3m", "8 saniye", "2 broke girls", "1v1y"),
but not every such string fails; the query works for some, e.g. "24", "90+", "45 dakika"

we do not observe the problem when querying with 

updating the problematic documents (a small subset of keyword_ss:(0-9)*) fixes the query,
but we have not found an easy way to identify the problematic documents
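
One way to at least narrow the search is to probe each digit-leading keyword separately and record which ones fail. The sketch below is only an illustration under assumptions: find_failing_terms and run_query are hypothetical names, run_query stands for whatever function issues the facet group query for one keyword and raises on an error response, and the result identifies failing terms, not the individual documents.

```python
def starts_with_digit(term):
    """True for keywords like "3m" or "8 saniye"."""
    return bool(term) and term[0].isdigit()


def find_failing_terms(candidate_terms, run_query):
    """Run the query once per digit-leading term and collect the failures.

    run_query is a caller-supplied (hypothetical) callable that sends the
    facet group query for a single keyword and raises an exception when
    the server returns an error.
    """
    failing = []
    for term in candidate_terms:
        if not starts_with_digit(term):
            continue  # only digit-leading keywords were reported as failing
        try:
            run_query(term)
        except Exception as exc:  # record the error and keep probing
            failing.append((term, str(exc)))
    return failing


# Stand-in for a real request, used here only to show the call shape.
def fake_run_query(term):
    if term in {"3m", "1v1y"}:
        raise RuntimeError("Exception during facet.field: keyword_ss")


failing = find_failing_terms(["3m", "24", "1v1y", "ankara"], fake_run_query)
```

The documents matching the failing terms would then be the candidates for updating.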
