lucene-dev mailing list archives

From "Steve Rowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-5983) Received an "java.lang.AssertionError: Attempting to read past the end of a segment."
Date Tue, 15 Apr 2014 00:03:20 GMT

    [ https://issues.apache.org/jira/browse/SOLR-5983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969036#comment-13969036 ]

Steve Rowe commented on SOLR-5983:
----------------------------------

Dan,

Strings of this form (from the {{description_html}} field) trigger the exception:

{noformat}
<! [CDATA[Ultraflexible Series Cable] ] >
{noformat}

The above string alone hits the assert.  The characters between {{<!}} and {{[CDATA[}},
{{\]}} and {{\]}}, and {{\]}} and {{>}} are all U+2009 THIN SPACE.
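For anyone who wants to rebuild the triggering input exactly, the sketch below constructs the string with explicit U+2009 escapes, so the thin spaces cannot be silently converted to ordinary spaces by a mail client or editor. The class name {{ThinSpaceCdata}} and its helper methods are hypothetical names chosen for illustration; they do not come from the issue or from Lucene. The code uses only the JDK and does not call {{HTMLStripCharFilter}} itself, so it only prepares and inspects the input.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Lucene/Solr): builds the string from the
// comment above with real U+2009 THIN SPACE characters in the three gaps.
public class ThinSpaceCdata {
    static final char THIN_SPACE = '\u2009';

    // "<!" + THIN SPACE + "[CDATA[...]" + THIN SPACE + "]" + THIN SPACE + ">"
    static String buildInput() {
        return "<!" + THIN_SPACE
                + "[CDATA[Ultraflexible Series Cable]" + THIN_SPACE
                + "]" + THIN_SPACE
                + ">";
    }

    // Returns the char indices at which U+2009 occurs, to confirm placement.
    static List<Integer> thinSpaceIndices(String s) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == THIN_SPACE) {
                indices.add(i);
            }
        }
        return indices;
    }

    public static void main(String[] args) {
        String input = buildInput();
        System.out.println("length = " + input.length());
        System.out.println("U+2009 at indices " + thinSpaceIndices(input));
    }
}
```

The resulting string can then be indexed into a field whose analyzer uses {{HTMLStripCharFilter}}, with assertions enabled, to check whether the assert fires.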

I'm tracking down why. It looks like it's related to the U+2009 char in front of
{{[CDATA[}}.

By the way, if you're inserting the U+2009 intentionally to block recognition of CDATA sections
and force HTML stripping, an alternate technique is to run text through {{HTMLStripCharFilter}}
twice.

> Received an "java.lang.AssertionError: Attempting to read past the end of a segment."
> -------------------------------------------------------------------------------------
>
>                 Key: SOLR-5983
>                 URL: https://issues.apache.org/jira/browse/SOLR-5983
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 4.7.1
>         Environment: Rhat - running in AWS Large Instance (4processors, 16gb ram) working in attached storage.
>            Reporter: Dan
>         Attachments: temp.txt
>
>
> I'm hammering on this Solr Instance.  I've got three cores that I'm using to store millions
> of small bits of reference data.  I'm using a heavily tweaked Tika to parse xml files and
> ingest them into Solr, while referencing this data.  So I'm making hundreds of query requests
> against solr, while also making some substantial posts. (I queue up the posts, in general
> sending in 100 documents at a time).
> Stack Trace:
> 4099640 [qtp39890933-24] WARN  org.eclipse.jetty.servlet.ServletHandler  – Error for /solr/us_patent_grant/update
> java.lang.AssertionError: Attempting to read past the end of a segment.
>         at org.apache.lucene.analysis.charfilter.HTMLStripCharFilter$TextSegment.nextChar(HTMLStripCharFilter.java:30885)
>         at org.apache.lucene.analysis.charfilter.HTMLStripCharFilter.zzDoEOF(HTMLStripCharFilter.java:31150)
>         at org.apache.lucene.analysis.charfilter.HTMLStripCharFilter.nextChar(HTMLStripCharFilter.java:31802)
>         at org.apache.lucene.analysis.charfilter.HTMLStripCharFilter.read(HTMLStripCharFilter.java:30829)
>         at org.apache.lucene.analysis.charfilter.HTMLStripCharFilter.read(HTMLStripCharFilter.java:30842)
>         at org.apache.lucene.analysis.standard.std40.StandardTokenizerImpl40.zzRefill(StandardTokenizerImpl40.java:916)
>         at org.apache.lucene.analysis.standard.std40.StandardTokenizerImpl40.getNextToken(StandardTokenizerImpl40.java:1123)
>         at org.apache.lucene.analysis.standard.StandardTokenizer.incrementToken(StandardTokenizer.java:175)
>         at org.apache.lucene.analysis.payloads.TokenOffsetPayloadTokenFilter.incrementToken(TokenOffsetPayloadTokenFilter.java:45)
>         at org.apache.lucene.analysis.core.LowerCaseFilter.incrementToken(LowerCaseFilter.java:54)
>         at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:182)
>         at org.apache.lucene.index.DocFieldProcessor.processDocument(DocFieldProcessor.java:248)
>         at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:253)
>         at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:455)
>         at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1534)
>         at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:236)
>         at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:160)
>         at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
>         at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
>         at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:704)
>         at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:858)
>         at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:557)
>         at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
>         at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:247)
>         at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:174)
>         at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
>         at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>         at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)
>         at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:780)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
>         at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)



--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

