lucene-dev mailing list archives

From "Steve Rowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-5983) Received an "java.lang.AssertionError: Attempting to read past the end of a segment."
Date Tue, 15 Apr 2014 00:35:23 GMT

    [ https://issues.apache.org/jira/browse/SOLR-5983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969073#comment-13969073 ]

Steve Rowe commented on SOLR-5983:
----------------------------------

Looks like there are two problems: 

# Any chars between {{<!}} and {{[CDATA[}} should block recognition of a CDATA section, but those chars are now passed through to the output, and a CDATA section is improperly recognized.
# The immediate cause of the assert is an unclosed CDATA section. {{HTMLStripCharFilter}} requires the exact string {{]]>}} to close out a CDATA section, following the XML spec. When a CDATA section is started (even improperly, as in the first problem above), but the CDATA closing string is not found, the assert is hit at end-of-input.

So this is the minimal error-triggering string:

{noformat}
<![CDATA[
{noformat}
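
To make the two rules above concrete, here is a minimal standalone sketch. It is not the {{HTMLStripCharFilter}} code and not necessarily what the fix will look like: the class {{CdataSketch}} and the method {{unwrapCdata}} are hypothetical names, and treating an unclosed section as running to end-of-input is an assumption made only for illustration. It recognizes a CDATA section only on the exact opener, closes it only on the exact string {{]]>}}, and does not fail when the closer is missing.

```java
// Hypothetical illustration only; not taken from Lucene/Solr sources.
class CdataSketch {
    static final String OPEN = "<![CDATA[";
    static final String CLOSE = "]]>";

    /**
     * Removes CDATA wrappers and keeps their content verbatim.
     * Rule 1: only the exact opener starts a section, so any chars
     *         between "<!" and "[CDATA[" block recognition.
     * Rule 2: only the exact string "]]>" closes a section.
     * Assumption: an unclosed section extends to end-of-input
     *             instead of raising an error.
     */
    static String unwrapCdata(String in) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (pos < in.length()) {
            int open = in.indexOf(OPEN, pos);
            if (open < 0) {
                // No further CDATA opener: copy the rest unchanged.
                out.append(in, pos, in.length());
                break;
            }
            // Copy text preceding the opener.
            out.append(in, pos, open);
            int contentStart = open + OPEN.length();
            int close = in.indexOf(CLOSE, contentStart);
            if (close < 0) {
                // Unclosed section: emit what is left and stop cleanly.
                out.append(in, contentStart, in.length());
                break;
            }
            out.append(in, contentStart, close);
            pos = close + CLOSE.length();
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The minimal error-triggering string from this comment.
        System.out.println("[" + unwrapCdata("<![CDATA[") + "]");
        System.out.println(unwrapCdata("a<![CDATA[b]]>c"));
    }
}
```

In this sketch an improperly formed opener such as {{<!x[CDATA[}} is simply left untouched; how the real filter should treat such input is a separate question.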

I'm working on a fix.

> Received an "java.lang.AssertionError: Attempting to read past the end of a segment."
> -------------------------------------------------------------------------------------
>
>                 Key: SOLR-5983
>                 URL: https://issues.apache.org/jira/browse/SOLR-5983
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 4.7.1
>         Environment: Rhat - running in AWS Large Instance (4 processors, 16gb ram) working in attached storage.
>            Reporter: Dan
>         Attachments: temp.txt
>
>
> I'm hammering on this Solr instance.  I've got three cores that I'm using to store millions of small bits of reference data.  I'm using a heavily tweaked Tika to parse XML files and ingest them into Solr, while referencing this data.  So I'm making hundreds of query requests against Solr, while also making some substantial posts. (I queue up the posts, in general sending in 100 documents at a time.)
> Stack Trace:
> 4099640 [qtp39890933-24] WARN  org.eclipse.jetty.servlet.ServletHandler  – Error for /solr/us_patent_grant/update
> java.lang.AssertionError: Attempting to read past the end of a segment.
>         at org.apache.lucene.analysis.charfilter.HTMLStripCharFilter$TextSegment.nextChar(HTMLStripCharFilter.java:30885)
>         at org.apache.lucene.analysis.charfilter.HTMLStripCharFilter.zzDoEOF(HTMLStripCharFilter.java:31150)
>         at org.apache.lucene.analysis.charfilter.HTMLStripCharFilter.nextChar(HTMLStripCharFilter.java:31802)
>         at org.apache.lucene.analysis.charfilter.HTMLStripCharFilter.read(HTMLStripCharFilter.java:30829)
>         at org.apache.lucene.analysis.charfilter.HTMLStripCharFilter.read(HTMLStripCharFilter.java:30842)
>         at org.apache.lucene.analysis.standard.std40.StandardTokenizerImpl40.zzRefill(StandardTokenizerImpl40.java:916)
>         at org.apache.lucene.analysis.standard.std40.StandardTokenizerImpl40.getNextToken(StandardTokenizerImpl40.java:1123)
>         at org.apache.lucene.analysis.standard.StandardTokenizer.incrementToken(StandardTokenizer.java:175)
>         at org.apache.lucene.analysis.payloads.TokenOffsetPayloadTokenFilter.incrementToken(TokenOffsetPayloadTokenFilter.java:45)
>         at org.apache.lucene.analysis.core.LowerCaseFilter.incrementToken(LowerCaseFilter.java:54)
>         at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:182)
>         at org.apache.lucene.index.DocFieldProcessor.processDocument(DocFieldProcessor.java:248)
>         at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:253)
>         at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:455)
>         at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1534)
>         at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:236)
>         at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:160)
>         at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
>         at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
>         at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:704)
>         at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:858)
>         at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:557)
>         at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
>         at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:247)
>         at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:174)
>         at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
>         at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>         at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)
>         at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:780)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
>         at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


