lucene-dev mailing list archives

From "Steve Rowe (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SOLR-5983) HTMLStripCharFilter is treating CDATA sections incorrectly
Date Thu, 17 Apr 2014 05:11:15 GMT

     [ https://issues.apache.org/jira/browse/SOLR-5983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Rowe updated SOLR-5983:
-----------------------------

    Summary: HTMLStripCharFilter is treating CDATA sections incorrectly  (was: Received an
"java.lang.AssertionError: Attempting to read past the end of a segment.")

> HTMLStripCharFilter is treating CDATA sections incorrectly
> ----------------------------------------------------------
>
>                 Key: SOLR-5983
>                 URL: https://issues.apache.org/jira/browse/SOLR-5983
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 4.7.1
>         Environment: Rhat - running in AWS Large Instance (4 processors, 16 GB RAM) working
in attached storage.
>            Reporter: Dan
>         Attachments: temp.txt
>
>
> I'm hammering on this Solr instance. I've got three cores that I'm using to store millions
of small bits of reference data. I'm using a heavily tweaked Tika to parse XML files and
ingest them into Solr, while referencing this data. So I'm making hundreds of query requests
against Solr while also making some substantial posts (I queue up the posts, generally
sending in 100 documents at a time).
> Stack Trace:
> 4099640 [qtp39890933-24] WARN  org.eclipse.jetty.servlet.ServletHandler  – Error for /solr/us_patent_grant/update
> java.lang.AssertionError: Attempting to read past the end of a segment.
>         at org.apache.lucene.analysis.charfilter.HTMLStripCharFilter$TextSegment.nextChar(HTMLStripCharFilter.java:30885)
>         at org.apache.lucene.analysis.charfilter.HTMLStripCharFilter.zzDoEOF(HTMLStripCharFilter.java:31150)
>         at org.apache.lucene.analysis.charfilter.HTMLStripCharFilter.nextChar(HTMLStripCharFilter.java:31802)
>         at org.apache.lucene.analysis.charfilter.HTMLStripCharFilter.read(HTMLStripCharFilter.java:30829)
>         at org.apache.lucene.analysis.charfilter.HTMLStripCharFilter.read(HTMLStripCharFilter.java:30842)
>         at org.apache.lucene.analysis.standard.std40.StandardTokenizerImpl40.zzRefill(StandardTokenizerImpl40.java:916)
>         at org.apache.lucene.analysis.standard.std40.StandardTokenizerImpl40.getNextToken(StandardTokenizerImpl40.java:1123)
>         at org.apache.lucene.analysis.standard.StandardTokenizer.incrementToken(StandardTokenizer.java:175)
>         at org.apache.lucene.analysis.payloads.TokenOffsetPayloadTokenFilter.incrementToken(TokenOffsetPayloadTokenFilter.java:45)
>         at org.apache.lucene.analysis.core.LowerCaseFilter.incrementToken(LowerCaseFilter.java:54)
>         at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:182)
>         at org.apache.lucene.index.DocFieldProcessor.processDocument(DocFieldProcessor.java:248)
>         at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:253)
>         at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:455)
>         at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1534)
>         at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:236)
>         at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:160)
>         at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
>         at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
>         at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:704)
>         at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:858)
>         at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:557)
>         at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
>         at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:247)
>         at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:174)
>         at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
>         at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>         at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)
>         at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:780)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
>         at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)



--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

