lucene-solr-user mailing list archives

From Erik Hatcher <erik.hatc...@gmail.com>
Subject Re: Solr crashing while extracting from very simple text file
Date Thu, 01 Apr 2010 09:25:15 GMT
Yes, please report this to the Tika project.

	Erik

On Mar 31, 2010, at 9:31 PM, Ross wrote:

> Does anyone have any thoughts or suggestions on this?  I guess it's
> really a Tika problem. Should I try to report it to the Tika project?
>
> I wonder if someone could try it to see if it's a general problem or
> just me. I can reproduce it by firing up the nano editor and creating a
> file with XXBLE on one line and nothing else. Try indexing that and
> Solr / Tika crashes. I can avoid it by editing the file slightly, but I
> haven't really been able to discover a consistent pattern. It works if
> I change the word to lower case. Also, a three-line file like this
> works:
>
> a
> a
> XXBLE
>
> but not
>
> x
> x
> XXBLE
>
> It's a bit unfortunate because a similar word (a person's name, ??BLE)
> with the same problem appears frequently in upper case near the top of
> my files.
>
> Cheers
> Ross
>
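The four test files described above can be generated with a short script, which may help anyone checking whether the crash is a general problem. This is only a sketch: the file names and the `tika-repro` directory are made up for illustration, and the trailing newline on each file is an assumption (nano normally adds one), since the message does not say whether the file ends with a newline.

```python
from pathlib import Path

# Variants described in the message. File names are hypothetical;
# the trailing newline is an assumption about how nano saved the file.
VARIANTS = {
    "upper.txt": "XXBLE\n",            # reported to crash
    "lower.txt": "xxble\n",            # reported to work
    "a_prefix.txt": "a\na\nXXBLE\n",   # reported to work
    "x_prefix.txt": "x\nx\nXXBLE\n",   # reported to crash
}


def write_variants(target_dir):
    """Write each variant as a plain ASCII file and return the paths."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, content in VARIANTS.items():
        path = target / name
        path.write_bytes(content.encode("ascii"))
        paths.append(path)
    return paths


if __name__ == "__main__":
    for p in write_variants("tika-repro"):
        print(p)
```

Each generated file can then be posted to the extract handler with the curl command from the original message, substituting the file name after `myfile=@`.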
>
> On Sun, Mar 21, 2010 at 12:58 PM, Ross <tetranz@gmail.com> wrote:
>> Hi all
>>
>> I'm trying to import some text files. I'm mostly following Avi
>> Rappoport's tutorial.  Some of my files cause Solr to crash while
>> indexing. I've narrowed it down to a very simple example.
>>
>> I have a file named test.txt with one line. That line is the word
>> XXBLE and nothing else.
>>
>> This is the command I'm using.
>>
>> curl "http://localhost:8080/solr-example/update/extract?literal.id=1&commit=true" -F "myfile=@test.txt"
>>
>> The result is pasted below. Other files work just fine. The problem
>> seems to be related to the letters B and E. If I change them to
>> something else or make them lower case then it works. In my real
>> files, the XX is something else but the result is the same. It's a
>> common word in the files. I guess for this "quick and dirty" job I'm
>> doing I could do a bulk replace in the files to make it lower case.
>>
>> Is there any workaround for this?
>>
>> Thanks
>> Ross
>>
>> <html><head><title>Apache Tomcat/6.0.20 - Error report</title><style><!--H1
>> {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:22px;}
>> H2 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:16px;}
>> H3 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:14px;}
>> BODY {font-family:Tahoma,Arial,sans-serif;color:black;background-color:white;}
>> B {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;}
>> P {font-family:Tahoma,Arial,sans-serif;background:white;color:black;font-size:12px;}A
>> {color : black;}A.name {color : black;}HR {color :
>> #525D76;}--></style> </head><body><h1>HTTP Status 500 -
>> org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.txt.TXTParser@19ccba
>>
>> org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.txt.TXTParser@19ccba
>>        at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:211)
>>        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
>>        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>>        at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:233)
>>        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
>>        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
>>        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
>>        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>>        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>>        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>>        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>>        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
>>        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>>        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>>        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
>>        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
>>        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
>>        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
>>        at java.lang.Thread.run(Thread.java:636)
>> Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.txt.TXTParser@19ccba
>>        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:121)
>>        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:105)
>>        at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:190)
>>        ... 18 more
>> Caused by: java.lang.NullPointerException
>>        at java.io.Reader.&lt;init&gt;(Reader.java:78)
>>        at java.io.BufferedReader.&lt;init&gt;(BufferedReader.java:93)
>>        at java.io.BufferedReader.&lt;init&gt;(BufferedReader.java:108)
>>        at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:59)
>>        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:119)
>>        ... 20 more
>> </h1><HR size="1" noshade="noshade"><p><b>type</b> Status report</p><p><b>message</b>
>> <u>org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.txt.TXTParser@19ccba
>> </u></p><p><b>description</b> <u>The server encountered an internal
>> error (org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.txt.TXTParser@19ccba
>> ) that prevented it from fulfilling this request.</u></p><HR size="1"
>> noshade="noshade"><h3>Apache Tomcat/6.0.20</h3></body></html>
>>
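The bulk replace mentioned in the original message as a quick workaround could look roughly like the sketch below. It lower-cases only exact, whole-word, upper-case occurrences of one word and leaves everything else untouched. The function name `lowercase_word` is hypothetical, and nothing in the thread confirms that lower-casing avoids the crash in every file; it is only reported to work for the test cases described.

```python
import re


def lowercase_word(text, word):
    """Lower-case every whole-word, exact-case occurrence of `word` in `text`."""
    # \b keeps longer words that merely contain `word` from being changed.
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")
    return pattern.sub(word.lower(), text)


if __name__ == "__main__":
    sample = "x\nx\nXXBLE\n"
    print(lowercase_word(sample, "XXBLE"), end="")
```

To apply it to real files, read each file's text, pass it through `lowercase_word`, and write the result to a copy rather than overwriting the original, so the upper-case files remain available for a Tika bug report.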

