lucene-solr-user mailing list archives

From sweety <sweetyshind...@yahoo.com>
Subject Re: no such field error:smaller big block size details while indexing doc files
Date Wed, 09 Oct 2013 08:48:20 GMT
I will try using SolrJ.

Now I tried indexing .docx files and I get a different error. The logs are:
SEVERE: null:java.lang.RuntimeException: java.lang.VerifyError: (class: org/apache/poi/extractor/ExtractorFactory, method: createExtractor signature: (Lorg/apache/poi/poifs/filesystem/DirectoryNode;)Lorg/apache/poi/POITextExtractor;) Wrong return type in function
at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:651)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:364)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:928)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:987)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:539)
at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:298)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.VerifyError: (class: org/apache/poi/extractor/ExtractorFactory, method: createExtractor signature: (Lorg/apache/poi/poifs/filesystem/DirectoryNode;)Lorg/apache/poi/POITextExtractor;) Wrong return type in function
at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:59)
at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:82)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)
... 16 more

But do the jars cause these errors? I read one solution which said that removing a few
jars from the classpath may solve the errors, but those jars are not present in my classpath.
(Link to the solution: http://stackoverflow.com/questions/14696371/how-to-extract-the-text-of-a-ppt-file-with-tika)
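
Errors like the VerifyError and NoSuchFieldError in this thread often point to two different versions of the same library (here POI) being visible to the webapp, although that is not confirmed for this setup. As a rough way to check, a small standalone Java program along these lines could list every jar under a directory that contains the class named in the error. This is only a sketch: the class name FindClassInJars and the helper jarsContaining are made up for illustration, and the directory to scan (for example the Solr webapp's WEB-INF/lib or the servlet container's lib folder) is an assumption about the installation, passed as the first argument.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarFile;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Solr, Tika or POI.
class FindClassInJars {

    /** Returns every .jar file under root that contains the given class entry. */
    static List<Path> jarsContaining(Path root, String classEntry) throws IOException {
        List<Path> jars;
        // Collect all regular files ending in .jar, searching subdirectories too.
        try (Stream<Path> walk = Files.walk(root)) {
            jars = walk.filter(Files::isRegularFile)
                       .filter(p -> p.getFileName().toString().endsWith(".jar"))
                       .collect(Collectors.toList());
        }
        List<Path> hits = new ArrayList<>();
        for (Path jar : jars) {
            // A jar is a zip archive; look the class up by its entry path.
            try (JarFile jf = new JarFile(jar.toFile())) {
                if (jf.getJarEntry(classEntry) != null) {
                    hits.add(jar);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // First argument: directory to scan (assumed; defaults to the current directory).
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        // Second argument: class entry to look for; the default is the class
        // named in the VerifyError from the log.
        String entry = args.length > 1
                ? args[1]
                : "org/apache/poi/extractor/ExtractorFactory.class";
        for (Path jar : jarsContaining(root, entry)) {
            System.out.println(jar);
        }
    }
}
```

If this prints more than one jar, or a POI jar whose version differs from the one shipped with the Tika jars in use, a version conflict would be a plausible cause of both errors.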

Thank You.



On Wednesday, October 9, 2013 6:05 AM, Erick Erickson [via Lucene] <ml-node+s472066n4094231h87@n3.nabble.com>
wrote:
 
Hmmm, that is odd, the glob dynamicField should 
pick this up. 

Not quite sure what's going on. You can parse the file
via Tika yourself and look at what's in there. It's a relatively
simple SolrJ program; here's a sample:
http://searchhub.org/2012/02/14/indexing-with-solrj/

Best, 
Erick 

On Tue, Oct 8, 2013 at 4:15 PM, sweety <[hidden email]> wrote: 

> This is my new schema.xml:
> <schema  name="documents"> 
> <fields> 
> <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
> <field name="author" type="string" indexed="true" stored="true" multiValued="true"/>
> <field name="comments" type="text" indexed="true" stored="true" multiValued="false"/>
> <field name="keywords" type="text" indexed="true" stored="true" multiValued="false"/>
> <field name="contents" type="text" indexed="true" stored="true" multiValued="false"/>
> <field name="title" type="text" indexed="true" stored="true" multiValued="false"/>
> <field name="revision_number" type="string" indexed="true" stored="true" multiValued="false"/>
> <field name="_version_" type="long" indexed="true" stored="true" multiValued="false"/>
> <dynamicField name="ignored_*" type="string" indexed="false" stored="true" multiValued="true"/>
> <dynamicField name="*" type="ignored"  multiValued="true" />
> <copyfield source="id" dest="text" />
> <copyfield source="author" dest="text" />
> </fields>
> <types>
> <fieldtype name="ignored" stored="false" indexed="false" class="solr.StrField" />
> <fieldType name="integer" class="solr.IntField" /> 
> <fieldType name="long" class="solr.LongField" /> 
> <fieldType name="string" class="solr.StrField"  /> 
> <fieldType name="text" class="solr.TextField" /> 
> </types> 
> <uniqueKey>id</uniqueKey> 
> </schema> 
> I still get the same error. 
> 
> ________________________________ 
>  From: Erick Erickson [via Lucene] <[hidden email]> 
> To: sweety <[hidden email]> 
> Sent: Tuesday, October 8, 2013 7:16 AM 
> Subject: Re: no such field error:smaller big block size details while indexing doc files

> 
> 
> 
> Well, one of the attributes parsed out of, probably, the
> meta-information associated with one of your structured
> docs is SMALLER_BIG_BLOCK_SIZE_DETAILS, and
> Solr Cell is faithfully sending that to your index. If you
> want to throw all these in the bit bucket, try defining
> a true catch-all field that ignores things, like this:
> <dynamicField name="*" type="ignored" multiValued="true" /> 
> 
> Best, 
> Erick 
> 
> On Mon, Oct 7, 2013 at 8:03 AM, sweety <[hidden email]> wrote: 
> 
>> I'm trying to index .doc, .docx and .pdf files.
>> I'm using this command:
>> curl "http://localhost:8080/solr/document/update/extract?literal.id=12&commit=true" -F "myfile=@complex.doc"
>> 
>> This is the error I get: 
>> Oct 07, 2013 5:02:18 PM org.apache.solr.common.SolrException log
>> SEVERE: null:java.lang.RuntimeException: java.lang.NoSuchFieldError: SMALLER_BIG_BLOCK_SIZE_DETAILS
>>         at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:651)
>>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:364)
>>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
>>         at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
>>         at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
>>         at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
>>         at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
>>         at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
>>         at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
>>         at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:928)
>>         at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
>>         at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
>>         at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:987)
>>         at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:539)
>>         at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:298)
>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>>         at java.lang.Thread.run(Unknown Source)
>> Caused by: java.lang.NoSuchFieldError: SMALLER_BIG_BLOCK_SIZE_DETAILS
>>         at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.<init>(NPOIFSFileSystem.java:93)
>>         at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.<init>(NPOIFSFileSystem.java:190)
>>         at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.<init>(NPOIFSFileSystem.java:184)
>>         at org.apache.tika.parser.microsoft.POIFSContainerDetector.getTopLevelNames(POIFSContainerDetector.java:376)
>>         at org.apache.tika.parser.microsoft.POIFSContainerDetector.detect(POIFSContainerDetector.java:165)
>>         at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:61)
>>         at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:113)
>>         at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
>>         at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>>         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>>         at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
>>         at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)
>>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)
>>         ... 16 more
>> 
>> Also, using the same type of URL, .txt, .mp3 and .pdf files are indexed successfully:
>> curl "http://localhost:8080/solr/document/update/extract?literal.id=12&commit=true" -F "myfile=@abc.txt"
>> 
>> Schema.xml is: 
>> <schema  name="documents"> 
>> <fields> 
>> <field name="id" type="string" indexed="true" stored="true" required="true" 
>> multiValued="false"/> 
>> <field name="author" type="string" indexed="true" stored="true" 
>> multiValued="true"/> 
>> <field name="comments" type="text" indexed="true" stored="true" 
>> multiValued="false"/> 
>> <field name="keywords" type="text" indexed="true" stored="true" 
>> multiValued="false"/> 
>> <field name="contents" type="text" indexed="true" stored="true" 
>> multiValued="false"/> 
>> <field name="title" type="text" indexed="true" stored="true" 
>> multiValued="false"/> 
>> <field name="revision_number" type="string" indexed="true" stored="true" 
>> multiValued="false"/> 
>> <field name="_version_" type="long" indexed="true" stored="true" 
>> multiValued="false"/> 
>> 
>> <dynamicField name="ignored_*" type="string" indexed="false" stored="true" 
>> multiValued="true"/> 
>> <copyfield source="id" dest="text" /> 
>> <copyfield source="author" dest="text" /> 
>> </fields> 
>> 
>> <types> 
>> <fieldType name="integer" class="solr.IntField" /> 
>> <fieldType name="long" class="solr.LongField" /> 
>> <fieldType name="string" class="solr.StrField"  /> 
>> <fieldType name="text" class="solr.TextField" /> 
>> <fieldtype name="ignored" stored="false" indexed="false" multiValued="true" 
>> class="solr.StrField" /> 
>> </types> 
>> <uniqueKey>id</uniqueKey> 
>> </schema> 
>> 
>> I'm not able to understand what kind of error this is. Please help me.
>>
>> -- 
>> View this message in context: http://lucene.472066.n3.nabble.com/no-such-field-error-smaller-big-block-size-details-while-indexing-doc-files-tp4093883.html
>> Sent from the Solr - User mailing list archive at Nabble.com. 
> 
> 
> -- 
> View this message in context: http://lucene.472066.n3.nabble.com/no-such-field-error-smaller-big-block-size-details-while-indexing-doc-files-tp4093883p4094166.html
> Sent from the Solr - User mailing list archive at Nabble.com. 


--
View this message in context: http://lucene.472066.n3.nabble.com/no-such-field-error-smaller-big-block-size-details-while-indexing-doc-files-tp4093883p4094306.html
Sent from the Solr - User mailing list archive at Nabble.com.