lucene-solr-user mailing list archives

From Alexandre Rafalovitch <arafa...@gmail.com>
Subject Re: Solr 6.4. Can't index MS Visio vsdx files
Date Fri, 03 Feb 2017 14:49:18 GMT
This kind of information extraction comes from Apache Tika, which is
shipped with Solr. However, Solr does not ship every possible parser
with its installation. So I think you are hitting a case where Tika
manages to figure out what type of content you have, but does not have
the library needed to parse it (Apache POI, another open-source
project) installed.

What you need to do is get the additional jar from the Tika/POI
project downloads and make it visible to Solr (probably as an extension
jar in a lib folder somewhere; I am a bit hazy on that for the latest
Solr).
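
To check whether a given class is visible at all, something like the
following may help. It is only a minimal sketch: ClassProbe and locate
are names made up for this example (they are not part of Solr, Tika, or
POI), and the default class name is simply the one from the exception
below. It does not run static initializers, so it only tells you
whether a jar containing the class is on the classpath and where it was
loaded from, not whether the class initializes correctly.

```java
import java.security.CodeSource;

// Hypothetical helper for illustration only; not part of Solr, Tika, or POI.
public class ClassProbe {

    // Returns where the class was loaded from, or null if it cannot be loaded.
    static String locate(String className) {
        try {
            // initialize=false: look the class up without running its static initializers
            Class<?> c = Class.forName(className, false, ClassProbe.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // Classes that come with the JDK itself usually have no code source
            return src == null ? "(JDK runtime)" : src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Default to the class named in the reported exception
        String name = args.length > 0
                ? args[0]
                : "org.apache.poi.xdgf.usermodel.section.geometry.GeometryRowFactory";
        String where = locate(name);
        System.out.println(name + " -> " + (where == null ? "NOT FOUND" : where));
    }
}
```

The result is only meaningful if you run it with the same jars on the
classpath that Solr uses for extraction.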

The version of Tika that Solr uses is listed in the change notes. For
6.4, see https://github.com/apache/lucene-solr/blob/releases/lucene-solr/6.4.0/solr/CHANGES.txt
and it is Tika 1.13.

Hope it helps,
   Alex.
----
http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 3 February 2017 at 05:57, Gytis Mikuciunas <gytmkc@gmail.com> wrote:
> Hi,
>
>
> I'm using single core Solr 6.4 instance on windows server (windows server
> 2012 R2 standard),
> Java v8, (build 1.8.0_121-b13).
>
> All works more or less ok, except indexing of MS Visio vsdx files.
>
>
> Every time it throws an error (no matter whether it tries to index a
> vsdx file or, for example, a docx with a Visio diagram inside).
>
> Thx in advance for your help. If you need some additional info, please ask.
>
>
> Error/Exception from log:
>
>
>  Null:java.lang.RuntimeException: java.lang.NoClassDefFoundError: Could not
> initialize class
> org.apache.poi.xdgf.usermodel.section.geometry.GeometryRowFactory
>         at
> org.apache.poi.xdgf.usermodel.section.GeometrySection.<init>(GeometrySection.java:55)
>         at
> org.apache.poi.xdgf.usermodel.XDGFSheet.<init>(XDGFSheet.java:77)
>         at
> org.apache.poi.xdgf.usermodel.XDGFShape.<init>(XDGFShape.java:113)
>         at
> org.apache.poi.xdgf.usermodel.XDGFShape.<init>(XDGFShape.java:107)
>         at
> org.apache.poi.xdgf.usermodel.XDGFBaseContents.onDocumentRead(XDGFBaseContents.java:82)
>         at
> org.apache.poi.xdgf.usermodel.XDGFMasterContents.onDocumentRead(XDGFMasterContents.java:66)
>         at
> org.apache.poi.xdgf.usermodel.XDGFMasters.onDocumentRead(XDGFMasters.java:101)
>         at
> org.apache.poi.xdgf.usermodel.XmlVisioDocument.onDocumentRead(XmlVisioDocument.java:106)
>         at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:160)
>         at
> org.apache.poi.xdgf.usermodel.XmlVisioDocument.<init>(XmlVisioDocument.java:79)
>         at
> org.apache.poi.xdgf.extractor.XDGFVisioExtractor.<init>(XDGFVisioExtractor.java:41)
>         at
> org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:212)
>         at
> org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:86)
>         at
> org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:87)
>         at
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>         at
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>         at
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>         at
> org.apache.tika.parser.DelegatingParser.parse(DelegatingParser.java:72)
>         at
> org.apache.tika.extractor.ParsingEmbeddedDocumentExtractor.parseEmbedded(ParsingEmbeddedDocumentExtractor.java:102)
>         at
> org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedFile(AbstractOOXMLExtractor.java:298)
>         at
> org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedParts(AbstractOOXMLExtractor.java:199)
>         at
> org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:112)
>         at
> org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:112)
>         at
> org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:87)
>         at
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>         at
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>         at
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>         at
> org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:228)
>         at
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
>         at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:166)
>         at org.apache.solr.core.SolrCore.execute(SolrCore.java:2306)
>         at
> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:658)
>         at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:464)
>         at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
>         at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:296)
>         at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
>         at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
>         at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>         at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:513)
>         at
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
>         at
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>         at
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>         at
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>         at
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>         at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>         at
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
>         at
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
>         at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>         at org.eclipse.jetty.server.Server.handle(Server.java:534)
>         at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
>         at
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
>         at
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
>         at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
>         at
> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
>         at
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
>         at
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
>         at
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
>         at
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
>         at
> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
>         at java.lang.Thread.run(Unknown Source)
>
>
>
> Regards,
> Gytis
