lucene-solr-user mailing list archives

From "Allison, Timothy B." <talli...@mitre.org>
Subject RE: Solr 6.4. Can't index MS Visio vsdx files
Date Fri, 03 Feb 2017 20:31:11 GMT
This is a Tika/POI problem.  Please download tika-app 1.14 [1] or a nightly version of Tika
[2] and run:

java -jar tika-app.jar <your_file.vsdx>

If the problem is fixed, we'll try to upgrade dependencies in Solr.  If it isn't fixed, please
open a bug on Tika's Jira.

If this is a missing bean issue (sorry, I can't tell from your stacktrace which class is missing),
then as a temporary workaround you can remove the "poi-ooxml-schemas" jar and add the full
"ooxml-schemas" jar, and you should be good to go. [3]
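
A "Could not initialize class" NoClassDefFoundError usually means an earlier static initializer failed, so the first failure (and the class that is really missing) may be hidden. Below is a minimal sketch for surfacing it, assuming you run it in a fresh JVM with the same jars on the classpath that Solr's extraction handler uses. `ClassProbe` and `probe` are hypothetical names made up for this example, not part of Tika or POI.

```java
import java.security.CodeSource;

public class ClassProbe {

    /** Tries to load and initialize a class; returns a one-line report. */
    static String probe(String className) {
        try {
            // Class.forName(String) also runs the static initializer.
            Class<?> c = Class.forName(className);
            CodeSource src = c.getProtectionDomain().getCodeSource();
            String where = (src == null || src.getLocation() == null)
                    ? "bootstrap/unknown"
                    : src.getLocation().toString();
            return "OK " + className + " from " + where;
        } catch (Throwable t) {
            // Walk to the root cause; that is typically the class that is missing.
            Throwable root = t;
            while (root.getCause() != null && root.getCause() != root) {
                root = root.getCause();
            }
            return "FAILED " + className + ": " + root;
        }
    }

    public static void main(String[] args) {
        // Default taken from the stack trace in this thread; pass other names as arguments.
        String[] names = args.length > 0 ? args : new String[] {
            "org.apache.poi.xdgf.usermodel.section.geometry.GeometryRowFactory"
        };
        for (String name : names) {
            System.out.println(probe(name));
        }
    }
}
```

If the report names a schema class as the root cause, that would point at the "poi-ooxml-schemas" vs. "ooxml-schemas" issue described above.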

Cheers,

          Tim

[1] http://www.apache.org/dyn/closer.cgi/tika/tika-app-1.14.jar 

[2] https://builds.apache.org/job/Tika-trunk/1193/org.apache.tika$tika-app/artifact/org.apache.tika/tika-app/1.15-20170202.203920-124/tika-app-1.15-20170202.203920-124.jar

[3] http://poi.apache.org/faq.html#faq-N10025

-----Original Message-----
From: Alexandre Rafalovitch [mailto:arafalov@gmail.com] 
Sent: Friday, February 3, 2017 9:49 AM
To: solr-user <solr-user@lucene.apache.org>
Subject: Re: Solr 6.4. Can't index MS Visio vsdx files

This kind of information extraction comes from Apache Tika, which is shipped with Solr. However,
Solr does not ship every possible parser with its installation. So I think you are hitting
a case where Tika manages to figure out what type of content you have, but does not have the
required library (Apache POI, another open-source project) installed.

What you need to do is get the additional jar from the Tika/POI project downloads and make
it visible to Solr (probably as an extension jar in a lib folder somewhere - I am a bit hazy
on that for the latest Solr).
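
Before adding or swapping jars, it may help to check which jars in a lib folder already contain a given class. Here is a rough sketch using only the JDK; `JarClassFinder` and `findJarsContaining` are hypothetical names for this example, and the `contrib/extraction/lib` default is only an assumption about where the extraction jars live in your install.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.jar.JarFile;

public class JarClassFinder {

    /** Returns the jars directly inside libDir that contain the given class. */
    static List<Path> findJarsContaining(Path libDir, String className) throws IOException {
        // A class a.b.C is stored in a jar as the entry a/b/C.class
        String entry = className.replace('.', '/') + ".class";
        List<Path> hits = new ArrayList<>();
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(libDir, "*.jar")) {
            for (Path jarPath : jars) {
                try (JarFile jar = new JarFile(jarPath.toFile())) {
                    if (jar.getEntry(entry) != null) {
                        hits.add(jarPath);
                    }
                }
            }
        }
        Collections.sort(hits);
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default location; pass your own lib folder as the first argument.
        Path libDir = Paths.get(args.length > 0 ? args[0] : "contrib/extraction/lib");
        // Class name taken from the stack trace in this thread.
        String className = args.length > 1 ? args[1]
                : "org.apache.poi.xdgf.usermodel.section.geometry.GeometryRowFactory";
        List<Path> hits = findJarsContaining(libDir, className);
        if (hits.isEmpty()) {
            System.out.println("No jar in " + libDir + " contains " + className);
        } else {
            for (Path hit : hits) {
                System.out.println(className + " found in " + hit);
            }
        }
    }
}
```

Run it as `java JarClassFinder <lib-folder> <class-name>`. If the same class turns up in more than one jar, that is worth noting too, since conflicting versions can cause similar errors.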

The version of Tika that Solr uses is listed in the change notes. For 6.4, see
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/6.4.0/solr/CHANGES.txt
It is Tika 1.13.

Hope it helps,
   Alex.
----
http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 3 February 2017 at 05:57, Gytis Mikuciunas <gytmkc@gmail.com> wrote:
> Hi,
>
>
> I'm using single core Solr 6.4 instance on windows server
> (windows server 2012 R2 standard),
> Java v8, (build 1.8.0_121-b13).
>
> All works more or less ok, except MS Visio vsdx files indexing.
>
>
> Every time it throws an error (no matter whether it tries to index a vsdx
> file or, for example, a docx with a visio diagram inside).
>
> Thx in advance for your help. If you need some additional info, please ask.
>
>
> Error/Exception from log:
>
>
>  Null:java.lang.RuntimeException: java.lang.NoClassDefFoundError: Could not initialize class org.apache.poi.xdgf.usermodel.section.geometry.GeometryRowFactory
>         at org.apache.poi.xdgf.usermodel.section.GeometrySection.<init>(GeometrySection.java:55)
>         at org.apache.poi.xdgf.usermodel.XDGFSheet.<init>(XDGFSheet.java:77)
>         at org.apache.poi.xdgf.usermodel.XDGFShape.<init>(XDGFShape.java:113)
>         at org.apache.poi.xdgf.usermodel.XDGFShape.<init>(XDGFShape.java:107)
>         at org.apache.poi.xdgf.usermodel.XDGFBaseContents.onDocumentRead(XDGFBaseContents.java:82)
>         at org.apache.poi.xdgf.usermodel.XDGFMasterContents.onDocumentRead(XDGFMasterContents.java:66)
>         at org.apache.poi.xdgf.usermodel.XDGFMasters.onDocumentRead(XDGFMasters.java:101)
>         at org.apache.poi.xdgf.usermodel.XmlVisioDocument.onDocumentRead(XmlVisioDocument.java:106)
>         at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:160)
>         at org.apache.poi.xdgf.usermodel.XmlVisioDocument.<init>(XmlVisioDocument.java:79)
>         at org.apache.poi.xdgf.extractor.XDGFVisioExtractor.<init>(XDGFVisioExtractor.java:41)
>         at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:212)
>         at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:86)
>         at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:87)
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>         at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>         at org.apache.tika.parser.DelegatingParser.parse(DelegatingParser.java:72)
>         at org.apache.tika.extractor.ParsingEmbeddedDocumentExtractor.parseEmbedded(ParsingEmbeddedDocumentExtractor.java:102)
>         at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedFile(AbstractOOXMLExtractor.java:298)
>         at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedParts(AbstractOOXMLExtractor.java:199)
>         at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:112)
>         at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:112)
>         at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:87)
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>         at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>         at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:228)
>         at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
>         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:166)
>         at org.apache.solr.core.SolrCore.execute(SolrCore.java:2306)
>         at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:658)
>         at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:464)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:296)
>         at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
>         at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
>         at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>         at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:513)
>         at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
>         at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>         at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>         at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>         at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>         at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>         at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
>         at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
>         at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>         at org.eclipse.jetty.server.Server.handle(Server.java:534)
>         at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
>         at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
>         at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
>         at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
>         at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
>         at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
>         at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
>         at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
>         at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
>         at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
>         at java.lang.Thread.run(Unknown Source)
>
>
>
> Regards,
> Gytis