lucene-solr-user mailing list archives

From "Allison, Timothy B." <talli...@mitre.org>
Subject RE: Solr 6.4. Can't index MS Visio vsdx files
Date Mon, 06 Feb 2017 11:57:42 GMT
Ah, ConnectsType.  That's fixed in the most recent version of POI [1], and will soon be fixed
in Tika [2].  So there's no need to open a ticket on Tika's Jira.

> As Tika is failing, could it help or not?

Yes, that will absolutely help.  In your Solr contrib/extract/lib directory, you'll see poi-ooxml-schemas-3.xx.jar.
Remove that jar and add ooxml-schemas.jar [3].  As documented in [4], poi-ooxml-schemas is
a subset of the much larger (complete) ooxml-schemas; ConnectsType was not in the subset,
but it _should_ be in ooxml-schemas.
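One way to confirm the jar swap before restarting Solr is to check which jars in contrib/extract/lib actually contain the ConnectsType class file. The sketch below assumes the jars sit directly in that directory; `JarClassScan` and `jarsContaining` are hypothetical names for illustration, not part of Solr, Tika or POI. It uses only the JDK's `java.util.jar.JarFile`.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.jar.JarFile;

public class JarClassScan {

    /** Returns the names of the *.jar files in dir that contain an entry for className. */
    public static List<String> jarsContaining(File dir, String className) throws IOException {
        // A class com.a.B is stored inside a jar as the entry "com/a/B.class".
        String entryName = className.replace('.', '/') + ".class";
        File[] jars = dir.listFiles((d, name) -> name.endsWith(".jar"));
        if (jars == null) {
            throw new IOException("Not a readable directory: " + dir);
        }
        List<String> hits = new ArrayList<>();
        for (File f : jars) {
            try (JarFile jar = new JarFile(f)) {
                if (jar.getEntry(entryName) != null) {
                    hits.add(f.getName());
                }
            }
        }
        Collections.sort(hits); // stable output regardless of directory listing order
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Assumed usage: pass the path to Solr's contrib/extract/lib directory.
        File dir = new File(args.length > 0 ? args[0] : ".");
        String cls = "com.microsoft.schemas.office.visio.x2012.main.ConnectsType";
        List<String> hits = jarsContaining(dir, cls);
        System.out.println(hits.isEmpty()
                ? "No jar in " + dir + " contains " + cls
                : cls + " found in: " + hits);
    }
}
```

Run it as `java JarClassScan <path to contrib/extract/lib>`. With only poi-ooxml-schemas present you would expect the "No jar ... contains" message; after the swap, the ooxml-schemas jar should be listed.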

Cheers,

             Tim



[1] https://bz.apache.org/bugzilla/show_bug.cgi?id=60489
[2] https://issues.apache.org/jira/browse/TIKA-2208 
[3] https://mvnrepository.com/artifact/org.apache.poi/ooxml-schemas/1.3 
[4] http://poi.apache.org/faq.html#faq-N10025 


Hi again,

I've tried tika-app, and it didn't help:

java -jar tika-app-1.14.jar "I:\Dat\span ports.vsdx"
Exception in thread "main" java.lang.NoClassDefFoundError: com/microsoft/schemas/office/visio/x2012/main/ConnectsType
        at com.microsoft.schemas.office.visio.x2012.main.impl.PageContentsTypeImpl.getConnects(Unknown Source)
        at org.apache.poi.xdgf.usermodel.XDGFBaseContents.onDocumentRead(XDGFBaseContents.java:89)
        at org.apache.poi.xdgf.usermodel.XDGFPageContents.onDocumentRead(XDGFPageContents.java:73)
        at org.apache.poi.xdgf.usermodel.XDGFPages.onDocumentRead(XDGFPages.java:94)
        at org.apache.poi.xdgf.usermodel.XmlVisioDocument.onDocumentRead(XmlVisioDocument.java:108)
        at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:190)
        at org.apache.poi.xdgf.usermodel.XmlVisioDocument.<init>(XmlVisioDocument.java:79)
        at org.apache.poi.xdgf.extractor.XDGFVisioExtractor.<init>(XDGFVisioExtractor.java:41)
        at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:207)
        at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:86)
        at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:87)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
        at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:191)
        at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:480)
        at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:145)
Caused by: java.lang.ClassNotFoundException: com.microsoft.schemas.office.visio.x2012.main.ConnectsType
        at java.net.URLClassLoader.findClass(Unknown Source)
        at java.lang.ClassLoader.loadClass(Unknown Source)
        at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
        at java.lang.ClassLoader.loadClass(Unknown Source)
        ... 17 more
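The `Caused by: java.lang.ClassNotFoundException` line means the class loader found no `ConnectsType` anywhere on tika-app's classpath. A quick way to see what a JVM can load, and from which jar, is `Class.forName` combined with the class's `CodeSource`. This is a sketch under the assumption that you compile it and run it with the same classpath as the failing process; `ClassLocator` is a hypothetical helper name.

```java
import java.net.URL;
import java.security.CodeSource;

public class ClassLocator {

    /**
     * Returns where className was loaded from, "bootstrap" for JDK core classes
     * that have no CodeSource, or null if the class is not on the classpath.
     */
    public static String locate(String className) {
        try {
            // initialize=false: load the class without running its static initializer
            Class<?> cls = Class.forName(className, false, ClassLocator.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                return "bootstrap";
            }
            URL location = src.getLocation();
            return location.toString();
        } catch (ClassNotFoundException e) {
            // The same condition reported in the "Caused by" line of the trace above
            return null;
        } catch (LinkageError e) {
            // The class file exists but one of its own dependencies could not be resolved
            return "present but failed to link: " + e;
        }
    }

    public static void main(String[] args) {
        String cls = args.length > 0 ? args[0]
                : "com.microsoft.schemas.office.visio.x2012.main.ConnectsType";
        String where = locate(cls);
        System.out.println(where == null ? cls + " is NOT on the classpath" : cls + " -> " + where);
    }
}
```

For example, on Windows: `java -cp tika-app-1.14.jar;. ClassLocator` (the `;.` adds the directory holding the compiled sketch). Loading with `initialize=false` keeps the check itself from triggering the class's static initializer.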


So the next step is to open a bug ticket on Tika's Jira.


And what about your proposed workaround?
"If this is a missing bean issue (sorry, I can't tell from your stacktrace which class is
missing), as a temporary workaround, you can rm "poi-ooxml-schemas" and add the full "ooxml-schemas",
and you should be good to go. [3]"

As Tika is failing, could it help or not?

Gytis


On Fri, Feb 3, 2017 at 10:31 PM, Allison, Timothy B. <tallison@mitre.org> wrote:

> This is a Tika/POI problem.  Please download tika-app 1.14 [1] or a 
> nightly version of Tika [2] and run
>
> java -jar tika-app.jar <your_file.vsdx>
>
> If the problem is fixed, we'll try to upgrade dependencies in Solr.  
> If it isn't fixed, please open a bug on Tika's Jira.
>
> If this is a missing bean issue (sorry, I can't tell from your 
> stacktrace which class is missing), as a temporary workaround, you can 
> rm "poi-ooxml-schemas" and add the full "ooxml-schemas", and you 
> should be good to go. [3]
>
> Cheers,
>
>           Tim
>
> [1] http://www.apache.org/dyn/closer.cgi/tika/tika-app-1.14.jar
>
> [2] https://builds.apache.org/job/Tika-trunk/1193/org.apache.tika$tika-app/artifact/org.apache.tika/tika-app/1.15-20170202.203920-124/tika-app-1.15-20170202.203920-124.jar
>
> [3] http://poi.apache.org/faq.html#faq-N10025
>
> -----Original Message-----
> From: Alexandre Rafalovitch [mailto:arafalov@gmail.com]
> Sent: Friday, February 3, 2017 9:49 AM
> To: solr-user <solr-user@lucene.apache.org>
> Subject: Re: Solr 6.4. Can't index MS Visio vsdx files
>
> This kind of information extraction comes from Apache Tika, which is
> shipped with Solr. However, Solr does not ship every possible parser
> with its installation. So I think you are hitting a case where Tika
> manages to figure out what type of content you have, but does not have
> the required library (from Apache POI, another open-source project) installed.
>
> What you need to do is get the additional jar from the Tika/POI
> project downloads and make it visible to Solr (probably as an extension
> jar in a lib folder somewhere; I am a bit hazy on that for the latest Solr).
>
> The version of Tika that Solr uses is listed in the release notes
> (CHANGES.txt). For 6.4, the notes are at
> https://github.com/apache/lucene-solr/blob/releases/lucene-solr/6.4.0/solr/CHANGES.txt
> and the version is Tika 1.13.
>
> Hope it helps,
>    Alex.
> ----
> http://www.solr-start.com/ - Resources for Solr users, new and 
> experienced
>
>
> On 3 February 2017 at 05:57, Gytis Mikuciunas <gytmkc@gmail.com> wrote:
> > Hi,
> >
> >
> > I'm using a single-core Solr 6.4 instance on Windows Server 2012 R2
> > Standard, with Java 8 (build 1.8.0_121-b13).
> >
> > Everything works more or less OK, except indexing MS Visio .vsdx files.
> >
> >
> > It throws an error every time, whether it is indexing a .vsdx file or,
> > for example, a .docx with a Visio diagram embedded in it.
> >
> > Thanks in advance for your help. If you need any additional info,
> > please ask.
> >
> >
> > Error/Exception from log:
> >
> >
> >  Null:java.lang.RuntimeException: java.lang.NoClassDefFoundError: Could not initialize class org.apache.poi.xdgf.usermodel.section.geometry.GeometryRowFactory
> >         at org.apache.poi.xdgf.usermodel.section.GeometrySection.<init>(GeometrySection.java:55)
> >         at org.apache.poi.xdgf.usermodel.XDGFSheet.<init>(XDGFSheet.java:77)
> >         at org.apache.poi.xdgf.usermodel.XDGFShape.<init>(XDGFShape.java:113)
> >         at org.apache.poi.xdgf.usermodel.XDGFShape.<init>(XDGFShape.java:107)
> >         at org.apache.poi.xdgf.usermodel.XDGFBaseContents.onDocumentRead(XDGFBaseContents.java:82)
> >         at org.apache.poi.xdgf.usermodel.XDGFMasterContents.onDocumentRead(XDGFMasterContents.java:66)
> >         at org.apache.poi.xdgf.usermodel.XDGFMasters.onDocumentRead(XDGFMasters.java:101)
> >         at org.apache.poi.xdgf.usermodel.XmlVisioDocument.onDocumentRead(XmlVisioDocument.java:106)
> >         at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:160)
> >         at org.apache.poi.xdgf.usermodel.XmlVisioDocument.<init>(XmlVisioDocument.java:79)
> >         at org.apache.poi.xdgf.extractor.XDGFVisioExtractor.<init>(XDGFVisioExtractor.java:41)
> >         at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:212)
> >         at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:86)
> >         at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:87)
> >         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> >         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> >         at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
> >         at org.apache.tika.parser.DelegatingParser.parse(DelegatingParser.java:72)
> >         at org.apache.tika.extractor.ParsingEmbeddedDocumentExtractor.parseEmbedded(ParsingEmbeddedDocumentExtractor.java:102)
> >         at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedFile(AbstractOOXMLExtractor.java:298)
> >         at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedParts(AbstractOOXMLExtractor.java:199)
> >         at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:112)
> >         at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:112)
> >         at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:87)
> >         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> >         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> >         at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
> >         at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:228)
> >         at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
> >         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:166)
> >         at org.apache.solr.core.SolrCore.execute(SolrCore.java:2306)
> >         at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:658)
> >         at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:464)
> >         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
> >         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:296)
> >         at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
> >         at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
> >         at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> >         at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:513)
> >         at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
> >         at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
> >         at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
> >         at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
> >         at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
> >         at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
> >         at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
> >         at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
> >         at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
> >         at org.eclipse.jetty.server.Server.handle(Server.java:534)
> >         at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
> >         at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
> >         at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
> >         at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
> >         at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
> >         at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
> >         at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
> >         at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
> >         at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
> >         at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
> >         at java.lang.Thread.run(Unknown Source)
> >
> >
> >
> > Regards,
> > Gytis
>
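The original log reports `NoClassDefFoundError: Could not initialize class ...GeometryRowFactory`, which may be why the missing class could not be identified from it. In the JVM, that wording means the class's static initializer already failed on an earlier attempt: the first attempt throws the real problem (an `Error`, such as a missing-class `NoClassDefFoundError`, is rethrown as is, while an ordinary exception is wrapped in `ExceptionInInitializerError`), and every later use of the class gets only the terse message. The self-contained sketch below reproduces this with a hypothetical class `Fragile`; it is why it is usually worth searching the Solr log for the earliest related error rather than the repeated one.

```java
public class InitFailureDemo {

    /** Hypothetical class whose static initializer always fails. */
    static class Fragile {
        static final int VALUE = compute();

        private static int compute() {
            // Stand-in for a static initializer that hits a missing dependency.
            throw new IllegalStateException("simulated failure in static initializer");
        }

        static void touch() { }
    }

    /** Uses Fragile once and returns whatever it threw (null if nothing). */
    static Throwable touchOnce() {
        try {
            Fragile.touch();
            return null;
        } catch (Throwable t) {
            return t;
        }
    }

    public static void main(String[] args) {
        Throwable first = touchOnce();
        Throwable second = touchOnce();
        // First use: ExceptionInInitializerError carrying the real cause.
        System.out.println("1st: " + first + " caused by " + first.getCause());
        // Later uses: NoClassDefFoundError "Could not initialize class ...", cause hidden.
        System.out.println("2nd: " + second);
    }
}
```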