lucene-solr-user mailing list archives

From Betsey Benagh <betsey.ben...@stresearch.com>
Subject Re: Integrating grobid with Tika in solr
Date Wed, 04 May 2016 14:38:16 GMT
Thanks, I'm currently using 5.5, and will try upgrading to 6.0.


On 5/4/16, 10:37 AM, "Allison, Timothy B." <tallison@mitre.org> wrote:

>Y. Solr 6.0.0 is shipping with Tika 1.7.  Grobid came in with Tika 1.11.
>
>-----Original Message-----
>From: Allison, Timothy B. [mailto:tallison@mitre.org]
>Sent: Wednesday, May 4, 2016 10:29 AM
>To: solr-user@lucene.apache.org
>Subject: RE: Integrating grobid with Tika in solr
>
>I think Solr is using a version of Tika that predates the addition of
>the Grobid parser.  You'll have to add that manually somehow until Solr
>upgrades to Tika 1.13 (soon to be released...I think).  SOLR-8981.
>
>-----Original Message-----
>From: Betsey Benagh [mailto:betsey.benagh@stresearch.com]
>Sent: Wednesday, May 4, 2016 10:07 AM
>To: solr-user@lucene.apache.org
>Subject: Re: Integrating grobid with Tika in solr
>
>Grobid runs as a service, and I'm (theoretically) configuring Tika to
>call it.
>
>From the Grobid wiki, here are instructions for integrating with Tika
>application:
>
>First we need to create the GrobidExtractor.properties file that points
>to the Grobid REST Service. My file looks like the following:
>
>grobid.server.url=http://localhost:[port]
>
>Now you can run GROBID via Tika-app with the following command on a
>sample PDF file.
>
>java -classpath $HOME/src/grobidparser-resources/:tika-app-1.11-SNAPSHOT.jar \
>  org.apache.tika.cli.TikaCLI \
>  --config=$HOME/src/grobidparser-resources/tika-config.xml -J \
>  $HOME/src/grobid/papers/ICSE06.pdf
>
>Here's the stack trace.
>
><lst name="error"><lst name="metadata"><str
>name="error-class">org.apache.solr.common.SolrException</str><str
>name="root-error-class">java.lang.ClassNotFoundException</str></lst><str
>name="msg">org.apache.tika.exception.TikaException: Unable to find a
>parser class: org.apache.tika.parser.journal.JournalParser</str><str
>name="trace">org.apache.solr.common.SolrException:
>org.apache.tika.exception.TikaException: Unable to find a parser class:
>org.apache.tika.parser.journal.JournalParser
>at 
>org.apache.solr.handler.extraction.ExtractingRequestHandler.inform(Extract
>ingRequestHandler.java:82)
>at 
>org.apache.solr.core.PluginBag$LazyPluginHolder.createInst(PluginBag.java:
>367)
>at org.apache.solr.core.PluginBag$LazyPluginHolder.get(PluginBag.java:348)
>at org.apache.solr.core.PluginBag.get(PluginBag.java:148)
>at 
>org.apache.solr.handler.RequestHandlerBase.getRequestHandler(RequestHandle
>rBase.java:231)
>at org.apache.solr.core.SolrCore.getRequestHandler(SolrCore.java:1362)
>at 
>org.apache.solr.servlet.HttpSolrCall.extractHandlerFromURLPath(HttpSolrCal
>l.java:326)
>at org.apache.solr.servlet.HttpSolrCall.init(HttpSolrCall.java:296)
>at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:412)
>at 
>org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.jav
>a:225)
>at 
>org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.jav
>a:183)
>at 
>org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandl
>er.java:1652)
>at 
>org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
>at 
>org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:1
>43)
>at 
>org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577
>)
>at 
>org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.ja
>va:223)
>at 
>org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.ja
>va:1127)
>at 
>org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
>at 
>org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.jav
>a:185)
>at 
>org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.jav
>a:1061)
>at 
>org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:1
>41)
>at 
>org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHa
>ndlerCollection.java:215)
>at 
>org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollectio
>n.java:110)
>at 
>org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java
>:97)
>at org.eclipse.jetty.server.Server.handle(Server.java:499)
>at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
>at 
>org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257
>)
>at 
>org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
>at 
>org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.jav
>a:635)
>at 
>org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java
>:555)
>at java.lang.Thread.run(Thread.java:745)
>Caused by: org.apache.tika.exception.TikaException: Unable to find a
>parser class: org.apache.tika.parser.journal.JournalParser
>at 
>org.apache.tika.config.TikaConfig.parserFromDomElement(TikaConfig.java:362
>)
>at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:127)
>at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:115)
>at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:111)
>at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:92)
>at 
>org.apache.solr.handler.extraction.ExtractingRequestHandler.inform(Extract
>ingRequestHandler.java:80)
>... 30 more
>Caused by: java.lang.ClassNotFoundException:
>org.apache.tika.parser.journal.JournalParser
>at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:814)
>at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>at java.lang.Class.forName0(Native Method)
>at java.lang.Class.forName(Class.java:348)
>at 
>org.apache.tika.config.ServiceLoader.getServiceClass(ServiceLoader.java:18
>9)
>at 
>org.apache.tika.config.TikaConfig.parserFromDomElement(TikaConfig.java:338
>)
>... 35 more
></str><int name="code">500</int></lst>
>
>
>
>On 5/4/16, 10:00 AM, "Shawn Heisey" <apache@elyograg.org> wrote:
>
>On 5/4/2016 7:15 AM, Betsey Benagh wrote:
>(X-posted from stack overflow)
>This feels like a basic, dumb question, but my reading of the
>documentation has not led me to an answer.
>I'm using Solr to index journal articles. Using the out-of-the-box
>configuration, it indexed the text of the documents, but I'm looking to
>use Grobid to pull out the authors, title, affiliations, etc. I got
>grobid up and running as a service.
>I added
><str name="tika.config">/path/to/tika-config.xml</str>
>to the requestHandler for /update/extract in solrconfig.xml. The
>tika-config looks like:
><?xml version="1.0" encoding="UTF-8" standalone="no"?>
><properties>
>   <parsers>
>     <parser class="org.apache.tika.parser.journal.JournalParser">
>       <mime>application/pdf</mime>
>     </parser>
>   </parsers>
></properties>
>I'm getting a ClassNotFound exception when I try to import a document,
>but can't figure out where to set the classpath to fix it.
>
>I do not know anything about grobid.
>
>We'll need to see the exception -- the entire multi-line stacktrace,
>including any "caused by" sections.
>
>In general, you should create a lib directory in the solr home and place
>all extra jars in that directory.  Otherwise you need <lib> elements in
>solrconfig.xml to load jars -- and they will be loaded once for every
>core that uses that <lib> element.  ${solr.solr.home}/lib loads jars
>*once* when Solr starts and makes them available to all cores.
>
>Thanks,
>Shawn
>
>
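
For anyone hitting the same ClassNotFoundException: one quick way to tell whether a given set of jars actually contains the parser class is to try loading it by name outside of Solr. The following is only a minimal sketch, not part of Solr or Tika. The class name ParserClassCheck and the helper isLoadable are made up for illustration, and it only checks the classpath you pass on the command line. Solr loads plugin jars through its own class loader, so a "found" here does not prove Solr will see the class.

```java
// Hypothetical diagnostic helper (not part of Solr or Tika).
class ParserClassCheck {

    /** Returns true if the named class can be located by this class's loader. */
    static boolean isLoadable(String className) {
        try {
            // initialize=false: only locate the class, do not run static initializers
            Class.forName(className, false, ParserClassCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Not on the classpath, or present but missing one of its dependencies
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the class named in the stack trace above
        String name = args.length > 0
                ? args[0]
                : "org.apache.tika.parser.journal.JournalParser";
        String verdict = isLoadable(name) ? "found" : "NOT found";
        System.out.println(name + ": " + verdict + " on this classpath");
    }
}
```

To try it, compile the file and run it with the candidate jars on the classpath, for example `java -classpath "/path/to/jars/*:." ParserClassCheck` (the directory is a placeholder for wherever the Tika jars live). If the class is reported as not found, the jars predate the Grobid parser, which would match the Tika version explanation above.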

