lucene-solr-user mailing list archives

From Betsey Benagh <betsey.ben...@stresearch.com>
Subject Integrating grobid with Tika in solr
Date Wed, 04 May 2016 13:15:43 GMT
(X-posted from stack overflow)

This feels like a basic, dumb question, but my reading of the documentation has not led me
to an answer.


I'm using Solr to index journal articles. With the out-of-the-box configuration, it indexes
the text of the documents, but I'd like to use Grobid to pull out the authors, title, affiliations,
etc. I have Grobid up and running as a service.

I added

<str name="tika.config">/path/to/tika-config.xml</str>

to the requestHandler for /update/extract in solrconfig.xml
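
For context, a minimal sketch of where that line would sit. The class attribute and the
defaults list are assumptions based on a typical stock solrconfig.xml, not taken from the
actual config:

```xml
<!-- Sketch only: the class attribute and the defaults list are assumed
     from a typical stock solrconfig.xml, not taken from the actual config. -->
<requestHandler name="/update/extract"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- existing defaults left unchanged -->
  </lst>
  <!-- the added line: points Tika at the custom config file -->
  <str name="tika.config">/path/to/tika-config.xml</str>
</requestHandler>
```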

The tika-config looks like:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<properties>
  <parsers>
    <parser class="org.apache.tika.parser.journal.JournalParser">
      <mime>application/pdf</mime>
    </parser>
  </parsers>
</properties>


I'm getting a ClassNotFoundException when I try to import a document, but I can't figure out
where to set the classpath to fix it.
