lucene-solr-user mailing list archives

From Betsey Benagh
Subject Re: Integrating grobid with Tika in solr
Date Wed, 04 May 2016 16:41:04 GMT
As a workaround, I’m trying to run Grobid on my files, and then import the
corresponding XML into Solr.

I don’t see any errors on the post:

bba0124$ bin/post -c lrdtest ~/software/grobid/out/021002_1.tei.xml
-classpath /Users/bba0124/software/solr-5.5.0/dist/solr-core-5.5.0.jar
-Dauto=yes -Dc=lrdtest -Ddata=files org.apache.solr.util.SimplePostTool
SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8983/solr/lrdtest/update...
Entering auto mode. File endings considered are
POSTing file 021002_1.tei.xml (application/xml) to [base]
1 files indexed.
COMMITting Solr index changes to
Time spent: 0:00:00.027

But the documents don’t seem to show up in the index, either.
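
One way to confirm this is to ask the core for its document count directly. The following is a minimal sketch, assuming the default select handler and the same host and core name as in the post above. The helper names (count_url, num_found, count_documents) are illustrative only and are not part of Solr.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def count_url(base="http://localhost:8983/solr", core="lrdtest"):
    """Build a match-all query URL; base and core are assumed values."""
    # rows=0 asks only for the hit count, not the documents themselves
    params = urlencode({"q": "*:*", "rows": 0, "wt": "json"})
    return f"{base}/{core}/select?{params}"


def num_found(body):
    """Extract the total hit count from a Solr JSON select response."""
    return json.loads(body)["response"]["numFound"]


def count_documents(base="http://localhost:8983/solr", core="lrdtest"):
    """Query a running Solr instance and return how many documents it holds."""
    with urlopen(count_url(base, core)) as resp:
        return num_found(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Requires Solr to be running locally with the assumed core name
    print(count_documents())
```

If the count stays the same before and after the post, the file was accepted by the handler but produced no documents.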

Additionally, if I try uploading the documents using the web UI, they
appear to upload successfully,

  "responseHeader": {
    "status": 0,
    "QTime": 7

But aren’t in the index.

What am I missing?
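
One possible explanation, offered as an assumption rather than a confirmed diagnosis: as far as I understand it, the /update handler expects Solr's own <add><doc> XML format and does not map arbitrary TEI elements to fields, so a TEI file can return status 0 without adding any document. The sketch below converts a TEI file into that format before posting. The Solr field names (id, title, abstract) are hypothetical and would need to exist in the schema, and the TEI paths are what I would expect in Grobid header output, not something verified against these files.

```python
import xml.etree.ElementTree as ET

# TEI namespace used by TEI P5 documents
TEI = {"tei": "http://www.tei-c.org/ns/1.0"}


def tei_to_solr_add(tei_xml, doc_id):
    """Return Solr <add><doc> XML built from a TEI string.

    The field names id/title/abstract are assumptions about the target schema.
    """
    root = ET.fromstring(tei_xml)

    # First title found under titleStmt, or "" when absent
    title = root.findtext(".//tei:titleStmt/tei:title", default="", namespaces=TEI)

    # Flatten the abstract to plain text and collapse runs of whitespace
    abstract_el = root.find(".//tei:abstract", TEI)
    abstract = ""
    if abstract_el is not None:
        abstract = " ".join("".join(abstract_el.itertext()).split())

    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in (("id", doc_id), ("title", title.strip()), ("abstract", abstract)):
        if value:  # skip empty fields rather than sending blank values
            field = ET.SubElement(doc, "field", name=name)
            field.text = value
    return ET.tostring(add, encoding="unicode")
```

The resulting string can be saved to a file and sent with bin/post in place of the raw TEI file.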

On 5/4/16, 10:55 AM, "Shawn Heisey" wrote:

>On 5/4/2016 8:38 AM, Betsey Benagh wrote:
>> Thanks, I'm currently using 5.5, and will try upgrading to 6.0.
>> On 5/4/16, 10:37 AM, "Allison, Timothy B." wrote:
>>> Y. Solr 6.0.0 is shipping with Tika 1.7.  Grobid came in with Tika
>Just upgrading to 6.0.0 isn't enough.  As Tim said, Solr 6 currently
>uses Tika 1.7, but 1.11 is required.  That's four minor versions behind
>the minimum.
>Tim has filed an issue for upgrading Tika to 1.13 in Solr, which he did
>mention in a previous reply, but I do not know when it will be
>available.  Tim might have a better idea.
>You might be able to upgrade Tika in your Solr install to 1.12 yourself
>by simply replacing the jar in WEB-INF/lib ... but I do not know whether
>this will cause any other problems.  Historically, replacing the jar has
>been a safe option ... but I can't guarantee that this will always be
>the case.
