lucene-solr-user mailing list archives

From Betsey Benagh <betsey.ben...@stresearch.com>
Subject Re: Integrating grobid with Tika in solr
Date Wed, 04 May 2016 16:41:04 GMT
As a workaround, I’m trying to run Grobid on my files, and then import the
corresponding XML into Solr.

I don’t see any errors on the post:

bba0124$ bin/post -c lrdtest ~/software/grobid/out/021002_1.tei.xml
/Library/Java/JavaVirtualMachines/jdk1.8.0_71.jdk/Contents/Home/bin/java
-classpath /Users/bba0124/software/solr-5.5.0/dist/solr-core-5.5.0.jar
-Dauto=yes -Dc=lrdtest -Ddata=files org.apache.solr.util.SimplePostTool
/Users/bba0124/software/grobid/out/021002_1.tei.xml
SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8983/solr/lrdtest/update...
Entering auto mode. File endings considered are
xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
POSTing file 021002_1.tei.xml (application/xml) to [base]
1 files indexed.
COMMITting Solr index changes to
http://localhost:8983/solr/lrdtest/update...
Time spent: 0:00:00.027

But the documents don’t seem to show up in the index, either.
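One possible explanation, offered as an assumption rather than something confirmed in this thread: Solr's XML update handler expects Solr's own update format (`<add><doc><field name="...">`), and SimplePostTool's "1 files indexed" counts files sent, not documents created. If that is the cause, Grobid's TEI output would need converting before it is posted. The sketch below pulls the title and abstract out of a TEI file and wraps them in an `<add><doc>` envelope. The field names `id`, `title` and `abstract` are hypothetical and would have to match the `lrdtest` schema; the TEI paths assume Grobid puts the title under `teiHeader/fileDesc/titleStmt` and the abstract under `teiHeader/profileDesc`.

```python
import xml.etree.ElementTree as ET

# Namespace used by Grobid's TEI output.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def tei_to_solr_add(tei_xml: str, doc_id: str) -> str:
    """Build a Solr <add><doc> update message from a Grobid TEI document.

    Assumption: the target schema has fields named id, title and abstract
    (hypothetical names; adjust them to the real lrdtest schema).
    """
    root = ET.fromstring(tei_xml)

    # Main title of the paper, taken from the TEI header.
    title = root.findtext(".//tei:titleStmt/tei:title", default="", namespaces=TEI_NS)

    # The abstract may contain nested <div>/<p> elements: join all text
    # pieces with spaces, then collapse runs of whitespace.
    abstract_el = root.find(".//tei:profileDesc/tei:abstract", TEI_NS)
    abstract = ""
    if abstract_el is not None:
        abstract = " ".join(" ".join(abstract_el.itertext()).split())

    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in (("id", doc_id), ("title", title.strip()), ("abstract", abstract)):
        if value:  # skip empty fields rather than indexing blanks
            field = ET.SubElement(doc, "field", name=name)
            field.text = value
    return ET.tostring(add, encoding="unicode")
```

Writing the returned string to a file (for example a hypothetical `021002_1.solr.xml`) and posting that file with the same `bin/post -c lrdtest` command should then send a message the update handler recognises.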


Additionally, if I upload the documents using the web UI, they appear to
upload successfully:

Response:{
  "responseHeader": {
    "status": 0,
    "QTime": 7
  }
}


But they aren’t in the index.
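A quick way to check whether anything was actually added, independent of the post tool's message or the web UI's `"status": 0`, is a match-all query with `rows=0`, reading `response.numFound` from the JSON response. The sketch assumes the default host and port shown in the post output and the `lrdtest` core; `count_docs` needs a running Solr, while `select_url` and `num_found` only build the URL and parse a response body.

```python
import json
import urllib.parse
import urllib.request


def select_url(base: str, core: str) -> str:
    """Match-all query that returns only the hit count (rows=0)."""
    params = urllib.parse.urlencode({"q": "*:*", "rows": 0, "wt": "json"})
    return f"{base.rstrip('/')}/{core}/select?{params}"


def num_found(body: str) -> int:
    """Extract response.numFound from a Solr JSON search response."""
    return int(json.loads(body)["response"]["numFound"])


def count_docs(base: str = "http://localhost:8983/solr", core: str = "lrdtest") -> int:
    # Requires a running Solr instance reachable at `base`.
    with urllib.request.urlopen(select_url(base, core)) as resp:
        return num_found(resp.read().decode("utf-8"))
```

A `numFound` of 0 after the commit would confirm that the requests were accepted but produced no documents.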

What am I missing?

On 5/4/16, 10:55 AM, "Shawn Heisey" <apache@elyograg.org> wrote:

>On 5/4/2016 8:38 AM, Betsey Benagh wrote:
>> Thanks, I'm currently using 5.5, and will try upgrading to 6.0.
>>
>>
>> On 5/4/16, 10:37 AM, "Allison, Timothy B." <tallison@mitre.org> wrote:
>>> Y. Solr 6.0.0 is shipping with Tika 1.7. Grobid came in with Tika 1.11.
>
>Just upgrading to 6.0.0 isn't enough.  As Tim said, Solr 6 currently
>uses Tika 1.7, but 1.11 is required.  That's four minor versions behind
>the minimum.
>
>Tim has filed an issue for upgrading Tika to 1.13 in Solr, which he did
>mention in a previous reply, but I do not know when it will be
>available.  Tim might have a better idea.
>
>https://issues.apache.org/jira/browse/SOLR-8981
>
>You might be able to upgrade Tika in your Solr install to 1.12 yourself
>by simply replacing the jar in WEB-INF/lib ... but I do not know whether
>this will cause any other problems.  Historically, replacing the jar has
>been a safe option ... but I can't guarantee that this will always be
>the case.
>
>Thanks,
>Shawn
>
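Before and after swapping jars as Shawn describes, it may help to confirm which Tika version Solr will actually load and whether it meets the 1.11 minimum Tim mentions for Grobid. The sketch below scans a Solr install tree for jars named like `tika-core-1.7.jar` (the naming pattern is an assumption) instead of assuming a single lib directory. Versions are compared as integer tuples, because a plain string comparison ranks `"1.7"` above `"1.11"`. Mixed versions are treated as a failure, on the assumption that Tika's modules should be upgraded together.

```python
import re
from pathlib import Path

# Assumed naming pattern: tika-<module>-<version>.jar, e.g. tika-parsers-1.7.jar
TIKA_JAR = re.compile(r"^tika-([a-z0-9]+)-(\d+(?:\.\d+)*)\.jar$")


def parse_version(version: str) -> tuple:
    """'1.11' -> (1, 11), so versions compare numerically, not lexically."""
    return tuple(int(part) for part in version.split("."))


def tika_versions(paths):
    """Return (module, version, path) for every Tika jar among `paths`."""
    found = []
    for p in paths:
        m = TIKA_JAR.match(Path(p).name)
        if m:
            found.append((m.group(1), m.group(2), str(p)))
    return found


def find_tika_jars(solr_install: str):
    """Scan the whole install tree, since the lib directory varies by setup."""
    return tika_versions(sorted(Path(solr_install).rglob("*.jar")))


def meets_minimum(found, minimum: str = "1.11") -> bool:
    """True only if Tika jars were found, all share one version, and it is >= minimum."""
    versions = {v for _, v, _ in found}
    return len(versions) == 1 and parse_version(versions.pop()) >= parse_version(minimum)
```

For example, `meets_minimum(find_tika_jars("/Users/bba0124/software/solr-5.5.0"))` should return `False` while the bundled Tika 1.7 jars are still in place.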
