tika-dev mailing list archives

From Lewis John Mcgibbney <lewis.mcgibb...@gmail.com>
Subject Re: [nsf-polar-usc-students] Parse-tika plugin with tika (1.7-SNAPSHOT) can't retrieve any parser
Date Mon, 08 Dec 2014 18:43:46 GMT
Hi Angela,

On Mon, Dec 8, 2014 at 12:02 AM, <dev-digest-help@tika.apache.org> wrote:

>
> I followed the tutorial at
> https://svn.apache.org/repos/asf/nutch/trunk/src/plugin/parse-tika/howto_upgrade_tika.txt
> and built the Nutch trunk to use a special Tika version (1.7-SNAPSHOT).
> However, the tika-parser cannot parse any document; it fails with the error
> "Can't retrieve Tika parser for mime-type xxxx".
> If I change the Tika version back to the default, 1.6, the tika-parser
> works. Also, similar to the posting
> http://www.mail-archive.com/user%40nutch.apache.org/msg12067.html, this
> problem can be avoided by running Nutch in Eclipse instead of from the
> shell. Does anyone know the reason for the problem, and perhaps how to
> solve it? Many thanks.
>
>
The short answer is no. I don't know why this behavior occurs when we use
SNAPSHOTs. It is puzzling. See also:
http://mail-archives.apache.org/mod_mbox/nutch-user/201210.mbox/%3Czarafa.508a8c61.46f9.5675ab8f4c30e9de@mail.openindex.io%3E
I've been aware of unpredictable results like the ones you are experiencing
for a long time.
This may even have something to do with how Ivy manages dependencies
within and on behalf of Nutch.
The artifacts we publish from Tika are Maven SNAPSHOTs, so there may be a
mismatch there. If this were the case, I would not be surprised; it would
not be the first time I've come across this. We need to go DEEP here and
DEBUG right down.
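As one possible starting point for that debugging, here is a minimal sketch under stated assumptions. Tika discovers its parser implementations through `META-INF/services/org.apache.tika.parser.Parser` service files, so a class loader that cannot see those files, or that picks up a different tika-core than expected, could leave Tika without a parser for a given mime-type. The class `TikaClasspathCheck` and its methods are hypothetical helpers (not part of Tika or Nutch) that report where `org.apache.tika.Tika` was loaded from and list every parser service file visible to a class loader. Which class loader the parse-tika plugin actually uses is an assumption not covered here; pass that loader in place of the context class loader used in `main`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.security.CodeSource;
import java.util.Collections;
import java.util.List;

// Hypothetical debugging helper: not part of Tika or Nutch.
public class TikaClasspathCheck {

    // Service registration file through which Tika discovers parser implementations.
    static final String PARSER_SERVICE_FILE =
            "META-INF/services/org.apache.tika.parser.Parser";

    // Lists every copy of the parser service file visible to the given class loader
    // (one URL per jar or directory that registers parsers).
    static List<URL> serviceFiles(ClassLoader loader) {
        try {
            return Collections.list(loader.getResources(PARSER_SERVICE_FILE));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reports the jar or directory a class was loaded from, without initializing it.
    static String locationOf(String className, ClassLoader loader) {
        try {
            Class<?> cls = Class.forName(className, false, loader);
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // Bootstrap classes (e.g. java.lang.String) have no CodeSource.
            return src == null ? "(bootstrap or unknown)" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException e) {
            return "(not found)";
        }
    }

    public static void main(String[] args) {
        // Assumption: the context class loader stands in for the plugin's loader here.
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = TikaClasspathCheck.class.getClassLoader();
        }
        System.out.println("tika-core loaded from: " + locationOf("org.apache.tika.Tika", loader));
        List<URL> files = serviceFiles(loader);
        System.out.println(files.size() + " parser service file(s) visible:");
        for (URL url : files) {
            System.out.println("  " + url);
        }
    }
}
```

For example, running this with the same classpath as the failing shell run and again with the Eclipse classpath, then comparing the two outputs, might show whether the SNAPSHOT jars or their service files are missing or duplicated in one setup.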
That is all I can suggest, sorry.
Lewis
