nutch-dev mailing list archives

From Lewis John Mcgibbney <lewis.mcgibb...@gmail.com>
Subject Re: NUTCH-1273
Date Thu, 01 Mar 2012 12:47:36 GMT
Hi Markus,

Well, it would appear that the method I mentioned is the only one that still
uses the deprecated API. I notice that we support tika-core and tika-parsers
0.10 in Nutchgora and tika-core 1.0 in trunk. I'll probably just re-open the
relevant issues, assign them to Nutchgora, and upgrade to 1.0, removing the
dependency on tika-parsers in the process (if possible). I notice that Tika
1.1 is *not* available on Maven Central yet, so this is really an incremental
move towards supporting 1.1 as well.

Regarding the code example and options I initially proposed, do you have
any comments on the best route to go down?
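For comparison, here is a self-contained sketch of the first and third suggestions from user@tika (quoted below). It deliberately avoids depending on Tika: SimpleMimeType and MimeRegistry are hypothetical stand-ins for Tika's MimeType and MimeTypes, and the detect function is a hypothetical stand-in for tika.detect(url). What it illustrates is that "mt != null ? mt : type" only compiles once both branches are compatible with the variable being assigned, either by keeping everything as a String (option 1) or by converting the detected String back into a type object (option 3). With the real library, the option 3 conversion would presumably go through MimeTypes#forName(String), which declares MimeTypeException, so the call site would have to handle that.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class MimeFallbackSketch {

    // Same value as Tika's MimeTypes.OCTET_STREAM constant.
    static final String OCTET_STREAM = "application/octet-stream";

    // Hypothetical stand-in for org.apache.tika.mime.MimeType.
    static final class SimpleMimeType {
        private final String name;

        SimpleMimeType(String name) {
            this.name = name;
        }

        String getName() {
            return name;
        }
    }

    // Hypothetical stand-in for org.apache.tika.mime.MimeTypes: forName() hands
    // back one shared instance per name (the real forName can also throw).
    static final class MimeRegistry {
        private final Map<String, SimpleMimeType> types = new HashMap<>();

        SimpleMimeType forName(String name) {
            return types.computeIfAbsent(name, SimpleMimeType::new);
        }
    }

    // Option 3: keep a type object and convert the detected String back into one.
    static SimpleMimeType resolveType(SimpleMimeType type, String url,
                                      MimeRegistry registry,
                                      Function<String, String> detect) {
        if (type == null || type.getName().equals(OCTET_STREAM)) {
            String mt = detect.apply(url);
            // Writing "mt != null ? mt : type" would not compile here, because
            // the String branch is not a SimpleMimeType; converting it fixes that.
            type = mt != null ? registry.forName(mt) : type;
        }
        return type;
    }

    // Option 1: work with plain MIME type Strings throughout.
    static String resolveName(String typeName, String url,
                              Function<String, String> detect) {
        if (typeName == null || typeName.equals(OCTET_STREAM)) {
            String mt = detect.apply(url);
            typeName = mt != null ? mt : typeName;
        }
        return typeName;
    }

    public static void main(String[] args) {
        MimeRegistry registry = new MimeRegistry();
        // Hypothetical detector: a crude extension check standing in for tika.detect(url).
        Function<String, String> detect =
            u -> u.endsWith(".html") ? "text/html" : null;

        SimpleMimeType generic = registry.forName(OCTET_STREAM);
        System.out.println(resolveType(generic, "http://example.com/a.html", registry, detect).getName()); // text/html
        System.out.println(resolveName(null, "http://example.com/a.html", detect)); // text/html
        System.out.println(resolveName(OCTET_STREAM, "http://example.com/a.bin", detect)); // application/octet-stream
    }
}
```

Option 3 is the smaller change to the existing MimeType-based code; option 1 drops the conversion step but means changing the declared type of type wherever it is used.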

Thanks

Lewis

On Wed, Feb 29, 2012 at 2:32 PM, Markus Jelsma
<markus.jelsma@openindex.io> wrote:

> Hmm, I modified the Content and MimeUtil classes to use the new .detect API
> in NUTCH-1230. I was under the impression all deprecated calls were replaced.
>
> On Wednesday 29 February 2012 14:57:46 Lewis John Mcgibbney wrote:
> > Hi,
> >
> > In the process of addressing NUTCH-1273 [0], I ran into a small problem with
> > some Tika classes.
> > The patch I attached to the issue currently upgrades usage of external
> > dependencies bar Tika. You will therefore still see javac flagging up
> > problems within o.a.n.util.MimeUtil#autoResolveContentType [1]
> >
> >     // if returned null, or if it's the default type then try url resolution
> >     if (type == null
> >         || (type != null && type.getName().equals(MimeTypes.OCTET_STREAM))) {
> >       // If no mime-type header, or cannot find a corresponding registered
> >       // mime-type, then guess a mime-type from the url pattern
> >       type = this.mimeTypes.getMimeType(url) != null ? this.mimeTypes
> >           .getMimeType(url) : type;
> >     }
> >
> > Initially I tried changing the above to
> >
> >     // if returned null, or if it's the default type then try url resolution
> >     if (type == null
> >         || (type != null && type.getName().equals(MimeTypes.OCTET_STREAM))) {
> >       // If no mime-type header, or cannot find a corresponding registered
> >       // mime-type, then guess a mime-type from the url pattern
> >       String mt = tika.detect(url);
> >       type = mt != null ? mt : type;
> >     }
> >
> > However, after compiling I get:
> >
> >     [javac] MimeUtil.java:165: incompatible types
> >     [javac] found   : java.lang.Object&java.io.Serializable&java.lang.Comparable<? extends java.lang.Object&java.io.Serializable&java.lang.Comparable<?>>
> >     [javac] required: org.apache.tika.mime.MimeType
> >     [javac]       type = mt != null ? mt : type;
> >     [javac]                                ^
> >
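> > This is because Tika.detect(URL) returns the MIME type as a String, whereas
> > type is declared as an org.apache.tika.mime.MimeType, so the two branches of
> > the conditional do not share a usable type. The detectors themselves return
> > a MediaType.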
> >
> > I asked on user@tika, and the feedback I got was:
> >
> > * Switch your code to use a mimetype String
> > * Switch your code to use MediaType rather than MimeType, and call
> >  DefaultDetector directly (rather than using the Tika facade class)
> > * If you get back a String (not null) for the mimetype, create a MimeType
> >  object for it.
> >
> > So I suppose my question is: what do we want to do?
> >
> > Thanks
> >
> > [0] https://issues.apache.org/jira/browse/NUTCH-1273
> > [1] http://svn.apache.org/viewvc/nutch/trunk/src/java/org/apache/nutch/util/MimeUtil.java?view=markup
>
> --
> Markus Jelsma - CTO - Openindex
>



-- 
*Lewis*
