tika-dev mailing list archives

From Ken Krugler <kkrugler_li...@transpac.com>
Subject Re: Pluggable language detection
Date Wed, 21 Mar 2012 16:55:06 GMT

On Mar 21, 2012, at 8:51am, Julien Nioche wrote:

> Hi guys,
> 
> Just wondering about the best way to make the language detection pluggable
> instead of having it hard-wired as it is now. We know that the resources
> that are currently in Tika are both slow and inaccurate [1] and there are
> other libraries that we could leverage. Why not have the option to select
> a different implementation just like we do for parsers? Obviously we'd need
> a common interface for the parsers etc...
> 
> What do you think?

I'd be more in favor of using that time to integrate a better language detector into Tika,
so that everybody wins from the work :)

-- Ken


> [1]
> http://blog.mikemccandless.com/2011/10/accuracy-and-performance-of-googles.html
> 
> -- 
> Open Source Solutions for Text Engineering
> 
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble

--------------------------
Ken Krugler
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Mahout & Solr




