tika-dev mailing list archives

From Chris A Mattmann <chris.mattm...@gmail.com>
Subject Re: Pluggable language detection
Date Wed, 21 Mar 2012 19:46:06 GMT
Hey Juls,

I'd be super +1 to make it pluggable and willing to help.

Cheers,
Chris

On Mar 21, 2012, at 4:51 PM, Julien Nioche wrote:

> Hi guys,
> 
> Just wondering about the best way to make the language detection pluggable
> instead of having it hard-wired as it is now. We know that the resources
> that are currently in Tika are both slow and inaccurate [1] and there are
> other libraries that we could leverage. Why not have the option to select
> a different implementation, just like we do for parsers? Obviously we'd need
> a common interface for the parsers etc.
> 
> What do you think?
> 
> Julien
> 
> [1]
> http://blog.mikemccandless.com/2011/10/accuracy-and-performance-of-googles.html
> 
> -- 
> *
> *Open Source Solutions for Text Engineering
> 
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble
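
To make the idea concrete, here is a rough sketch of what a common interface plus a selectable implementation could look like. Everything in it is an assumption for illustration: the names LanguageDetector, StopWordDetector, NullDetector and LanguageDetectors are hypothetical and are not existing Tika classes, and the detection logic is a toy stand-in for a real library.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical common interface (an assumption, not an existing Tika API).
interface LanguageDetector {
    /** Returns an ISO 639-1 code, or "unknown" when no decision is made. */
    String detect(String text);
}

// Toy implementation: flags text as English if it contains common stop words.
class StopWordDetector implements LanguageDetector {
    @Override
    public String detect(String text) {
        int hits = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.equals("the") || token.equals("and") || token.equals("of")) {
                hits++;
            }
        }
        return hits > 0 ? "en" : "unknown";
    }
}

// Toy fallback that never commits to a language.
class NullDetector implements LanguageDetector {
    @Override
    public String detect(String text) {
        return "unknown";
    }
}

// Hypothetical registry: callers pick an implementation by name.
final class LanguageDetectors {
    private static final Map<String, LanguageDetector> REGISTRY = new LinkedHashMap<>();

    static {
        register("stopword", new StopWordDetector());
        register("null", new NullDetector());
    }

    private LanguageDetectors() {
    }

    static void register(String name, LanguageDetector detector) {
        REGISTRY.put(name, detector);
    }

    static LanguageDetector byName(String name) {
        LanguageDetector detector = REGISTRY.get(name);
        if (detector == null) {
            throw new IllegalArgumentException("No language detector registered as: " + name);
        }
        return detector;
    }
}
```

A caller would then write something like LanguageDetectors.byName("stopword").detect(text), and a wrapper around another library could be added with register(...). Discovery through java.util.ServiceLoader would be one possible alternative to the static registry, but that is a design choice left open here.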


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattmann@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

