tika-dev mailing list archives

From Julien Nioche <lists.digitalpeb...@gmail.com>
Subject Pluggable language detection
Date Wed, 21 Mar 2012 15:51:54 GMT
Hi guys,

Just wondering about the best way to make the language detection pluggable
instead of having it hard-wired as it is now. We know that the resources
that are currently in Tika are both slow and inaccurate [1], and there are
other libraries that we could leverage. Why not have the option to select
a different implementation, just like we do for parsers? Obviously we'd need
a common interface for the language detectors, etc.
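To make the proposal concrete, here is a minimal Java sketch of what such a common interface might look like. Everything in it is hypothetical: `LanguageDetector`, `StopwordDetector` and `LanguageDetectorRegistry` are invented names, not existing Tika classes, and the stopword matcher is only a toy stand-in for a real detector. The idea it illustrates is that callers pick an implementation by name, much as a parser is selected, so a faster or more accurate library could sit behind the same interface.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Demo entry point; declared first so single-file launch (java File.java) finds main.
final class PluggableLanguageDetectionDemo {
    public static void main(String[] args) {
        LanguageDetectorRegistry registry = new LanguageDetectorRegistry();
        registry.register(new StopwordDetector());

        // Callers choose an implementation by name, e.g. taken from configuration.
        String configured = "stopwords";
        LanguageDetector detector = registry.get(configured)
                .orElseThrow(() -> new IllegalStateException("No detector: " + configured));

        System.out.println(detector.detect("The cat and the dog"));
        System.out.println(detector.detect("Le chat et la souris"));
    }
}

/** Hypothetical pluggable interface (not an existing Tika API). */
interface LanguageDetector {
    /** Short name used to select this implementation. */
    String name();

    /** Returns a language code such as "en", or "und" (undetermined) if nothing matches. */
    String detect(String text);
}

/** Toy implementation: counts stopword hits per language and picks the best. */
final class StopwordDetector implements LanguageDetector {
    private final Map<String, String[]> stopwords = new LinkedHashMap<>();

    StopwordDetector() {
        stopwords.put("en", new String[] {"the", "and", "is", "of"});
        stopwords.put("fr", new String[] {"le", "la", "et", "est"});
    }

    @Override
    public String name() {
        return "stopwords";
    }

    @Override
    public String detect(String text) {
        String best = "und";
        int bestHits = 0;
        // Lowercase and split on runs of non-word characters.
        String[] tokens = text.toLowerCase(Locale.ROOT).split("\\W+");
        for (Map.Entry<String, String[]> entry : stopwords.entrySet()) {
            int hits = 0;
            for (String token : tokens) {
                for (String word : entry.getValue()) {
                    if (token.equals(word)) {
                        hits++;
                    }
                }
            }
            // Strictly greater, so ties keep the earlier language.
            if (hits > bestHits) {
                bestHits = hits;
                best = entry.getKey();
            }
        }
        return best;
    }
}

/** Holds the available detectors and looks them up by name. */
final class LanguageDetectorRegistry {
    private final Map<String, LanguageDetector> detectors = new LinkedHashMap<>();

    void register(LanguageDetector detector) {
        detectors.put(detector.name(), detector);
    }

    Optional<LanguageDetector> get(String name) {
        return Optional.ofNullable(detectors.get(name));
    }
}
```

In a real setup the implementations would more likely be discovered at runtime, for example with the standard `java.util.ServiceLoader` and a `META-INF/services` entry, rather than registered by hand; the registry here just keeps the sketch self-contained.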

What do you think?

Julien

[1] http://blog.mikemccandless.com/2011/10/accuracy-and-performance-of-googles.html

-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
