tika-dev mailing list archives

From "Mattmann, Chris A (388J)" <chris.a.mattm...@jpl.nasa.gov>
Subject Re: Pluggable language detection
Date Mon, 09 Apr 2012 01:19:21 GMT
Hi Jan,

It probably makes sense to provide pluggable language detection in Tika, since it's the
lower-level library, so I am +1 for figuring out a way to implement it in Tika.

If no one has started on this in the next few weeks I'll give it a go.

Cheers,
Chris

On Apr 8, 2012, at 4:16 PM, Jan Høydahl wrote:

> In Solr, we added support for pluggable language detectors, one of them being Tika's. See http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/langid/
> The detectLanguage() method returns a list of DetectedLanguage objects, each with a normalized certainty between 0.0 and 1.0. I think it's a step in the right direction.
> 
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Solr Training - www.solrtraining.com
> 
[...snip...]
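
To make the shape Jan describes concrete, here is a minimal sketch of what a pluggable detector contract could look like: a detectLanguage() method returning a list of results, each carrying a certainty normalized to the range 0.0 to 1.0. Everything below is an assumption for illustration. The interface name PluggableLanguageDetector, the StopWordDetector class, and this simplified DetectedLanguage type are hypothetical and are not the actual Solr or Tika classes; the stop-word scoring is a toy stand-in for a real detector.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical result type: a language code plus a certainty in [0.0, 1.0]. */
final class DetectedLanguage {
    private final String langCode;
    private final double certainty;

    DetectedLanguage(String langCode, double certainty) {
        this.langCode = langCode;
        this.certainty = certainty;
    }

    String getLangCode() { return langCode; }
    double getCertainty() { return certainty; }
}

/** Hypothetical plug-in point: any detector can be swapped in behind this interface. */
interface PluggableLanguageDetector {
    /** Returns candidates ordered by descending certainty; empty if nothing matched. */
    List<DetectedLanguage> detectLanguage(String content);
}

/** Toy implementation that scores text by stop-word hits; for illustration only. */
final class StopWordDetector implements PluggableLanguageDetector {
    private final Map<String, Set<String>> stopWords = new LinkedHashMap<>();

    StopWordDetector() {
        stopWords.put("en", new HashSet<>(Arrays.asList("the", "and", "is", "of")));
        stopWords.put("de", new HashSet<>(Arrays.asList("der", "und", "ist", "das")));
    }

    @Override
    public List<DetectedLanguage> detectLanguage(String content) {
        String[] tokens = content.toLowerCase(Locale.ROOT).split("\\W+");
        Map<String, Integer> hits = new LinkedHashMap<>();
        int total = 0;
        for (String token : tokens) {
            for (Map.Entry<String, Set<String>> e : stopWords.entrySet()) {
                if (e.getValue().contains(token)) {
                    hits.merge(e.getKey(), 1, Integer::sum);
                    total++;
                }
            }
        }
        List<DetectedLanguage> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : hits.entrySet()) {
            // Normalize: each certainty is this language's share of all hits,
            // so it falls in [0.0, 1.0] and the certainties sum to 1.0.
            result.add(new DetectedLanguage(e.getKey(), e.getValue() / (double) total));
        }
        // Highest certainty first.
        result.sort((a, b) -> Double.compare(b.getCertainty(), a.getCertainty()));
        return result;
    }
}
```

A caller would depend only on the interface, for example `PluggableLanguageDetector d = new StopWordDetector();`, and could then swap in a Tika-backed or other implementation without changing the calling code. Returning a ranked list rather than a single code lets the caller apply its own certainty threshold.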

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattmann@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

