tika-dev mailing list archives

From Chris Mattmann <mattm...@apache.org>
Subject FW: New answer to "What are the best algorithms for classifying the language of a text snippet? Why?"
Date Thu, 14 Aug 2014 17:44:30 GMT
This seems like a relevant Quora question.



-----Original Message-----
From: Quora <noreply@quora.com>
Date: Thursday, August 14, 2014 7:43 AM
To: Chris Mattmann <chris.mattmann@gmail.com>
Subject: New answer to "What are the best algorithms for classifying the
language of a text snippet? Why?"

>Luis Argerich <http://www.quora.com/Luis-Argerich>, Crazy college
>professor.
>
>Here's a trick that works.
>
>If you are doing this for a website <http://www.quora.com/Websites>, then
>measure where the views for that text snippet come from. If they come from
>Japan, then your text is Japanese.
>
>It sounds silly, but it works a lot better than trying to classify the text
>by its contents, because many snippets are really difficult to classify:
>they mix different languages, character sets, etc. As an example, imagine
>you have a webpage with a few links in English and an image with text in
>Chinese.
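
A minimal sketch of the view-origin trick described in the answer above, assuming page views have already been resolved to country codes. The function name, the `min_share` threshold, and the tiny country-to-language table are all hypothetical and only for illustration; none of them come from the answer or from Tika.

```python
from collections import Counter

# Hypothetical, deliberately tiny mapping from ISO 3166-1 country codes to
# ISO 639-1 language codes. A real table would be far larger, and many
# countries have more than one widely used language.
COUNTRY_TO_LANGUAGE = {
    "JP": "ja",
    "CN": "zh",
    "US": "en",
    "GB": "en",
    "AR": "es",
}


def guess_language_from_views(view_countries, min_share=0.5):
    """Guess a snippet's language from the countries its views came from.

    view_countries: iterable of country codes, one per page view.
    Returns a language code, or None when no language reaches min_share
    of all views (or when no view maps to a known language).
    """
    votes = Counter()
    total = 0
    for country in view_countries:
        total += 1
        language = COUNTRY_TO_LANGUAGE.get(country)
        if language is not None:
            votes[language] += 1
    if not votes:
        return None
    language, count = votes.most_common(1)[0]
    # Refuse to guess when the leading language is not dominant enough.
    return language if count / total >= min_share else None
```

Returning None when no language dominates leaves room to fall back to content-based detection for snippets whose audience is spread across several countries.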
>
>To see the question with all answers, visit:
>http://www.quora.com/What-are-the-best-algorithms-for-classifying-the-language-of-a-text-snippet-Why/answer/Luis-Argerich
>
>Thanks,
>The Quora Team


