tika-dev mailing list archives

From Oleg Tikhonov <o...@apache.org>
Subject Re: Having Problem in Word Count and Language Detaction
Date Sat, 26 Oct 2013 19:05:33 GMT
Hi Animesh,
My wild guess is that the N-gram profile for Chinese wasn't trained very
well. Try recreating the Chinese language profile.

Have a look here:
http://www.ibm.com/developerworks/opensource/tutorials/os-apache-tika/section6.html
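
To make the idea of an N-gram profile concrete, here is a minimal sketch in
plain Java. It is not Tika's implementation; the class name
NgramProfileSketch and the method countNgrams are made up for illustration.
It only counts overlapping character n-grams, which is the kind of statistic
a language profile is built from.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; this is not part of the Tika API.
public class NgramProfileSketch {

    // Count every overlapping character n-gram of length n in the text.
    static Map<String, Integer> countNgrams(String text, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + n <= text.length(); i++) {
            // merge() inserts 1 for a new n-gram or adds 1 to its count
            counts.merge(text.substring(i, i + n), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Print the trigram counts of a short sample string.
        Map<String, Integer> profile = countNgrams("abcabc", 3);
        for (Map.Entry<String, Integer> e : profile.entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

If the training text for a language is small or unrepresentative, these
counts will be a poor match for real documents, which is one way a profile
can end up badly trained.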

Hope it helps.


On Sat, Oct 26, 2013 at 8:48 PM, Chris Mattmann <mattmann@apache.org> wrote:

> Hi Animesh,
>
> Please detail your issue here on dev@tika.apache.org and I'm sure
> someone can help.
>
> Cheers,
> Chris
>
>
> -----Original Message-----
> From: Animesh Kumar <animesh.sarag@gmail.com>
> Date: Wednesday, October 23, 2013 9:15 PM
> To: "dev-owner@tika.apache.org" <dev-owner@tika.apache.org>
> Subject: Fwd: Having Problem in Word Count and Language Detaction
>
> >
> >
> >Sir/Madam,
> >I am developing a web-based application that uses Apache Tika to get the
> >language and word count of an uploaded file. It works fine for English,
> >Japanese, Hindi, etc., but gives the wrong word count for Chinese. I am
> >using tika-app-1.4.jar.
> >There is also another problem with word counting for file formats other
> >than doc and docx.
> >
> >
> >--
> >With Thanks & Regards
> >Animesh Kumar
> >+918927992397 <tel:%2B918927992397>
> >
> >
> >
>
>
>
