nutch-dev mailing list archives

From sanjeev <sanjeev_dasgu...@hotmail.com>
Subject Re: implement thai language indexing and search
Date Tue, 12 Dec 2006 05:39:15 GMT

Hi all,

I am still looking for help with Thai language indexing and
searching.

Please help, as I'm quite lost on this one.

Thanks and regards,
sanjeev.


sanjeev wrote:
> 
> Thanks for clearing up some doubts. But how exactly do I wrap it?
> Do I need to change any code to use the new Thai tokenizer?
> If yes, which places need modification?
> Do I need to download a dev version and recompile?
> 
> If you could briefly tell me the steps, I would be
> highly obliged.
> 
> Thanks,
> sanjeev.
> 
> 
> 
> 
> Jérôme Charron wrote:
>> 
>>> I used the existing ThaiAnalyzer from the Lucene package.
>>> I renamed lucene.analysis.th.* to nutch.analysis.th.*, compiled it,
>>> and placed all the class files in a jar, analysis-th.jar. (Do I need
>>> to bundle the NGP file in the jar as well?)
>> 
>> 1. You don't have to refactor the Lucene analyzer. Just wrap it, as I
>> do with the French and German analyzers (they both use analyzers from
>> Lucene).
>> 2. The analyzer doesn't need NGP files. I think you misunderstood
>> something:
>> 2.1 On one side, there is the language identifier, which uses NGP files
>> to identify the language of a document.
>> 2.2 On the other side, if a suitable analyzer is found for the
>> identified language, it is used to analyze the document.
>> 
>> Regards
>> 
>> Jérôme
>> 
>> 
>> -- 
>> http://motrech.free.fr/
>> http://www.frutch.org/
>> 
>> 
> 
> 
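
For anyone following the thread, here is a rough sketch of the two points
quoted above: wrapping an existing analyzer by delegation instead of
renaming its package, and choosing an analyzer from a language code that
was identified in a separate step. Everything in it is an assumption made
for illustration. SketchAnalyzer, LibraryThaiAnalyzer, WrappedThaiAnalyzer,
DefaultSketchAnalyzer and AnalyzerRegistry are hypothetical stand-ins, not
the Nutch or Lucene API. In a real plugin the wrapper would presumably
extend Nutch's own analyzer base class and delegate to
org.apache.lucene.analysis.th.ThaiAnalyzer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for an analyzer interface; NOT the Lucene/Nutch API.
interface SketchAnalyzer {
    List<String> analyze(String text);
}

// Hypothetical stand-in for an analyzer that already ships with the library.
// It only splits on whitespace; a real Thai analyzer does proper segmentation.
final class LibraryThaiAnalyzer implements SketchAnalyzer {
    @Override
    public List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }
}

// The wrapper: no renaming or recompiling of the library class,
// it simply forwards every call to the existing analyzer.
final class WrappedThaiAnalyzer implements SketchAnalyzer {
    private static final SketchAnalyzer DELEGATE = new LibraryThaiAnalyzer();

    @Override
    public List<String> analyze(String text) {
        return DELEGATE.analyze(text);
    }
}

// Hypothetical fallback used when no language-specific analyzer is registered.
final class DefaultSketchAnalyzer implements SketchAnalyzer {
    @Override
    public List<String> analyze(String text) {
        return new LibraryThaiAnalyzer().analyze(text.toLowerCase(Locale.ROOT));
    }
}

// Maps an already-identified language code to an analyzer.
// Language identification (the part that uses NGP files) happens elsewhere;
// the analyzers here never see those files.
final class AnalyzerRegistry {
    private final Map<String, SketchAnalyzer> byLanguage = new HashMap<>();
    private final SketchAnalyzer fallback;

    AnalyzerRegistry(SketchAnalyzer fallback) {
        this.fallback = fallback;
    }

    void register(String languageCode, SketchAnalyzer analyzer) {
        byLanguage.put(languageCode, analyzer);
    }

    SketchAnalyzer forLanguage(String languageCode) {
        return byLanguage.getOrDefault(languageCode, fallback);
    }
}

final class ThaiWrapperSketch {
    public static void main(String[] args) {
        AnalyzerRegistry registry = new AnalyzerRegistry(new DefaultSketchAnalyzer());
        registry.register("th", new WrappedThaiAnalyzer());

        // "th" resolves to the wrapper; an unknown code falls back to the default.
        System.out.println(registry.forLanguage("th").analyze("Hello World"));
        System.out.println(registry.forLanguage("xx").analyze("Hello World"));
    }
}
```

The point of the sketch is only the shape: the wrapper holds a reference to
the existing analyzer and forwards to it, and the lookup by language code is
the only place where the identified language matters.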

-- 
View this message in context: http://www.nabble.com/implement-thai-language-indexing-and-search-tf2641172.html#a7827701
Sent from the Nutch - Dev mailing list archive at Nabble.com.

