lucene-general mailing list archives

From Koji Sekiguchi <k...@r.email.ne.jp>
Subject Re: Train Lucene with topic-defined files
Date Sun, 22 Jun 2014 14:43:28 GMT
Hi benglish,

I see your point. You haven't got an index yet. All you have to do first is
create a Lucene index. When creating it, don't worry about training.

To create an index, please take a look at this:

http://lucene.apache.org/core/4_7_2/core/overview-summary.html#overview_description

Once you've got an index, you can call train() (outside the loop, of course).
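
As a rough sketch of that order of operations: the loop only adds documents to the index, and train() is called once afterwards on a reader over that index. This assumes Lucene 4.7 with the lucene-classification and lucene-analyzers-common modules on the classpath. The class name, the field names "text" and "category", the in-memory directory, and the sample strings are all made up for illustration.

```java
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.classification.ClassificationResult;
import org.apache.lucene.classification.SimpleNaiveBayesClassifier;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.AtomicReader;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.SlowCompositeReaderWrapper;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.Version;

public class TrainOnceSketch {

    // Hypothetical field names: any names work as long as indexing and train() agree.
    static final String TEXT_FIELD = "text";
    static final String CATEGORY_FIELD = "category";

    static String trainAndClassify(String[] texts, String[] categories, String unseen)
            throws IOException {
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_47);
        try (Directory dir = new RAMDirectory()) {
            // Step 1: the loop only adds one document per training file to the index.
            IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_47, analyzer);
            try (IndexWriter writer = new IndexWriter(dir, config)) {
                for (int i = 0; i < texts.length; i++) {
                    Document doc = new Document();
                    doc.add(new TextField(TEXT_FIELD, texts[i], Field.Store.YES));
                    doc.add(new StringField(CATEGORY_FIELD, categories[i], Field.Store.YES));
                    writer.addDocument(doc);
                }
            }

            // Step 2: train once, outside the loop, on a reader over the finished index.
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                AtomicReader atomicReader = SlowCompositeReaderWrapper.wrap(reader);
                SimpleNaiveBayesClassifier classifier = new SimpleNaiveBayesClassifier();
                classifier.train(atomicReader, TEXT_FIELD, CATEGORY_FIELD, analyzer);

                // The classifier reads from the index, so classify while the reader is open.
                ClassificationResult<BytesRef> result = classifier.assignClass(unseen);
                return result.getAssignedClass().utf8ToString();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        String[] texts = {"goal match team striker", "election vote parliament minister"};
        String[] categories = {"A", "B"};
        System.out.println(trainAndClassify(texts, categories, "the team scored a goal"));
    }
}
```

In your case the two arrays would be replaced by reading each of the 100 files and its category inside the indexing loop, and the directory would normally be an on-disk one rather than RAMDirectory.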

Koji
-- 
http://soleami.com/blog/comparing-document-classification-functions-of-lucene-and-mahout.html


(2014/06/22 23:14), benglish wrote:
> Dear Koji,
> Since I am a newbie to Lucene, I unfortunately still don't understand the
> .xml file you talked about in your post.
> Let's imagine I have 5 categories named {A, B, C, D, E} and 100 files named
> from 1 to 100. It is impossible in my case to train the classifier out of a
> loop, because I should extract the content of each file and its category and
> then add it to the training set. So it must be in a loop. Could you please
> tell me if I am right with the following pseudocode:
>
> directory = directory of training files
> trainingNumber = number of training files
> for(int i = 0; i < trainingNumber; i++)
> {
>      String category = category of ith file
>      String text = content of ith file
>      classifier.train(ar, text, category, new SomeAnalyzer(Version.LUCENE_46));
> }
>
> If it is wrong, please let me know how I should train the classifier outside
> the loop
>
> Yours Sincerely,
> benglish



