lucene-general mailing list archives

From Mark Bennett <>
Subject Re: Train Lucene with topic-defined files
Date Tue, 17 Jun 2014 16:40:55 GMT

I'm not sure I understand your requirements, but perhaps you could use a Naive Bayes classifier?

A typical Bayes classifier separates documents into two classes, Yes/No (spam detection, etc.), but it can be extended to N categories.

Lucene provides access to the words it has indexed in your documents.  You could feed those
to a classifier for training.
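
As a rough illustration of that idea, here is a minimal multinomial Naive Bayes sketch in plain Python that handles any number of topics. It assumes each document has already been reduced to a list of terms (for example, terms read back from the index); the NaiveBayes class, the sample term lists, and the topic names are all made up for illustration and are not part of Lucene.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over bags of terms, with add-one smoothing.

    Illustrative only: this class is hypothetical and not a Lucene API.
    """

    def __init__(self):
        self.doc_counts = Counter()              # topic -> number of training docs
        self.term_counts = defaultdict(Counter)  # topic -> term -> occurrences
        self.vocabulary = set()                  # every term seen in training

    def train(self, terms, topic):
        """Record one pre-tagged document, given as a list of terms."""
        self.doc_counts[topic] += 1
        self.term_counts[topic].update(terms)
        self.vocabulary.update(terms)

    def classify(self, terms):
        """Return the topic with the highest log-probability for the terms."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_topic, best_score = None, float("-inf")
        for topic, n_docs in self.doc_counts.items():
            counts = self.term_counts[topic]
            total_terms = sum(counts.values())
            # Log prior: share of training documents carrying this topic.
            score = math.log(n_docs / total_docs)
            # Log likelihood of each term, smoothed so unseen terms are not zero.
            for term in terms:
                score += math.log((counts[term] + 1) / (total_terms + vocab_size))
            if score > best_score:
                best_topic, best_score = topic, score
        return best_topic


# Hypothetical pre-tagged documents, already reduced to term lists.
classifier = NaiveBayes()
classifier.train(["goal", "match", "team", "league"], "sports")
classifier.train(["election", "vote", "party", "senate"], "politics")
classifier.train(["stock", "market", "shares", "bank"], "finance")

print(classifier.classify(["team", "goal", "goal"]))  # prints "sports"
```

Working in log space avoids underflow when documents contain many terms, and the add-one smoothing keeps a single unseen word from ruling out a topic entirely.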

A quick Google search brought this back; perhaps it will get you started:

They also have a KNearestNeighbor version; see the implementers link here:
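
For comparison, the nearest-neighbour approach can be sketched in a few lines of plain Python: compare the untagged document with every tagged one and let the k most similar documents vote on the topic. This is only a sketch under assumptions, not the implementation referred to above; the cosine and knn_classify functions and the sample data are hypothetical, and similarity here is plain cosine similarity over raw term counts.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(terms, tagged_docs, k=3):
    """Return the majority topic among the k tagged docs most similar to terms.

    tagged_docs is a list of (term_list, topic) pairs.
    """
    query = Counter(terms)
    scored = sorted(
        ((cosine(query, Counter(doc_terms)), topic) for doc_terms, topic in tagged_docs),
        reverse=True,
    )
    votes = Counter(topic for _, topic in scored[:k])
    return votes.most_common(1)[0][0]


# Hypothetical pre-tagged documents, already reduced to term lists.
tagged_docs = [
    (["goal", "match", "team"], "sports"),
    (["team", "league", "coach"], "sports"),
    (["election", "vote", "party"], "politics"),
    (["senate", "vote", "bill"], "politics"),
]

print(knn_classify(["team", "goal"], tagged_docs))  # prints "sports"
```

Unlike Naive Bayes, this needs no training step, but every classification compares the new document against all tagged documents, which is where an index helps.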

You might also want to consider Solr, which is a layer on top of Lucene.

Mark Bennett / LucidWorks: Search & Big Data /
Office: 408-898-4201 / Telecommute: 408-733-0387 / Cell: 408-829-6513

On Jun 15, 2014, at 10:37 PM, benglish <> wrote:

> Hi pals,
> I have a huge number of text files tagged with predefined topics. What I
> want to do is tag the test files based on those pre-tagged files.
> Searching the Net, I couldn't find an answer: is it possible to train
> Lucene with tagged files so that it then tags test files according to
> those predefined tags?
> Yours Sincerely,
> benglish