lucene-java-user mailing list archives

From Ahmet Arslan <iori...@yahoo.com.INVALID>
Subject Re: Text dependent analyzer
Date Wed, 15 Apr 2015 00:47:40 GMT
Hi Hummel,

You can perform sentence detection outside of Solr, using OpenNLP for instance, and then
feed the resulting sentences to Solr.
https://opennlp.apache.org/documentation/1.5.2-incubating/manual/opennlp.html#tools.sentdetect
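
As a rough illustration of splitting the text before it reaches the analyzer, here is a minimal sketch. It deliberately uses the JDK's java.text.BreakIterator as a stand-in for the OpenNLP sentence detector, so it needs no model file; the class name SentenceSplitter and the method splitSentences are hypothetical names chosen for this example, not part of Lucene, Solr, or OpenNLP.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: splits raw text into sentences before indexing.
// Uses the JDK BreakIterator as a stand-in for OpenNLP sentence detection.
class SentenceSplitter {

    static List<String> splitSentences(String text) {
        BreakIterator boundaries = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        boundaries.setText(text);

        List<String> sentences = new ArrayList<>();
        int start = boundaries.first();
        // Walk consecutive boundary pairs; each pair delimits one sentence.
        for (int end = boundaries.next(); end != BreakIterator.DONE;
                start = end, end = boundaries.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "Hello world. How are you?";
        // Each sentence could then be indexed as its own field value or document.
        for (String sentence : splitSentences(text)) {
            System.out.println(sentence);
        }
    }
}
```

Each returned sentence can then be passed to the usual analysis chain on its own, so the analyzer itself does not need to know about sentence boundaries.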

Ahmet




On Tuesday, April 14, 2015 8:12 PM, Shay Hummel <shay.hummel@gmail.com> wrote:
Hi
I would like to create a text-dependent analyzer.
That is, *given a string*, the analyzer will:
1. Read the entire text and break it into sentences.
2. Tokenize each sentence, then remove possessives, lowercase the tokens,
mark terms, and stem them.

The second part is essentially what EnglishAnalyzer already does
(createComponents). However, that analysis does not depend on the text it
receives, and such dependence is the first part of what I am trying to do.

So ... How can it be achieved?

Thank you,

Shay Hummel


