We do have EdgeNGramTokenizer if that is what you are after.
See how Solr uses it here:
http://search-lucene.com/c/Solr:/src/java/org/apache/solr/analysis/EdgeNGramTokenizerFactory.java||EdgeNGramTokenizer
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
----- Original Message ----
> From: Clemens Wyss <clemensdev@mysign.ch>
> To: "java-user@lucene.apache.org" <java-user@lucene.apache.org>
> Sent: Wed, May 4, 2011 2:07:40 AM
> Subject: AW: AW: AW: AW: "fuzzy prefix" search
>
> I know this is just an example.
> But even the WhitespaceAnalyzer takes the words apart, which I don't want. I
> would like the phrases as they are (maximum 3 words, e.g. "Merlot del Ticino",
> ...) to be n-grammed as a whole. Hence I want the n-grams:
> "Mer"
> "Merl"
> "Merlo"
> "Merlot"
> "Merlot "
> "Merlot d"
> ...
>
> Regards
> Clemens
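To make the requested output concrete, here is a minimal plain-Java sketch of front edge n-grams taken over the whole, untokenized phrase. It does not use Lucene; the class and method names (PhraseEdgeNGrams, edgeNGrams) are made up for illustration. In Lucene terms this roughly corresponds to an edge n-gram filter on top of a tokenizer that emits the entire field value as one token (which is what KeywordTokenizer does), with a maxGram large enough to reach past the first word.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Lucene: front edge n-grams of an untokenized phrase. */
final class PhraseEdgeNGrams {

    /** Returns every prefix of {@code phrase} whose length lies in [minGram, maxGram]. */
    static List<String> edgeNGrams(String phrase, int minGram, int maxGram) {
        if (minGram < 1 || maxGram < minGram) {
            throw new IllegalArgumentException("need 1 <= minGram <= maxGram");
        }
        List<String> grams = new ArrayList<>();
        int longest = Math.min(maxGram, phrase.length());
        for (int len = minGram; len <= longest; len++) {
            // Spaces are kept: the phrase is never split into words.
            grams.add(phrase.substring(0, len));
        }
        return grams;
    }

    public static void main(String[] args) {
        // Brackets make the trailing space in grams like "Merlot " visible.
        for (String gram : edgeNGrams("Merlot del Ticino", 3, 20)) {
            System.out.println("[" + gram + "]");
        }
    }
}
```

With minGram 3 the list starts at "Mer" and continues through "Merlot ", "Merlot d" and so on up to the full phrase.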
> > -----Original Message-----
> > From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
> > Sent: Tuesday, May 3, 2011 23:12
> > To: java-user@lucene.apache.org
> > Subject: Re: AW: AW: AW: "fuzzy prefix" search
> >
> > Clemens - that's just an example. Stick another tokenizer in there,
> > WhitespaceTokenizer for example.
> >
> > Otis
> >
> > ----- Original Message ----
> > > From: Clemens Wyss <clemensdev@mysign.ch>
> > > To: "java-user@lucene.apache.org" <java-user@lucene.apache.org>
> > > Sent: Tue, May 3, 2011 4:31:14 PM
> > > Subject: AW: AW: AW: "fuzzy prefix" search
> > >
> > > But doesn't the KeywordTokenizer extract single words out of the
> > > stream? I would like to create n-grams on the stream (field content) as
> > > it is...
> > >
> > > > -----Original Message-----
> > > > From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
> > > > Sent: Tuesday, May 3, 2011 21:31
> > > > To: java-user@lucene.apache.org
> > > > Subject: Re: AW: AW: "fuzzy prefix" search
> > > >
> > > > Clemens,
> > > >
> > > > Something a la:
> > > >
> > > > public TokenStream tokenStream(String fieldName, Reader r) {
> > > >   // KeywordTokenizer emits the whole field value as a single token
> > > >   return new EdgeNGramTokenFilter(new KeywordTokenizer(r),
> > > >       EdgeNGramTokenFilter.Side.FRONT, 1, 4);
> > > > }
> > > >
> > > >
> > > > Check out page 265 of Lucene in Action 2.
> > > >
> > > > Otis
> > > >
> > > > ----- Original Message ----
> > > > > From: Clemens Wyss <clemensdev@mysign.ch>
> > > > > To: "java-user@lucene.apache.org" <java-user@lucene.apache.org>
> > > > > Sent: Tue, May 3, 2011 12:57:39 PM
> > > > > Subject: AW: AW: "fuzzy prefix" search
> > > > >
> > > > > What does a simple Analyzer look like that just "n-grams" the
> > > > > docs/fields?
> > > > >
> > > > > class SimpleNGramAnalyzer extends Analyzer {
> > > > >   @Override
> > > > >   public TokenStream tokenStream(String fieldName, Reader reader) {
> > > > >     EdgeNGramTokenFilter... ???
> > > > >   }
> > > > > }
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
> > > > > > Sent: Tuesday, May 3, 2011 13:36
> > > > > > To: java-user@lucene.apache.org
> > > > > > Subject: Re: AW: "fuzzy prefix" search
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I didn't read this thread closely, but just in case:
> > > > > > * Is this something you can handle with synonyms?
> > > > > > * If this is for English and you are trying to handle typos, there is a
> > > > > >   list of common English misspellings out there that you could perhaps
> > > > > >   use for this.
> > > > > > * Have you considered n-gramming your tokens? Not sure if this would
> > > > > >   help, I didn't read the messages/examples closely enough, but you may
> > > > > >   want to look at this if you haven't done so yet.
> > > > > >
> > > > > > Otis
> > > > > >
> > > > > > ----- Original Message ----
> > > > > > > From: Clemens Wyss <clemensdev@mysign.ch>
> > > > > > > To: "java-user@lucene.apache.org" <java-user@lucene.apache.org>
> > > > > > > Sent: Tue, May 3, 2011 5:25:30 AM
> > > > > > > Subject: AW: "fuzzy prefix" search
> > > > > > >
> > > > > > > >PrefixQuery
> > > > > > > I'd like the combination of prefix and fuzzy ;-) because people could
> > > > > > > also type "menlo" or "märl", and in any of these cases I'd like to get
> > > > > > > a hit on Merlot (for suggesting Merlot).
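One way to picture the "prefix plus fuzzy" combination asked for here: compare the typed text against the prefixes (edge n-grams) of the stored phrase and accept it if some prefix is within a small edit distance. The sketch below is plain Java and only illustrates the idea; FuzzyPrefixMatcher, matches and the maxEdits parameter are hypothetical names, not Lucene API, and a real index would precompute the n-grams rather than scan prefixes at query time.

```java
import java.util.Locale;

/** Hypothetical sketch, not a Lucene API: fuzzy matching against the prefixes of a phrase. */
final class FuzzyPrefixMatcher {

    /** Classic Levenshtein distance (insert, delete, substitute), two-row dynamic programming. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** True if the lowercased query is within maxEdits of some prefix of the lowercased phrase. */
    static boolean matches(String query, String phrase, int maxEdits) {
        String q = query.toLowerCase(Locale.ROOT);
        String p = phrase.toLowerCase(Locale.ROOT);
        // Only prefixes whose length differs from the query by at most maxEdits can qualify.
        int shortest = Math.max(1, q.length() - maxEdits);
        int longest = Math.min(p.length(), q.length() + maxEdits);
        for (int len = shortest; len <= longest; len++) {
            if (editDistance(q, p.substring(0, len)) <= maxEdits) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String phrase = "Merlot del Ticino";
        // "m\u00e4rl" is "märl", written as an escape to avoid source-encoding issues.
        for (String query : new String[] {"mer", "menlo", "m\u00e4rl", "chianti"}) {
            System.out.println(query + " -> " + matches(query, phrase, 1));
        }
    }
}
```

With maxEdits set to 1, "mer", "menlo" and "märl" all match "Merlot del Ticino", while "chianti" does not.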
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Ian Lea [mailto:ian.lea@gmail.com]
> > > > > > > > Sent: Tuesday, May 3, 2011 11:22
> > > > > > > > To: java-user@lucene.apache.org
> > > > > > > > Subject: Re: "fuzzy prefix" search
> > > > > > > >
> > > > > > > > I'd assumed that FuzzyQuery wouldn't ignore case, but I could be wrong.
> > > > > > > > What would be the edit distance between "mer" and "merlot"? Would
> > > > > > > > it be less than 1.5, which I reckon would be the value of
> > > > > > > > length(term)*0.5 as detailed in the javadocs? Seems unlikely, but
> > > > > > > > I don't really know anything about the Levenshtein (edit distance)
> > > > > > > > algorithm as used by FuzzyQuery.
> > > > > > > > Wouldn't a PrefixQuery be more appropriate here?
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Ian.
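The edit-distance question can be checked directly. Below is a small plain-Java Levenshtein implementation (EditDistanceDemo is a made-up name, and this is the textbook algorithm, not Lucene's internal code) applied to "mer" and "merlot", together with the length(term)*0.5 value mentioned above.

```java
/** Hypothetical demo class: Levenshtein distance between a short query and a longer term. */
final class EditDistanceDemo {

    /** Textbook Levenshtein distance using two rolling rows. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty string: j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty string: i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        String query = "mer";
        String term = "merlot";
        int distance = editDistance(query, term);
        double threshold = query.length() * 0.5;
        // prints "distance=3, threshold=1.5"
        System.out.println("distance=" + distance + ", threshold=" + threshold);
    }
}
```

Turning "mer" into "merlot" takes three insertions, so the distance is 3, well above 1.5. That is consistent with the fuzzy query alone not matching a term twice as long as the query.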
> > > > > > > >
> > > > > > > > On Tue, May 3, 2011 at 10:10 AM, Clemens Wyss
> > > > > > > > <clemensdev@mysign.ch>
> > > > > > > > wrote:
> > > > > > > > > Unfortunately lowercasing doesn't help.
> > > > > > > > > Also, doesn't the FuzzyQuery ignore casing?
> > > > > > > > >
> > > > > > > > >> -----Original Message-----
> > > > > > > > >> From: Ian Lea [mailto:ian.lea@gmail.com]
> > > > > > > > >> Sent: Tuesday, May 3, 2011 11:06
> > > > > > > > >> To: java-user@lucene.apache.org
> > > > > > > > >> Subject: Re: "fuzzy prefix" search
> > > > > > > > >>
> > > > > > > > >> Mer != mer. The latter will be what is indexed because
> > > > > > > > >> StandardAnalyzer calls LowerCaseFilter.
> > > > > > > > >>
> > > > > > > > >> --
> > > > > > > > >> Ian.
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >> On Tue, May 3, 2011 at 9:56 AM, Clemens Wyss <clemensdev@mysign.ch>
> > > > > > > > >> wrote:
> > > > > > > > >> > Sorry for coming back to my issue. Can anybody explain why my "simple"
> > > > > > > > >> > unit test below fails? Any hint/help appreciated.
> > > > > > > > >> >
> > > > > > > > >> > Directory directory = new RAMDirectory();
> > > > > > > > >> > IndexWriter indexWriter = new IndexWriter( directory,
> > > > > > > > >> >     new StandardAnalyzer( Version.LUCENE_31 ),
> > > > > > > > >> >     IndexWriter.MaxFieldLength.UNLIMITED );
> > > > > > > > >> > Document document = new Document();
> > > > > > > > >> > document.add( new Field( "test", "Merlot",
> > > > > > > > >> >     Field.Store.YES, Field.Index.ANALYZED ) );
> > > > > > > > >> > indexWriter.addDocument( document );
> > > > > > > > >> > IndexReader indexReader = indexWriter.getReader();
> > > > > > > > >> > IndexSearcher searcher = new IndexSearcher( indexReader );
> > > > > > > > >> > Query q = new FuzzyQuery( new Term( "test", "Mer" ), 0.5f, 0, 10 );
> > > > > > > > >> > // or: Query q = new FuzzyQuery( new Term( "test", "Mer" ), 0.5f );
> > > > > > > > >> > TopDocs result = searcher.search( q, 10 );
> > > > > > > > >> > Assert.assertEquals( 1, result.totalHits );
> > > > > > > > >> > - Clemens
> > > > > > > > >> >
> > > > > > > > >> >> -----Original Message-----
> > > > > > > > >> >> From: Clemens Wyss [mailto:clemensdev@mysign.ch]
> > > > > > > > >> >> Sent: Monday, May 2, 2011 23:01
> > > > > > > > >> >> To: java-user@lucene.apache.org
> > > > > > > > >> >> Subject: AW: "fuzzy prefix" search
> > > > > > > > >> >>
> > > > > > > > >> >> Is it the combination of FuzzyQuery and Term which makes the
> > > > > > > > >> >> search go for "word boundaries"?
> > > > > > > > >> >>
> > > > > > > > >> >> > -----Original Message-----
> > > > > > > > >> >> > From: Clemens Wyss [mailto:clemensdev@mysign.ch]
> > > > > > > > >> >> > Sent: Monday, May 2, 2011 14:13
> > > > > > > > >> >> > To: java-user@lucene.apache.org
> > > > > > > > >> >> > Subject: AW: "fuzzy prefix" search
> > > > > > > > >> >> >
> > > > > > > > >> >> > I tried this too, but unfortunately I only get hits when the
> > > > > > > > >> >> > search term is at least as long as the word to be looked up.
> > > > > > > > >> >> >
> > > > > > > > >> >> > E.g.:
> > > > > > > > >> >> > ...
> > > > > > > > >> >> > Directory directory = new RAMDirectory();
> > > > > > > > >> >> > IndexWriter indexWriter = new IndexWriter( directory,
> > > > > > > > >> >> >     IndexManager.getIndexingAnalyzer( LOCALE_DE ),
> > > > > > > > >> >> >     IndexWriter.MaxFieldLength.UNLIMITED );
> > > > > > > > >> >> >
> > > > > > > > >> >> > Document document = new Document();
> > > > > > > > >> >> > document.add( new Field( "test", "Merlot",
> > > > > > > > >> >> >     Field.Store.YES, Field.Index.ANALYZED ) );
> > > > > > > > >> >> > indexWriter.addDocument( document );
> > > > > > > > >> >> >
> > > > > > > > >> >> > IndexReader indexReader = indexWriter.getReader();
> > > > > > > > >> >> > IndexSearcher searcher = new IndexSearcher( indexReader );
> > > > > > > > >> >> >
> > > > > > > > >> >> > Query q = new FuzzyQuery( new Term( "test", "Mer" ), 0.6f, 1 );
> > > > > > > > >> >> > TopDocs result = searcher.search( q, 10 );
> > > > > > > > >> >> > Assert.assertEquals( 1, result.totalHits );
> > > > > > > > >> >> > ...
> > > > > > > > >> >> >
> > > > > > > > >> >> > > -----Original Message-----
> > > > > > > > >> >> > > From: Uwe Schindler [mailto:uwe@thetaphi.de]
> > > > > > > > >> >> > > Sent: Monday, May 2, 2011 13:50
> > > > > > > > >> >> > > To: java-user@lucene.apache.org
> > > > > > > > >> >> > > Subject: RE: "fuzzy prefix" search
> > > > > > > > >> >> > >
> > > > > > > > >> >> > > Hi,
> > > > > > > > >> >> > >
> > > > > > > > >> >> > > You can pass an integer to FuzzyQuery which defines the number
> > > > > > > > >> >> > > of characters that are seen as prefix. So all terms must match
> > > > > > > > >> >> > > this prefix, and the rest of each term is matched using fuzzy.
> > > > > > > > >> >> > >
> > > > > > > > >> >> > > Uwe
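The "exact prefix, fuzzy remainder" behaviour described here can be sketched in plain Java as follows. This is an illustration under assumptions, not Lucene's code: PrefixFuzzyMatch and its methods are hypothetical, and the similarity formula (1 minus edit distance over prefix length plus the shorter remainder length, compared strictly against the minimum similarity) is written from memory of how the classic FuzzyQuery scored terms, so treat it as an assumption.

```java
/** Hypothetical sketch of "exact prefix + fuzzy remainder"; not Lucene's actual code. */
final class PrefixFuzzyMatch {

    /** Textbook Levenshtein distance using two rolling rows. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Assumed formula: 1 - distance(rest) / (prefix + shorter rest); 0 if the prefix differs. */
    static float similarity(String query, String term, int prefixLength) {
        int p = Math.min(prefixLength, query.length());
        if (!term.startsWith(query.substring(0, p))) {
            return 0f; // the mandatory prefix must match exactly, including case
        }
        String queryRest = query.substring(p);
        String termRest = term.substring(p);
        int denominator = p + Math.min(queryRest.length(), termRest.length());
        if (denominator == 0) {
            return queryRest.equals(termRest) ? 1f : 0f;
        }
        float sim = 1f - (float) editDistance(queryRest, termRest) / denominator;
        return Math.max(0f, sim);
    }

    /** Strict threshold, which is also an assumption of this sketch. */
    static boolean matches(String query, String term, int prefixLength, float minSimilarity) {
        return similarity(query, term, prefixLength) > minSimilarity;
    }

    public static void main(String[] args) {
        System.out.println(matches("merlott", "merlot", 1, 0.6f));
        System.out.println(matches("mer", "merlot", 1, 0.6f));
        System.out.println(matches("Mer", "merlot", 1, 0.6f));
    }
}
```

Under these assumptions "merlott" matches the indexed term "merlot", but "mer" scores 0 because the remainder "er" is three edits away from "erlot", and "Mer" already fails on the case-sensitive prefix. So the prefix length narrows the candidates; it does not turn the fuzzy query into a prefix search.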
> > > > > > > > >> >> > >
> > > > > > > > >> >> > > -----
> > > > > > > > >> >> > > Uwe Schindler
> > > > > > > > >> >> > > H.-H.-Meier-Allee 63, D-28213 Bremen
> > > > > > > > >> >> > > http://www.thetaphi.de
> > > > > > > > >> >> > > eMail: uwe@thetaphi.de
> > > > > > > > >> >> > >
> > > > > > > > >> >> > > > -----Original Message-----
> > > > > > > > >> >> > > > From: Clemens Wyss [mailto:clemensdev@mysign.ch]
> > > > > > > > >> >> > > > Sent: Monday, May 02, 2011 1:47 PM
> > > > > > > > >> >> > > > To: java-user@lucene.apache.org
> > > > > > > > >> >> > > > Subject: "fuzzy prefix" search
> > > > > > > > >> >> > > >
> > > > > > > > >> >> > > > I'd like to search fuzzily but not on a full term.
> > > > > > > > >> >> > > > E.g.
> > > > > > > > >> >> > > > I have a text "Merlot del Ticino".
> > > > > > > > >> >> > > > I'd like "mer", "merr", "melo", ... to match.
> > > > > > > > >> >> > > >
> > > > > > > > >> >> > > > If I use FuzzyQuery, only "merlot" and "merlott" hit.
> > > > > > > > >> >> > > > What query combination should I use?
> > > > > > > > >> >> > > >
> > > > > > > > >> >> > > > Thx
> > > > > > > > >> >> > > > Clemens
> > > > > > > > >> >> > > >
> > > > > > > > >> >> > > > ---------------------------------------------------------------------
> > > > > > > > >> >> > > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > > > > > > >> >> > > > For additional commands, e-mail: java-user-help@lucene.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org