lucene-java-user mailing list archives

From crspan <>
Subject Re: index U.K. U.S. U.N. U.V.
Date Tue, 17 Jul 2007 03:16:37 GMT
Are we sure about KeywordAnalyzer here? It is supposed to "tokenize"
the entire stream as a single token (useful for data like zip codes,
IDs, and some product names).

In the scenario we are discussing, U.S. is just one token within the
text, and we would still like to use StandardAnalyzer for all the
other goodies. I am sorry for the incomplete setup in my previous message.

More or less, I expect there is somewhere we can instruct StandardTokenizer.jj
that U.S. is a special token (even though it is indeed an ACRONYM) and that
we prefer to index it as U.S., as is. Can we do that?
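
To make the desired behaviour concrete, here is a minimal plain-Java sketch. It does not use Lucene and is not the StandardTokenizer.jj grammar. The class name AcronymAwareTokenizer, the tokenize method, and the protected-token set are all hypothetical names made up for illustration. It only shows the outcome I am after: listed tokens such as U.S. are kept verbatim, while other dotted acronyms lose their dots and everything else is lowercased, roughly as described in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical helper, NOT part of Lucene: illustrates "protect some tokens,
// normalize the rest" on plain strings.
class AcronymAwareTokenizer {
    // Matches dotted acronyms made of single letters, e.g. "U.S." or "u.k."
    private static final Pattern ACRONYM = Pattern.compile("(?:\\p{L}\\.)+");

    static List<String> tokenize(String text, Set<String> protectedTokens) {
        List<String> tokens = new ArrayList<>();
        for (String chunk : text.trim().split("\\s+")) {
            // Trim surrounding punctuation other than '.', such as commas or quotes.
            String core = chunk.replaceAll("^[^\\p{L}\\p{N}.]+|[^\\p{L}\\p{N}.]+$", "");
            if (protectedTokens.contains(core)) {
                // Protected token: keep exactly as written.
                tokens.add(core);
            } else if (ACRONYM.matcher(core).matches()) {
                // Other acronyms: drop the dots and lowercase, e.g. "U.K." -> "uk".
                tokens.add(core.replace(".", "").toLowerCase(Locale.ROOT));
            } else {
                // Ordinary text: split on non-alphanumerics and lowercase.
                for (String part : core.split("[^\\p{L}\\p{N}]+")) {
                    if (!part.isEmpty()) {
                        tokens.add(part.toLowerCase(Locale.ROOT));
                    }
                }
            }
        }
        return tokens;
    }
}
```

With the protected set containing only "U.S.", the text "Tell us about the U.S., not the U.K." yields the tokens tell, us, about, the, U.S., not, the, uk, so "U.S." no longer collides with "us". Whether and how the same rule can be expressed inside the StandardTokenizer grammar is exactly the open question.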


Otis Gospodnetic wrote:
> Use KeywordAnalyzer to leave "U.S." as-is and index it as-is.
> Otis
> --
> Lucene Consulting --
> ----- Original Message ----
> From: crspan <>
> To:
> Sent: Saturday, July 14, 2007 5:18:59 PM
> Subject: index U.K. U.S. U.N. U.V.
> Would you please advise on the best practice for indexing:
>   U.S.
> The standard analyzer will transform it to "us", which collides with
> "us" (we).
> Thanks,
> Charlie
