lucene-java-user mailing list archives

From Furkan KAMACI <>
Subject Re: Custom Tokenizer
Date Thu, 05 Dec 2013 17:24:13 GMT

StandardAnalyzer includes these filters by default, on top of StandardTokenizer:

StandardFilter, LowerCaseFilter and StopFilter

You can consider char filters. Did you read here:
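
To illustrate the idea only: a char filter rewrites the text before the tokenizer sees it. The sketch below is plain Java, not Lucene's API; the class and method names are made up for this example. It assumes the goal is to split on whitespace only while dropping punctuation at the edges of each word, so that hyphens and interior periods survive.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; none of these names come from Lucene.
class WhitespaceSketch {

    // Split on runs of whitespace only; hyphens and punctuation stay inside tokens.
    static List<String> whitespaceTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Pre-tokenization step (the role a char filter plays): remove punctuation
    // that touches a word boundary, keep punctuation between word characters.
    static String stripEdgePunctuation(String text) {
        return text.replaceAll("(?<!\\S)\\p{Punct}+|\\p{Punct}+(?!\\S)", "");
    }

    // Normalize first, then tokenize on whitespace.
    static List<String> filteredTokens(String text) {
        return whitespaceTokens(stripEdgePunctuation(text));
    }

    public static void main(String[] args) {
        String data = "1097-0215 (i.v) product-123 anti-virus, we investigated the "
                + "mechanisms. 2266-73 In the present study";
        System.out.println(whitespaceTokens(data));
        System.out.println(filteredTokens(data));
    }
}
```

On the sample data quoted below, the first call keeps "(i.v)", "anti-virus," and "mechanisms." as they are, while the second yields "i.v", "anti-virus" and "mechanisms", so an exact search for "mechanisms" would match without a wildcard. In Lucene itself, the equivalent would be a char filter placed in front of a whitespace tokenizer inside a custom Analyzer.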


2013/12/5 <>

> Hi,
> I have used StandardAnalyzer in my code and it is working fine. One of the
> challenges that I face is the fact that this Analyzer by default tokenizes
> on some special characters such as hyphen, apart from the SPACE character.
> I want to tokenize only on the SPACE character. Could you please suggest
> how I can achieve this?
> I got this example when I googled for it. What I want to use is the
> WhitespaceTokenizer so that data is not manipulated in any way. I understand
> that in this case, searches such as "mechanisms" won't return results
> because of the period (.) at the end. I want to then address this by
> introducing wild-card searches.
> Data: 1097-0215 (i.v) product-123 anti-virus, we investigated the
> mechanisms. 2266-73 In the present study
> Tokens generated with StandardTokenizer:
> [1097-0215] [i.v] [product-123] [anti] [virus] [we] [investigated] [the]
> [mechanisms] [2266-73] [In] [the] [present] [study]
> Tokens generated with WhitespaceTokenizer:
> [1097-0215] [(i.v)] [product-123] [anti-virus,] [we] [investigated] [the]
> [mechanisms.] [2266-73] [In] [the] [present] [study]
> Note: I have tried using the WhitespaceAnalyzer which tokenizes by default
> ONLY on the space, but my attempt at performing wildcard searches didn't
> work as expected, whereas wildcard searches worked fine with
> StandardAnalyzer.
> Please provide your inputs.
> Regards,
> Raghu
