lucene-java-user mailing list archives

From Peyman Faratin <pey...@robustlinks.com>
Subject StandardTokenizer
Date Thu, 29 Sep 2011 18:51:34 GMT
Hi

I have the following sentence:

"i'll email you at x@abc.com"

and I am looking at the tokens that a StandardAnalyzer (which uses the StandardTokenizer) produces:

1: [i'll:0->4:<ALPHANUM>] 
2: [email:5->10:<ALPHANUM>] 
3: [you:11->14:<ALPHANUM>] 
5: [x:18->19:<ALPHANUM>] 
6: [abc.com:20->27:<ALPHANUM>] 
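For anyone reading the dump: a minimal sketch of how the offsets map back to the sentence, assuming each entry has the form position: [term:start->end:<TYPE>] with start inclusive and end exclusive. This is plain Java and does not call Lucene; the class name OffsetCheck and the method termAt are only illustrative.

```java
// Illustrative only: plain Java, no Lucene classes involved.
class OffsetCheck {
    // Returns the slice of the input that a start->end pair points at,
    // assuming start is inclusive and end is exclusive.
    static String termAt(String text, int start, int end) {
        return text.substring(start, end);
    }

    public static void main(String[] args) {
        String text = "i'll email you at x@abc.com";
        // Offset pairs copied from the token dump above.
        int[][] offsets = { {0, 4}, {5, 10}, {11, 14}, {18, 19}, {20, 27} };
        for (int[] o : offsets) {
            System.out.println(o[0] + "->" + o[1] + " = " + termAt(text, o[0], o[1]));
        }
    }
}
```

Under that reading, the only non-space characters not covered by any token are 15->17 ("at") and 19->20 ("@").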

I am using the following constructor:

    new StandardAnalyzer(Version.LUCENE_32)

My questions are:

1- Shouldn't we be seeing a single token x@abc.com (since that is in the grammar of StandardAnalyzer)?

2- Shouldn't the token type be "email" for abc.com and "apostrophe" for "i'll"?

Thank you

Peyman