lucene-java-user mailing list archives

From Chris Bamford <>
Subject SnowballAnalyzer question
Date Fri, 08 Aug 2008 12:07:25 GMT

I am using the SnowballAnalyzer because of its multi-language stemming 
capabilities - and am very happy with that.
There is one small glitch which I'm hoping to overcome - can I get it to 
split up internet domain names in the same way that StopAnalyzer does?
For example, for the sentence "This is a URL: / this is a company 
name: XY&Z Corporation", here is the default output from the two analysers:

    StopAnalyzer:      [url] [www] [google] [de] [company] [name] [xy] [z] [corporation]

    SnowballAnalyzer:  [this] [is] [a] [url] [] [this] [is] [a] [compani] [name] [xy&z] [corpor]
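For comparison, the StopAnalyzer output can be approximated in plain Java: split on every non-letter character, lower-case each piece, and drop stop words. This is a minimal sketch, not Lucene's own code; the class name LetterSplitSketch, the three-word stop list, and the placeholder host www.example.org are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class LetterSplitSketch {
    // Hypothetical, tiny stop list; Lucene's real English stop list is longer.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "is", "this"));

    // Splits on any non-letter character, lower-cases, and drops stop words,
    // roughly mirroring a letter-based tokenizer followed by a stop filter.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else {
                // Dots, '&', ':' etc. all end the current token.
                flush(current, tokens);
            }
        }
        flush(current, tokens);
        return tokens;
    }

    // Emits the buffered token unless it is empty or a stop word.
    private static void flush(StringBuilder current, List<String> tokens) {
        if (current.length() > 0) {
            String token = current.toString();
            if (!STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(tokenize("This is a URL: www.example.org / XY&Z Corporation"));
    }
}
```

Because the dots are non-letters, the host name falls apart into [www] [example] [org], and "XY&Z" likewise becomes [xy] [z], which matches the shape of the StopAnalyzer output above.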

Ideally I would like "" to be split into [www] [google] 
[de] (rather than []), while retaining the rest of 
SnowballAnalyzer's capabilities.
Can I perhaps extend SnowballAnalyzer to achieve this?
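One possible approach, sketched under assumptions rather than as a tested Lucene recipe: instead of subclassing SnowballAnalyzer, write a custom Analyzer whose tokenStream() chains StandardTokenizer, StandardFilter, a small host-splitting TokenFilter, LowerCaseFilter and SnowballFilter, which is essentially the chain SnowballAnalyzer builds plus one extra step. The plain-Java code below shows only the splitting logic such a filter would apply to each token. The names HostSplitSketch, looksLikeHost and splitHosts are hypothetical, and a real TokenFilter would also have to buffer the extra tokens and set their offsets and position increments. As far as I know, StandardTokenizer in Lucene 2.x labels dotted host names with the token type "<HOST>", so a real filter could check that type instead of the dot heuristic used here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class HostSplitSketch {
    // Hypothetical heuristic: a token counts as a host name if it contains a dot,
    // contains at least one letter (so numbers like "1.5" are left alone), and
    // every dot-separated part is non-empty and made of letters or digits.
    public static boolean looksLikeHost(String token) {
        if (token.indexOf('.') < 0 || token.chars().noneMatch(Character::isLetter)) {
            return false;
        }
        for (String part : token.split("\\.")) {
            if (part.isEmpty() || !part.chars().allMatch(Character::isLetterOrDigit)) {
                return false;
            }
        }
        return true;
    }

    // Replaces each host-like token with its dot-separated parts; every other
    // token (e.g. "xy&z") passes through unchanged to the later lowercase and
    // stemming stages.
    public static List<String> splitHosts(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            if (looksLikeHost(token)) {
                out.addAll(Arrays.asList(token.split("\\.")));
            } else {
                out.add(token);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("url", "www.example.org", "xy&z", "corporation");
        System.out.println(splitHosts(tokens));
    }
}
```

With the placeholder input above this prints [url, www, example, org, xy&z, corporation]: only the host name is split, so the company token and the words destined for stemming are unaffected.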

Thanks for any tips / pointers,

- Chris
