lucene-general mailing list archives

From "Sendros, Jason" <>
Subject RE: StandardAnalyzer splits word. EX: key:abc_xyz converts into key:abc xyz
Date Mon, 22 Aug 2011 13:30:33 GMT
Hi Srinu,

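The StandardAnalyzer treats the underscore as a word separator, which is
why you are seeing this behavior. "xyz_abc" is analyzed into the two
tokens "xyz" and "abc". Because one query term produced two tokens, the
query parser combines them into the phrase query key:"xyz abc". In your
other scenario, where a number follows the underscore, the
StandardAnalyzer keeps the entire string together as one token despite
the underscore. The digit changes what kind of string the grammar thinks
it is looking at (it distinguishes types such as "company name", "word",
"abbreviation", etc.). Letters and digits joined by an underscore are
treated as a single number-like token, such as a model or serial number.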
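To make the rule above concrete, here is a deliberately simplified,
hypothetical Java model. The class UnderscoreRuleSketch and its methods
are made up for illustration and are not Lucene code. The model splits on
underscores unless a neighbouring segment contains a digit. The real
grammar is stricter (alternate segments must contain a digit). It also
lowercases and removes stop words, which this sketch ignores.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of the underscore behaviour described above.
 * NOT Lucene code: lowercasing, stop words and the other token types are ignored,
 * and the digit rule is looser than the real grammar's.
 */
public class UnderscoreRuleSketch {

    static boolean hasDigit(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (Character.isDigit(s.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    /** Splits on '_' unless one of the two segments around it contains a digit. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        String[] segments = text.split("_");
        StringBuilder current = new StringBuilder(segments[0]);
        for (int i = 1; i < segments.length; i++) {
            String prev = segments[i - 1];
            String next = segments[i];
            if (hasDigit(prev) || hasDigit(next)) {
                // Number-like string (model/serial number style): keep the underscore.
                current.append('_').append(next);
            } else {
                // Plain words: the underscore acts as a separator.
                if (current.length() > 0) {
                    tokens.add(current.toString());
                }
                current = new StringBuilder(next);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("xyz_abc"));   // [xyz, abc]
        System.out.println(tokenize("xyz_1abc"));  // [xyz_1abc]
    }
}
```

Both of your examples come out the way you observed: xyz_abc becomes two
tokens, while xyz_1abc stays one.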

See this discussion for why the underscore is treated as a word
separator:

And here you can find the StandardAnalyzer class for the most recent
version of Lucene:

Hopefully reading through those links will help you understand what is
happening inside Lucene. To solve this, use a different analyzer that
suits your needs (for example, one that keeps the whole key value as a
single token), or modify the StandardAnalyzer to follow the rules you
prefer.
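Here is a sketch of the "different analyzer" route. It assumes you only
want the key field kept whole. Lucene provides PerFieldAnalyzerWrapper
and KeywordAnalyzer for this, but their constructors differ between
versions, so check the Javadoc for yours. The hypothetical stdlib-only
model below (class PerFieldSketch, not Lucene API) just shows the idea.
The key field gets keyword-style analysis and other fields keep
underscore splitting. A value that yields several tokens is printed as a
phrase, mirroring the output reported in this thread.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical sketch of the "different analyzer per field" idea.
 * NOT Lucene code: analyzers are modelled as plain functions.
 */
public class PerFieldSketch {

    // Keyword-style analysis: the whole value becomes one token.
    static final Function<String, List<String>> KEYWORD = value -> List.of(value);

    // Word-splitting analysis: '_' is a separator (digit exception ignored here).
    static final Function<String, List<String>> SPLIT_ON_UNDERSCORE =
            value -> Arrays.asList(value.split("_"));

    static final Map<String, Function<String, List<String>>> PER_FIELD = new HashMap<>();
    static {
        PER_FIELD.put("key", KEYWORD); // hypothetical choice for the "key" field
    }

    /** Renders a required clause the way the parsed query is printed in this thread. */
    static String clause(String field, String value) {
        List<String> tokens = PER_FIELD.getOrDefault(field, SPLIT_ON_UNDERSCORE).apply(value);
        if (tokens.size() == 1) {
            return "+" + field + ":" + tokens.get(0); // single token: plain term
        }
        // Several tokens from one term: rendered as a phrase.
        return "+" + field + ":\"" + String.join(" ", tokens) + "\"";
    }

    public static void main(String[] args) {
        System.out.println(clause("key", "xyz_abc"));   // +key:xyz_abc
        System.out.println(clause("other", "xyz_abc")); // +other:"xyz abc"
    }
}
```

Whatever analyzer you pick for the key field, use the same one when
indexing that field. Otherwise the indexed terms (xyz and abc) will not
match the single query term xyz_abc.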


-----Original Message-----
From: srinu.hello [] 
Sent: Saturday, August 20, 2011 10:11 AM
Subject: StandardAnalyzer splits word. EX: key:abc_xyz converts into
key:abc xyz

Hello All,

I observed some unexpected behavior when using StandardAnalyzer to parse
a query. Here is a demonstration.

I am passing the query as (key:xyz_abc) && (text:blabla)

Expecting the parsed query to be +key:xyz_abc +text:blabla

Actual Result is +key:"xyz abc" +text:blabla

As per my understanding, StandardAnalyzer splits text into multiple words
at word boundaries, but the above word xyz_abc is a single word. Please
correct me if I am wrong.

I also observed that if there is a number after the underscore, the
parsed query is as expected, i.e.

If I give the query as (key:xyz_1abc) && (text:blabla), the parsed query
is +key:xyz_1abc +text:blabla

This is the behavior I am expecting.

Please help.


Sent from the Lucene - General mailing list archive at
