lucene-dev mailing list archives

From Erik Hatcher <>
Subject Re: Does QueryParser uses Analyzer ?
Date Tue, 30 Nov 2004 19:59:27 GMT
On Nov 30, 2004, at 2:29 PM, Ricardo Lopes wrote:
> > My guess is that your analyzer is what did the splitting
> After looking at the code more closely I found that the 
> tokenStream method in the BrazilianAnalyzer calls the 
> StandardTokenizer, and it is this one that splits the search string. Is 
> there a simple way to subclass the tokenizer to avoid splitting those 
> characters, or do I have to make a custom implementation of that class?

You can verify this by using the AnalysisDemo referenced here:

Or use Luke - - which has a nice plugin 
page that can do this type of analysis inspection (you'll have to add 
the sandbox analyzer JAR to the classpath when launching Luke).
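If you want to see the splitting without setting any of that up, a few lines of plain Java approximate what the tokenizer does to a string. This is a simplified, hypothetical model (split on anything that isn't a letter or digit, lowercase the rest) - not the real JavaCC grammar, which also recognizes things like e-mail addresses and host names - but it is enough to show which characters break a term apart:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of StandardTokenizer-style splitting:
// any character that is not a letter or digit ends the current token.
public class AnalysisPeek {
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetterOrDigit(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // Hit a separator: emit the token built so far.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) tokens.add(current.toString());
        return tokens;
    }

    public static void main(String[] args) {
        // The hyphen is treated as a separator, so the term splits in two.
        System.out.println(tokenize("wi-fi router")); // prints [wi, fi, router]
    }
}
```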

As for subclassing StandardTokenizer - no, you won't have much luck 
there.  StandardTokenizer is a JavaCC-based tokenizer and is not 
designed for subclassing to control this sort of thing.
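What you can do instead of subclassing is write your own tokenizer (or a small analyzer that wraps one) whose notion of a "token character" includes the punctuation you want to keep. Here's a standalone sketch of that idea - the class and method names are hypothetical, not Lucene API; in a real custom Tokenizer this loop would live in its next() method:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a custom tokenizer that treats chosen punctuation (here '-')
// as part of a token instead of a split point. Standalone illustration,
// not Lucene code.
public class KeepHyphenTokenizer {
    // Characters that belong inside a token: letters, digits, and '-'.
    static boolean isTokenChar(char c) {
        return Character.isLetterOrDigit(c) || c == '-';
    }

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (isTokenChar(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) tokens.add(current.toString());
        return tokens;
    }

    public static void main(String[] args) {
        // The hyphenated term now survives as a single token.
        System.out.println(tokenize("wi-fi router")); // prints [wi-fi, router]
    }
}
```

Remember that whatever you build has to be used at both indexing and search time, or the two sides will stop agreeing on what a token is.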

> As this only happens when I make a search (during indexing the 
> splitting of those characters doesn't happen)

Are you sure that splitting is not happening during indexing?  If the 
AnalysisDemo (or Luke), run on your string, shows it splitting, then it 
is splitting at indexing time too.  Keep in mind that looking at a 
field's value shows you the stored *original* value, not the tokenized 
values.

>  I thought that it had to do with the QueryParser, but it seems that 
> the problem is with the StandardTokenizer.

I'm not sure - I haven't tried that string with the analyzer you 
provided.  If it was with StandardTokenizer and you're using the same 
analyzer for indexing and searching, you'd have the values split in 
both places - which is actually fine as searches would match what was 
indexed :)
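To make that last point concrete, here's a toy illustration (not Lucene code) of why splitting on both sides is harmless: the query produces the same token sequence the index stored, so the two still line up:

```java
import java.util.Arrays;
import java.util.List;

// Illustration: if indexing and searching split a string the same way,
// the query's token sequence equals the indexed one, so the search
// still matches even though the original term was split.
public class SameAnalyzerBothSides {
    // Same simple rule applied on both sides: lowercase, then split on
    // any run of characters that are not letters or digits.
    static List<String> split(String s) {
        return Arrays.asList(s.toLowerCase().split("[^\\p{L}\\p{N}]+"));
    }

    public static void main(String[] args) {
        List<String> indexed = split("wi-fi");
        List<String> query   = split("wi-fi");
        System.out.println(indexed.equals(query)); // prints true
    }
}
```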

