lucene-java-user mailing list archives

From Steven A Rowe
Subject RE: Match span of capitalized words
Date Fri, 05 Feb 2010 22:23:12 GMT

Hi Max,

On 02/05/2010 at 10:18 AM, Grant Ingersoll wrote:
> On Feb 3, 2010, at 8:57 PM, Max Lynch wrote:
> > Hi, I would like to do a search for "Microsoft Windows" as a span, but
> > not match if words before or after "Microsoft Windows" are upper cased.
> > 
> > For example, I want this to match:
> >
> >     another crash for Microsoft Windows today
> >
> > But not this:
> >
> >     another crash for Microsoft Windows Server today
> > 
> > Is this possible?  My first attempt started with the SpanRegexQuery
> > from the regex contrib package, but I can't figure out how to put in a
> > term I do want to match but don't want to include in the final
> > highlighting match. Does that make sense?
> > 
> > My example (using WhitespaceAnalyzer since I care about case):
> > 
> > SpanRegexQuery srq1 = new SpanRegexQuery(new Term("contents", "Chase"));
> > SpanRegexQuery srq2 = new SpanRegexQuery(new Term("contents", "Bank[\\.]*"));
> > SpanRegexQuery srq3 = new SpanRegexQuery(new Term("contents", "[^A-Z]*"));
>
> I'm not sure it supports it, but I wonder if you could use a negative
> lookahead assertion?  Most regex languages support it.

I don't think this would work: the input to a SpanRegexQuery regex is a single Term, so the
following Terms are not visible to the pattern and a lookahead has nothing to inspect.
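
A minimal plain-Java sketch of that limitation, not Lucene code: it splits the text on whitespace (as WhitespaceAnalyzer would) and tests each token in isolation, assuming the regex must match the whole term. The class name, method name, and lookahead pattern are hypothetical and exist only for illustration.

```java
import java.util.regex.Pattern;

class PerTermRegexSketch {
    // Count the tokens that match the regex when each token is tested alone,
    // which is the assumption made here about how a per-term regex is applied.
    static int countTermMatches(String text, String regex) {
        Pattern p = Pattern.compile(regex);
        int count = 0;
        for (String term : text.trim().split("\\s+")) {
            if (p.matcher(term).matches()) {
                count++;
            }
        }
        return count;
    }

    // Apply the same regex to the whole string, where the lookahead can see
    // the text that follows "Windows".
    static boolean foundInWholeText(String text, String regex) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        String regex = "Windows(?!\\s+\\p{Lu})"; // hypothetical lookahead pattern
        String wanted = "another crash for Microsoft Windows today";
        String unwanted = "another crash for Microsoft Windows Server today";
        System.out.println(countTermMatches(wanted, regex));
        System.out.println(countTermMatches(unwanted, regex));
        System.out.println(foundInWholeText(unwanted, regex));
    }
}
```

Tested per token, the lookahead succeeds trivially in both sentences because nothing follows "Windows" inside its own term; only the whole-string match can reject the "Server" case.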

I *think* you can get what you want using SpanNotQuery - something like the following, using
your "Microsoft Windows" example:

        SpanNot:
            include:
                SpanNear(in-order=true, slop=0):
                    SpanTerm: "Microsoft"
                    SpanTerm: "Windows"
            exclude:
                SpanNear(in-order=true, slop=0):
                    SpanTerm: "Microsoft"
                    SpanTerm: "Windows"
                    SpanRegex: "^\\p{Lu}.*"
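
The intended semantics can be sketched in plain Java, without Lucene, to check the idea against the two example sentences. This is a simulation under stated assumptions, not the Lucene implementation: tokens come from a whitespace split, spans are half-open token ranges, and an included span is dropped when it overlaps any excluded span. The class and method names are hypothetical. It covers only a capitalized word after the phrase; a capitalized word before it would presumably need a second, similar exclude clause.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class SpanNotSketch {
    // Tokens that start with an uppercase letter, as in the regex above.
    static final Pattern CAPITALIZED = Pattern.compile("^\\p{Lu}.*");

    // Spans [start, end) where "Microsoft" "Windows" are adjacent and in order.
    // With requireCapitalizedNext, a capitalized third token must follow and
    // is included in the span.
    static List<int[]> nearSpans(String[] t, boolean requireCapitalizedNext) {
        List<int[]> spans = new ArrayList<>();
        for (int i = 0; i + 1 < t.length; i++) {
            if (!t[i].equals("Microsoft") || !t[i + 1].equals("Windows")) {
                continue;
            }
            if (!requireCapitalizedNext) {
                spans.add(new int[] {i, i + 2});
            } else if (i + 2 < t.length && CAPITALIZED.matcher(t[i + 2]).matches()) {
                spans.add(new int[] {i, i + 3});
            }
        }
        return spans;
    }

    // Keep the include spans that overlap no exclude span and return their
    // start positions (token offsets).
    static List<Integer> matchStarts(String text) {
        String[] tokens = text.trim().split("\\s+");
        List<int[]> exclude = nearSpans(tokens, true);
        List<Integer> starts = new ArrayList<>();
        for (int[] inc : nearSpans(tokens, false)) {
            boolean overlaps = false;
            for (int[] exc : exclude) {
                if (inc[0] < exc[1] && exc[0] < inc[1]) {
                    overlaps = true;
                }
            }
            if (!overlaps) {
                starts.add(inc[0]);
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        // prints [3]
        System.out.println(matchStarts("another crash for Microsoft Windows today"));
        // prints []
        System.out.println(matchStarts("another crash for Microsoft Windows Server today"));
    }
}
```

Because the exclude span starts at the same position as the include span, the overlap test removes exactly those occurrences that are followed by a capitalized token.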
