lucene-dev mailing list archives

From Peter Carlson <>
Subject Re: HighLighting Service
Date Wed, 10 Apr 2002 14:57:10 GMT

Hi Lee,

Would you like to add your code to the contributions?



On 4/10/02 1:46 AM, "Lee Mallabone" <> wrote:

> On Tue, 2002-04-09 at 20:22, none none wrote:
>> I am working on the highlight-terms functionality of Lucene.
>> Some problems show up here:
>> 1. It doesn't work with all Query types, e.g.:
>> WildcardQuery, FuzzyQuery, PrefixQuery, PhraseQuery.
> One thing I did was to modify LuceneTools, (well, I rewrote it
> eventually) to output regular expressions instead of just terms. Then
> use gnu.regexp or Jakarta ORO to match expressions against various forms
> of the original documents. This allows you to do custom highlighting
> (ie. highlight entire phrases not just the tokens in those phrases). It
> also allows you to do wildcard matching with better speed if you
> generate a single expression for the wildcard query, rather than
> matching against every single term the wildcard query would match
> individually. I didn't address FuzzyQuery or date queries.
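
A minimal sketch of the regular-expression idea described above, using java.util.regex in place of gnu.regexp or Jakarta ORO. The class RegexHighlighter and its methods are hypothetical names and are not part of Lucene or LuceneTools. It assumes '*' stands for any run of word characters and '?' for a single word character, which only approximates how Lucene matches wildcards against indexed terms.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: builds one regular expression per query and highlights its matches. */
final class RegexHighlighter {

    /** Converts a wildcard term to a single regex ('*' = any word characters, '?' = one word character). */
    static String wildcardToRegex(String wildcard) {
        StringBuilder regex = new StringBuilder("\\b");
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append("\\w*");
            } else if (c == '?') {
                regex.append("\\w");
            } else {
                // Quote literal characters so regex metacharacters in the term are harmless.
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return regex.append("\\b").toString();
    }

    /** Converts a phrase to a regex that matches its words separated by any whitespace. */
    static String phraseToRegex(String phrase) {
        String[] words = phrase.trim().split("\\s+");
        StringBuilder regex = new StringBuilder("\\b");
        for (int i = 0; i < words.length; i++) {
            if (i > 0) {
                regex.append("\\s+");
            }
            regex.append(Pattern.quote(words[i]));
        }
        return regex.append("\\b").toString();
    }

    /** Wraps every case-insensitive match of the regex in <b> tags. */
    static String highlight(String text, String regex) {
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(text);
        return m.replaceAll("<b>$0</b>");
    }
}
```

With this sketch a wildcard query costs one pass over the text with one expression, rather than one pass per expanded term, and a phrase is highlighted as a whole even when its words are split across a line break.
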
>> What can we do? Any suggestions?
> A method of generating document context is to store the body of your
> document in the index. Then retrieve it, normalize any whitespace,
> abbreviate the text at the first hit, and highlight the relevant terms
> in the abbreviated text. This doesn't sound all that quick, but it
> proved to be much quicker than consulting the original document in some
> non-numerical tests I did.
> That works really well for context extracts. However, it may or may not
> be applicable to highlighting the entire document - it would depend on
> the original format of your documents I think. I still consult the
> original (HTML) documents for doing that, but all my documents are
> fairly short.
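
The context-extract steps above (retrieve the stored body, normalize whitespace, abbreviate at the first hit, highlight) might look roughly like the following sketch. ContextExtract, extract and the radius parameter are hypothetical names chosen for illustration. The stored body is assumed to have already been read from the index as a plain String, and a single literal term stands in for the query.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: builds a short highlighted extract around the first hit. */
final class ContextExtract {

    /**
     * @param storedBody document text as retrieved from the index (assumed to be a plain String)
     * @param term       literal term to highlight
     * @param radius     number of characters of context kept on each side of the first hit
     */
    static String extract(String storedBody, String term, int radius) {
        // Step 1: normalize whitespace so the extract fits on one line.
        String text = storedBody.trim().replaceAll("\\s+", " ");

        Pattern p = Pattern.compile("\\b" + Pattern.quote(term) + "\\b", Pattern.CASE_INSENSITIVE);
        Matcher first = p.matcher(text);
        if (!first.find()) {
            // No hit: fall back to the start of the document.
            return text.substring(0, Math.min(text.length(), 2 * radius));
        }

        // Step 2: abbreviate the text to a window around the first hit.
        int start = Math.max(0, first.start() - radius);
        int end = Math.min(text.length(), first.end() + radius);

        // Step 3: highlight every hit inside the window. Transparent bounds let \b
        // see the characters just outside the window, so a word cut in half is not matched.
        Matcher m = p.matcher(text).region(start, end).useTransparentBounds(true);
        StringBuilder out = new StringBuilder(start > 0 ? "..." : "");
        int pos = start;
        while (m.find()) {
            out.append(text, pos, m.start()).append("<b>").append(m.group()).append("</b>");
            pos = m.end();
        }
        out.append(text, pos, end);
        return out.append(end < text.length() ? "..." : "").toString();
    }
}
```

Because the extract is cut at a fixed character distance, it can begin or end in the middle of a word; snapping the window to word boundaries would be a natural refinement.
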
>> 3. I think we should incorporate this feature into Lucene. Right now, to
>> make this work you have to change some code in the Lucene package, so
>> staying up to date requires changing those parts of the code every time
>> (if they are still there!), because it depends strictly on the Lucene
>> core package.
> There are a whole bunch of different ways of implementing highlighting;
> not all of them require changes to Lucene's core. I think integrating a
> full highlight retrieval system into Lucene that's sufficiently generic
> to match with Lucene's architecture might be difficult at best...
>> I hope someone can give me some tips to help me complete this
>> functionality.
> I'm not 100% sure what you need to do further?
> For what it's worth, if your current code is sufficient, I'd go with
> that. I've refactored a few highlighting systems, and most of them end
> up with quite a lot of code, depending on how detailed your spec is.
> Regards,
