lucene-general mailing list archives

From: Ted Dunning
Subject: Re: Announcement: Boilerplate removal library
Date: Mon, 14 Dec 2009 15:15:52 GMT

Storing the original would be an excellent idea and would be quite doable.

2009/12/14 Christian Kohlschütter:

> However, it would also be great (in order to increase recall) to store
> non-content as well and just add some kind of static boost for content
> blocks over non-content blocks. I am not sure whether this will work right
> now using an Analyzer. What you could do, though, is store the text in
> separate fields ("content"/"boilerplate") and add field-specific boosts at
> query time.
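
The query-time boosting suggested in the quoted message can be sketched as below. This is only an illustration, not code from the boilerplate removal library: it builds a query string in Lucene's classic query parser syntax (field:term^boost), using the field names "content" and "boilerplate" from the message. The class name FieldBoostQuery, the helper build, and the boost values 2.0 and 0.5 are hypothetical, and the sketch assumes a single plain term with no characters that would need escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FieldBoostQuery {

    // Hypothetical helper: emits one "field:term^boost" clause per field,
    // separated by spaces. With the classic query parser's default OR
    // operator, a document matches if the term occurs in either field,
    // and matches in higher-boosted fields contribute more to the score.
    // Assumes the term is a single word with no special query characters.
    public static String build(String term, Map<String, Double> fieldBoosts) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Double> e : fieldBoosts.entrySet()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(e.getKey())
              .append(':')
              .append(term)
              .append('^')
              .append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Illustrative boosts only: favor main content, keep boilerplate for recall.
        Map<String, Double> boosts = new LinkedHashMap<>();
        boosts.put("content", 2.0);
        boosts.put("boilerplate", 0.5);
        System.out.println(build("lucene", boosts));
    }
}
```

The resulting string would then be handed to a query parser. Because the boosts are applied at query time, they can be tuned without reindexing, and the boilerplate text remains searchable.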

Ted Dunning, CTO
