lucene-solr-user mailing list archives

From "Mike Klaas" <mike.kl...@gmail.com>
Subject Re: Snippet Generation at Punctuation Marks?
Date Thu, 03 May 2007 17:49:37 GMT
On 5/3/07, Brian Whitman <brian.whitman@variogr.am> wrote:
> On May 3, 2007, at 11:39 AM, Jack L wrote:
> > Snippet generation uses hl.fragsize to determine the size
> > of the snippets. This works very well. However, the snippets
> > often have half of a sentence at the beginning, and half
> > at the end. Is there a parameter I can use to tell the
> > snippet generation code to cut at punctuation marks when
> > possible?
>
>
> We are working on this and hope to have a Solr patch soon. Simple
> splitting on punctuation needs a new fragmenter, which trunk Solr
> does not support yet. But we're hoping to fix that ASAP.

See http://issues.apache.org/jira/browse/SOLR-102 for my solution to
this problem. The idea is to split at sentence boundaries where
possible, without straying too far from the desired fragment size.
Comments on, or improvements to, this approach would be welcome.
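
To make the trade-off concrete, here is a rough sketch of the idea in
Python. It is only an illustration and not the code attached to
SOLR-102: the function name fragment(), the slop parameter, and the
simple sentence-end regex are all assumptions made for this example.
Each fragment aims for fragsize characters, but the cut moves to the
nearest sentence boundary if one lies within fragsize * slop of the
target; otherwise it falls back to a hard cut at the target.

```python
import re

# Assumed, simplistic notion of a sentence end: terminator followed by whitespace.
SENTENCE_END = re.compile(r"[.!?]\s+")


def fragment(text, fragsize=100, slop=0.5):
    """Split text into fragments of roughly `fragsize` characters.

    Hypothetical helper for illustration: a cut is moved to the sentence
    boundary closest to the target size, provided that boundary lies
    within fragsize * slop characters of the target.
    """
    if fragsize < 1 or not 0 <= slop < 1:
        raise ValueError("need fragsize >= 1 and 0 <= slop < 1")

    # Candidate cut points: the index just past each sentence end.
    boundaries = [m.end() for m in SENTENCE_END.finditer(text)]

    fragments = []
    start = 0
    while len(text) - start > fragsize:
        target = start + fragsize
        # Window of acceptable cut points around the target;
        # max(1, ...) guarantees the cut always moves forward.
        lo = start + max(1, int(fragsize * (1 - slop)))
        hi = start + int(fragsize * (1 + slop))
        candidates = [b for b in boundaries if lo <= b <= hi]
        if candidates:
            cut = min(candidates, key=lambda b: abs(b - target))
        else:
            cut = target  # no boundary close enough: hard cut
        fragments.append(text[start:cut])
        start = cut

    if start < len(text):
        fragments.append(text[start:])  # whatever is left over
    return fragments
```

A larger slop lets fragments drift further from fragsize in exchange
for cleaner sentence breaks; slop=0 degenerates to fixed-size cuts.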

-Mike
