lucene-solr-user mailing list archives

From Lance Norskog <goks...@gmail.com>
Subject Re: Using hl.regex.pattern to print complete lines
Date Thu, 22 Jul 2010 00:55:11 GMT
Java's regex syntax may differ from other regex flavors, so writing a test
program and experimenting is the only reliable way to check. If the
expression does what you want there but not in Solr, you might have
found a bug in highlighting.
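
A minimal sketch of such a test program, using only java.util.regex. The
class name, method name and sample text are made up for illustration, and
it says nothing about how Solr itself applies hl.regex.pattern:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexLineTest {
    // Collects every non-overlapping match of regex found in text.
    public static List<String> matches(String regex, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical stored field value with newline-separated lines.
        String text = "first line\nsecond line\nthird line";

        // Without (?m), ^ and $ only match at the start/end of the input,
        // and . does not cross newlines, so nothing matches.
        System.out.println(matches("^.*$", text));     // prints []

        // With (?m), ^ and $ match at every line boundary.
        System.out.println(matches("(?m)^.*$", text)); // prints [first line, second line, third line]
    }
}
```

If the second call returns whole lines here but the highlighter still
returns partial lines, the pattern itself is probably not the problem.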

Lucene/Solr highlighting has always been a difficult area, and might
not do everything right.
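
For the "grep -C1" behavior asked for in the quoted message below, one
possible pattern to experiment with in plain Java is sketched here. It is
only an assumption about what might work: the class and method names are
hypothetical, it assumes "\n" line endings, and it has not been checked
against the Solr highlighter.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ContextLines {
    // Returns the first line containing term, plus the line before and the
    // line after it when they exist, or null if term does not occur.
    public static String context(String text, String term) {
        Pattern p = Pattern.compile(
            "(?m)^(?:.*\\n)?"                   // optional preceding line
            + ".*" + Pattern.quote(term) + ".*" // the line containing the term
            + "(?:\\n.*)?");                    // optional following line
        Matcher m = p.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "alpha\nbeta foo\ngamma\ndelta";
        System.out.println(context(text, "foo"));
    }
}
```

Pattern.quote keeps the search term literal, so a term containing regex
metacharacters does not change the meaning of the pattern.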

On Wed, Jul 21, 2010 at 4:20 PM, Peter Spam <pspam@mac.com> wrote:
> Still not working ... any ideas?
>
>
> -Pete
>
> On Jul 14, 2010, at 11:56 AM, Peter Spam wrote:
>
>> Any other thoughts, Chris?  I've been messing with this a bit, and can't seem to get (?m)^.*$ to do what I want.
>>
>> 1) I don't care how many characters it returns, I'd like entire lines all the time
>> 2) I just want it to always return 3 lines: the line before, the actual line, and the line after.
>> 3) This should be like "grep -C1"
>>
>> Thanks for your time!
>>
>>
>> -Pete
>>
>> On Jul 9, 2010, at 12:08 AM, Peter Spam wrote:
>>
>>> Ah, this makes sense.  I've changed my regex to "(?m)^.*$", and it works better, but I still get fragments before and after some returns.
>>> Thanks for the hint!
>>>
>>>
>>> -Pete
>>>
>>> On Jul 8, 2010, at 6:27 PM, Chris Hostetter wrote:
>>>
>>>>
>>>> : If you can use the latest branch_3x or trunk, hl.fragListBuilder=single
>>>> : is available that is for getting entire field contents with search terms
>>>> : highlighted. To use it, set hl.useFastVectorHighlighter to true.
>>>>
>>>> He doesn't want the entire field -- his stored field values contain
>>>> multi-line strings (using newline characters) and he wants to make
>>>> fragments per "line" (ie: bounded by newline characters, or the start/end
>>>> of the entire field value)
>>>>
>>>> Peter: i haven't looked at the code, but i expect that the problem is that
>>>> the java regex engine isn't being used in a way that makes ^ and $ match
>>>> any line boundary -- they are probably only matching the start/end of the
>>>> field (and . is probably only matching non-newline characters)
>>>>
>>>> java regexes support embedded flags (ie: "(?xyz)your regex") so you might
>>>> try that (i don't remember what the correct modifier flag is for the
>>>> multiline mode off the top of my head)
>>>>
>>>> -Hoss
>>>>
>>>
>>
>
>



-- 
Lance Norskog
goksron@gmail.com
