lucene-solr-user mailing list archives

From Pranav Prakash <pra...@gmail.com>
Subject Highlighting uses lots of memory and eventually slows down Solr
Date Fri, 09 Dec 2011 08:41:32 GMT
Hi Group,

I would like to enable highlighting for search results. The fields are indexed
with the following schema (Solr 3.4):

<fieldType name="text_commongrams" class="solr.TextField">
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.CommonGramsFilterFactory" words="stopwords_en.txt"
            ignoreCase="true"/>
    <filter class="solr.StopFilterFactory" words="stopwords_en.txt"
            ignoreCase="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" preserveOriginal="1"/>
  </analyzer>
</fieldType>

<field name="transcript" type="text_commongrams" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>

<dynamicField name="*_en" type="text_commongrams" indexed="true" stored="true"
              termVectors="true" termPositions="true" termOffsets="true"/>

I also have the following highlighting configuration in solrconfig.xml:

<highlighting>
  <fragmenter name="gap" class="org.apache.solr.highlight.GapFragmenter"
              default="true">
    <lst name="defaults">
      <int name="hl.fragsize">100</int>
    </lst>
  </fragmenter>
  <fragmenter name="regex" class="org.apache.solr.highlight.RegexFragmenter">
    <lst name="defaults">
      <int name="hl.fragsize">20</int>
      <float name="hl.regex.slop">0.5</float>
      <str name="hl.regex.pattern">[-\w ,/\n\"']{20,200}</str>
    </lst>
  </fragmenter>
  <formatter name="html" class="org.apache.solr.highlight.HtmlFormatter"
             default="true">
    <lst name="defaults">
      <str name="hl.simple.pre">
        <![CDATA[ <strong> ]]>
      </str>
      <str name="hl.simple.post">
        <![CDATA[ </strong> ]]>
      </str>
    </lst>
  </formatter>
</highlighting>

The problem is that when I turn on highlighting, I run into memory issues.
Memory usage on the system climbs steadily until it consumes nearly all of
the memory. I don't receive OOM errors; there is always about 300 MB of free
memory left. The machine has 48 GiB of RAM in total. My index is 138 GiB and
contains about 10 million documents.

I also get the following warning, but I am not sure how to make the change it asks for:

WARNING: Deprecated syntax found. <highlighting/> should move to <searchComponent/>
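For reference, a minimal sketch of the non-deprecated layout, following the stock Solr 3.x example solrconfig.xml: the same fragmenter and formatter definitions move, unchanged, inside a `<searchComponent>` of class `solr.HighlightComponent`, and the old top-level `<highlighting>` element is removed. Naming the component `highlight` is an assumption here, chosen so that it takes the place of the default highlighting component used by the standard search handler; check it against your own solrconfig.xml before relying on it.

```xml
<!-- Sketch: the former top-level <highlighting> block, nested unchanged inside a HighlightComponent -->
<searchComponent class="solr.HighlightComponent" name="highlight">
  <highlighting>
    <fragmenter name="gap" class="org.apache.solr.highlight.GapFragmenter"
                default="true">
      <lst name="defaults">
        <int name="hl.fragsize">100</int>
      </lst>
    </fragmenter>
    <fragmenter name="regex" class="org.apache.solr.highlight.RegexFragmenter">
      <lst name="defaults">
        <int name="hl.fragsize">20</int>
        <float name="hl.regex.slop">0.5</float>
        <str name="hl.regex.pattern">[-\w ,/\n\"']{20,200}</str>
      </lst>
    </fragmenter>
    <formatter name="html" class="org.apache.solr.highlight.HtmlFormatter"
               default="true">
      <lst name="defaults">
        <str name="hl.simple.pre">
          <![CDATA[ <strong> ]]>
        </str>
        <str name="hl.simple.post">
          <![CDATA[ </strong> ]]>
        </str>
      </lst>
    </formatter>
  </highlighting>
</searchComponent>
```

This only addresses the deprecation warning; it does not by itself change how much memory highlighting uses.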

With highlighting turned on, a typical entry in my Solr log looks like this:

[core0] webapp=/solr path=/select
params={mm=3<90%25&qf=title^2&hl.simple.pre=<strong>&hl.fl=title,transcript,transcript_en&wt=ruby&hl=true&rows=12&defType=dismax&fl=id,title,description&debugQuery=false&start=0&q=asdfghjkl&bf=recip(ms(NOW,created_at),1.88e-11,1,1)&hl.simple.post=</strong>&ps=50}
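The `hl.*` parameters sent with each request in this log entry can alternatively be set as defaults on a request handler in solrconfig.xml, which keeps them in one place while tuning. A minimal sketch, assuming a hypothetical handler name `/highlighted`; the values are copied from the logged request. `hl.maxAnalyzedChars` (a standard highlighting parameter that caps how many characters of each field the highlighter analyzes; 51200 is Solr's default) is included only as an assumed knob worth examining for a large stored field such as `transcript`, not as a confirmed fix.

```xml
<!-- Hypothetical handler name; values copied from the logged request -->
<requestHandler name="/highlighted" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">title^2</str>
    <bool name="hl">true</bool>
    <str name="hl.fl">title,transcript,transcript_en</str>
    <str name="hl.simple.pre"><![CDATA[<strong>]]></str>
    <str name="hl.simple.post"><![CDATA[</strong>]]></str>
    <!-- Assumption: explicit cap on analyzed characters per field; 51200 is the default -->
    <int name="hl.maxAnalyzedChars">51200</int>
  </lst>
</requestHandler>
```

Parameters passed explicitly on a request still override these defaults.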

Any help on this would be greatly appreciated. Thanks in advance!

*Pranav Prakash*

"temet nosce"

Twitter <http://twitter.com/pranavprakash> | Blog <http://blog.myblive.com> |
Google <http://www.google.com/profiles/pranny>
