lucene-solr-user mailing list archives

From Pranav Prakash <pra...@gmail.com>
Subject Re: Highlighting uses lots of memory and eventually slows down Solr
Date Mon, 19 Dec 2011 08:51:01 GMT
No response!! Bumping it up.

*Pranav Prakash*

"temet nosce"

Twitter <http://twitter.com/pranavprakash> | Blog <http://blog.myblive.com> |
Google <http://www.google.com/profiles/pranny>


On Fri, Dec 9, 2011 at 14:11, Pranav Prakash <pranny@gmail.com> wrote:

> Hi Group,
>
> I would like to enable highlighting for search. The fields are indexed
> with the following schema (Solr 3.4):
>
> <fieldType name="text_commongrams" class="solr.TextField">
>   <analyzer>
>     <charFilter class="solr.HTMLStripCharFilterFactory"/>
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>     <filter class="solr.TrimFilterFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>     <filter class="solr.CommonGramsFilterFactory" words="stopwords_en.txt" ignoreCase="true"/>
>     <filter class="solr.StopFilterFactory" words="stopwords_en.txt" ignoreCase="true"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>             generateNumberParts="1" catenateWords="1" catenateNumbers="1"
>             catenateAll="0" preserveOriginal="1"/>
>   </analyzer>
> </fieldType>
>
> <field name="transcript" type="text_commongrams" indexed="true" stored="true"
>        termVectors="true" termPositions="true" termOffsets="true"/>
>
> <dynamicField name="*_en" type="text_commongrams" indexed="true" stored="true"
>               termVectors="true" termPositions="true" termOffsets="true"/>
>
> And the following highlighting config:
>
> <highlighting>
>   <fragmenter name="gap" class="org.apache.solr.highlight.GapFragmenter" default="true">
>     <lst name="defaults">
>       <int name="hl.fragsize">100</int>
>     </lst>
>   </fragmenter>
>   <fragmenter name="regex" class="org.apache.solr.highlight.RegexFragmenter">
>     <lst name="defaults">
>       <int name="hl.fragsize">20</int>
>       <float name="hl.regex.slop">0.5</float>
>       <str name="hl.regex.pattern">[-\w ,/\n\"']{20,200}</str>
>     </lst>
>   </fragmenter>
>   <formatter name="html" class="org.apache.solr.highlight.HtmlFormatter" default="true">
>     <lst name="defaults">
>       <str name="hl.simple.pre">
>         <![CDATA[ <strong> ]]>
>       </str>
>       <str name="hl.simple.post">
>         <![CDATA[ </strong> ]]>
>       </str>
>     </lst>
>   </formatter>
> </highlighting>
>
> The problem is that when I turn on highlighting, I run into memory issues.
> Memory usage on the system climbs steadily until it consumes almost all
> available memory (I don't receive OOM errors; about 300 MB always remains
> free). The machine has 48 GiB of memory in total. My index size is 138 GiB
> and there are about 10 million documents in the index.
>
> I also get the following warning, but I am not sure how to resolve it.
>
> WARNING: Deprecated syntax found. <highlighting/> should move to
> <searchComponent/>
>
> My Solr log with highlighting turned on looks something like this
>
>  [core0] webapp=/solr path=/select
> params={mm=3<90%25&qf=title^2&hl.simple.pre=<strong>&hl.fl=title,transcript,transcript_en&wt=ruby&hl=true&rows=12&defType=dismax&fl=id,title,description&debugQuery=false&start=0&q=asdfghjkl&bf=recip(ms(NOW,created_at),1.88e-11,1,1)&hl.simple.post=</strong>&ps=50}
>
> Any help on this would be greatly appreciated. Thanks in advance !!
>
> *Pranav Prakash*
>
> "temet nosce"
>
> Twitter <http://twitter.com/pranavprakash> | Blog <http://blog.myblive.com> |
> Google <http://www.google.com/profiles/pranny>
>
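
Regarding the deprecation warning quoted above: a minimal sketch, assuming the layout used by the example solrconfig.xml shipped with Solr 3.x, is to wrap the existing <highlighting> section in a highlight search component. The component name "highlight" and the class solr.HighlightComponent are taken from that example config and are assumptions here, not something confirmed in this thread. This is untested against the setup described, and it addresses only the warning, not the memory growth.

```xml
<!-- Sketch only: the existing <highlighting> section nested inside a
     search component, as in the Solr 3.x example solrconfig.xml. -->
<searchComponent class="solr.HighlightComponent" name="highlight">
  <highlighting>
    <fragmenter name="gap" class="org.apache.solr.highlight.GapFragmenter" default="true">
      <lst name="defaults">
        <int name="hl.fragsize">100</int>
      </lst>
    </fragmenter>
    <!-- The "regex" fragmenter and "html" formatter from the original
         config would go here unchanged. -->
  </highlighting>
</searchComponent>
```

The fragmenter and formatter definitions themselves stay as they are; only their enclosing element changes.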
