lucene-solr-user mailing list archives

From Peter Spam <ps...@mac.com>
Subject Solr searching performance issues, using large documents
Date Wed, 21 Jul 2010 00:36:47 GMT
Data set: About 4,000 log files (will eventually grow to millions).  Average log file is 850 KB.
Largest log file (so far) is about 70 MB.

Problem: When I search for common terms, the query time goes from under 3 seconds to about
60 seconds.  Term vectors (with positions and offsets) are enabled.  When I disable highlighting,
performance improves a lot, but some queries are still slow (7 seconds).  Thanks in advance for
any ideas!


-Peter


-------------------------------------------------------------------------------------------------------------------------------------

4GB RAM server
% java -Xms2048M -Xmx3072M -jar start.jar
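
For a rough sense of scale, the figures above can be put side by side. This is only a back-of-the-envelope sketch: it assumes "850k" means 850 KB (1 KB = 1024 bytes) and that the 4GB server has 4096 MB of physical RAM. The variable names are illustrative and do not come from the original setup.

```ruby
# Back-of-the-envelope numbers from the figures quoted in this message.
# Assumptions: 850k = 850 KB, 1 KB = 1024 bytes, 4GB server = 4096 MB RAM.
kb = 1024
mb = 1024 * kb

file_count     = 4_000
avg_file_bytes = 850 * kb
raw_text_mb    = (file_count * avg_file_bytes) / mb.to_f

total_ram_mb    = 4096
max_heap_mb     = 3072                       # from -Xmx3072M
outside_heap_mb = total_ram_mb - max_heap_mb # what remains for everything else

puts format('raw log text: about %.0f MB', raw_text_mb)
puts "RAM left outside the JVM heap: at most #{outside_heap_mb} MB"
```

Under these assumptions the raw text alone is larger than the maximum heap, and once the heap is fully grown at most 1024 MB remains for the operating system, including its file cache.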

-------------------------------------------------------------------------------------------------------------------------------------

schema.xml changes:

    <fieldType name="text_pl" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0"
                catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
      </analyzer>
    </fieldType>

...

    <field name="body" type="text_pl" indexed="true" stored="true" multiValued="false"
           termVectors="true" termPositions="true" termOffsets="true"/>
    <field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/>
    <field name="version" type="string" indexed="true" stored="true" multiValued="false"/>
    <field name="device" type="string" indexed="true" stored="true" multiValued="false"/>
    <field name="filename" type="string" indexed="true" stored="true" multiValued="false"/>
    <field name="filesize" type="long" indexed="true" stored="true" multiValued="false"/>
    <field name="pversion" type="int" indexed="true" stored="true" multiValued="false"/>
    <field name="first2md5" type="string" indexed="false" stored="true" multiValued="false"/>
    <field name="ckey" type="string" indexed="true" stored="true" multiValued="false"/>

...

 <dynamicField name="*" type="ignored" multiValued="true" />
 <defaultSearchField>body</defaultSearchField>
 <solrQueryParser defaultOperator="AND"/>

-------------------------------------------------------------------------------------------------------------------------------------

solrconfig.xml changes:

    <maxFieldLength>2147483647</maxFieldLength>
    <ramBufferSizeMB>128</ramBufferSizeMB>

-------------------------------------------------------------------------------------------------------------------------------------

The query:

rowStr = "&rows=10"
facet = "&facet=true&facet.limit=10&facet.field=device&facet.field=ckey&facet.field=version"
fields = "&fl=id,score,filename,version,device,first2md5,filesize,ckey"
termvectors = "&tv=true&qt=tvrh&tv.all=true"
hl = "&hl=true&hl.fl=body&hl.snippets=1&hl.fragsize=400"
regexv = "(?m)^.*\n.*\n.*$"
hl_regex = "&hl.regex.pattern=" + CGI::escape(regexv) + "&hl.regex.slop=1&hl.fragmenter=regex&hl.regex.maxAnalyzedChars=2147483647&hl.maxAnalyzedChars=2147483647"
justq = '&q=' + CGI::escape('body:' + fuzzy +
  p['q'].to_s.gsub(/\\/, '').gsub(/([:~!<>="])/, '\\\\\1') + fuzzy + minLogSizeStr)

thequery = '/solr/select?timeAllowed=5000&wt=ruby' +
  (p['fq'].empty? ? '' : ('&fq=' + p['fq'].to_s)) +
  justq + rowStr + facet + fields + termvectors + hl + hl_regex

baseurl = '/cgi-bin/search.rb?q=' + CGI::escape(p['q'].to_s) + '&rows=' + p['rows'].to_s +
  '&minLogSize=' + p['minLogSize'].to_s
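
Since disabling highlighting changes the timings so much, it may help to build the same request from a parameter list so that highlighting can be switched on or off per request. The following is a minimal sketch, not the original script: `build_select_url` is a hypothetical helper, it expects the query text to be escaped already (as the `gsub` calls above do), and it leaves out the term vector, `fq`, and regex fragmenter parameters for brevity.

```ruby
require 'uri'

# Hypothetical helper (not part of the original script): assemble the
# /solr/select URL from key/value pairs. The parameter names and values are
# copied from the query above; only the highlight toggle is new.
def build_select_url(q, highlight: true, rows: 10)
  params = [
    ['timeAllowed', '5000'],
    ['wt', 'ruby'],
    ['q', "body:#{q}"],
    ['rows', rows.to_s],
    ['facet', 'true'],
    ['facet.limit', '10'],
    ['facet.field', 'device'],
    ['facet.field', 'ckey'],
    ['facet.field', 'version'],
    ['fl', 'id,score,filename,version,device,first2md5,filesize,ckey']
  ]
  if highlight
    # Same highlighting settings as the hl string above.
    params += [
      ['hl', 'true'],
      ['hl.fl', 'body'],
      ['hl.snippets', '1'],
      ['hl.fragsize', '400']
    ]
  end
  # encode_www_form percent-encodes each key and value and joins them with '&'.
  '/solr/select?' + URI.encode_www_form(params)
end

with_hl    = build_select_url('error')
without_hl = build_select_url('error', highlight: false)
puts with_hl
puts without_hl
```

Requesting both URLs for the same term and comparing the reported query times would isolate how much of the 60 seconds is spent in highlighting.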

