lucene-solr-user mailing list archives

From "Burton-West, Tom" <tburt...@umich.edu>
Subject RE: Improve Query Time For Large Index
Date Tue, 10 Aug 2010 16:28:03 GMT
Hi Peter,

A few more details about your setup would help list members to answer your questions.
How large is your index?  
How much memory is on the machine and how much is allocated to the JVM?
In addition to the Solr caches, Solr and Lucene rely on the operating system's disk cache
to hold postings lists, so you need to leave some memory for the OS.  On the other hand,
if you are optimizing and refreshing every 10-15 minutes, that will invalidate all the caches,
since an optimized index is essentially a set of new files.

Can you give us some examples of the slow queries?  Are you using stop words?  

If your slow queries are phrase queries, you might try either adding the most frequent
terms in your index to the stopwords list, or using CommonGrams and adding those terms to the
common words list.  (Details on CommonGrams here: http://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-2)
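
For illustration, a CommonGrams setup in schema.xml might look roughly like the sketch
below. The field type name "text_cg" and the file name "commonwords.txt" are made up for
this example, and the tokenizer and lowercase filter are only placeholders for whatever
your existing "text" type uses. The index analyzer emits word pairs for the common words
alongside the single terms; the query analyzer then uses those pairs in phrase queries.

```xml
<!-- Sketch only: "text_cg" and commonwords.txt are hypothetical names. -->
<fieldType name="text_cg" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Adds bigrams for words listed in commonwords.txt, keeping the single terms too. -->
    <filter class="solr.CommonGramsFilterFactory"
            words="commonwords.txt" ignoreCase="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Query side: same word list as the index analyzer. -->
    <filter class="solr.CommonGramsQueryFilterFactory"
            words="commonwords.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>
```

Changing the analysis chain this way means the documents have to be reindexed, and the
extra word-pair terms will make the index larger.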

Tom Burton-West

-----Original Message-----
From: Peter Karich [mailto:peathal@yahoo.de] 
Sent: Tuesday, August 10, 2010 9:54 AM
To: solr-user@lucene.apache.org
Subject: Improve Query Time For Large Index

Hi,

I have 5 million small documents/tweets (~3GB), and the slave index
replicates itself from the master every 10-15 minutes, so the index is
optimized before querying. We are using Solr 1.4.1 (patched with
SOLR-1624) via SolrJ.

Now the search speed is slow (>2s) for common terms that hit more than
2 million docs, and acceptable (<0.5s) for others. For those numbers I don't
use highlighting or facets. I am using the following schema [1], and from
the Luke handler I know that numTerms is ~20 million. The query for common
terms stays slow if I retry again and again (no cache improvements).

How can I improve the query time for the common terms without using
Distributed Search [2] ?

Regards,
Peter.


[1]
<field name="id" type="tlong" indexed="true" stored="true"
required="true" />
<field name="date" type="tdate" indexed="true" stored="true" />
<!-- term* attributes to prepare faster highlighting. -->
<field name="txt" type="text" indexed="true" stored="true"
               termVectors="true" termPositions="true" termOffsets="true"/>

[2]
http://wiki.apache.org/solr/DistributedSearch

