lucene-solr-user mailing list archives

From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: Solr Capabilities/Limitations
Date Wed, 02 Jul 2008 17:10:30 GMT
Willie,

Yes, Solr has "multi core" support: <http://wiki.apache.org/solr/MultiCore>

	Erik
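
From the client side, each core in a multi-core setup is addressed by its own path under the Solr base URL. A minimal sketch of building such request URLs, assuming a hypothetical base URL and hypothetical core names (`core0`, `core1`); `core_select_url` is an illustrative helper, not part of Solr:

```python
from urllib.parse import urlencode


def core_select_url(base_url, core, query, rows=10):
    """Build a select URL for one named core (core names are hypothetical)."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/{core}/select?{params}"


# The same Solr instance serves both indexes; only the core name changes.
for core in ("core0", "core1"):
    print(core_select_url("http://localhost:8983/solr", core, "title:lucene"))
```

Where each core's index lives on disk is a matter of server configuration, so the client does not need to know which drive an index is on.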



On Jul 2, 2008, at 1:15 PM, Willie Wong wrote:

> Thanks Mike for your quick response - it was very informative and
> useful.
>
> I have one final question if you don't mind: is it possible for a
> single Solr instance to switch between multiple indexes?  For example,
> can Solr search in one index on one server partition, then use another
> index located on another drive, without requiring a restart?  This
> differs slightly from the distributed search examples I've read in the
> documentation, where you have another server running Solr with the
> distributed index.
>
>
> Thanks,
>
> Willie
>
> From: Mike Klaas <mike.klaas@gmail.com>
> Date: 01/07/2008 05:44 PM
> Reply-To: solr-user@lucene.apache.org
> To: solr-user@lucene.apache.org
> Subject: Re: Solr Capabilities/Limitations
>
> On 1-Jul-08, at 8:37 AM, Willie Wong wrote:
>
>> I need to be able to search through terabytes of existing data.
>> Documents may vary in size from 20 KB to 10 MB.  Also, at some point
>> I'll need to feed in approximately 1-5 million new documents a day.
>
> This depends greatly on what kind of searching you want to do, and
> what the desired response times are.  I'm using Solr to full-text
> search about 10 TB of data at the moment.  Response times are around
> 1s including dynamic snippet generation.  The queries themselves are
> relatively complicated by Lucene standards, including a custom
> word-proximity boosting query and link-analysis factors.
>
> Of course, this is distributed over dozens of machines, and is a
> mostly static index.  There are about 10 million docs per server.
>
>> Has anyone used Solr to conduct searches over terabytes of data?  If
>> so, are there any configuration parameters I should pay particular
>> attention to, such as JVM size, mergeFactor, etc.?
>
> JVM size will depend mostly on your sorting/faceting requirements.
> Just remember to leave gobs of memory for the OS disk cache.  Memory
> is key to serving large indices (consequently, things won't be fast
> until a decent amount of warming up is done).  mergeFactor?  You
> should only be searching optimized indices of this size, so it isn't
> terribly relevant.  The daily new docs should probably be added in
> their own index, which is then searched in parallel with the existing
> indices.
>
>> Is there a limit to the number of shards Solr is capable of?  I don't
>> think there's any way I can do this without some sort of distributed
>> search.
>
> Not really, though you will want to move to a 2-level hierarchy
> eventually.  I can't speak for the distributed search implementation
> in trunk (we built our own before this was available), but it should
> be exactly what you need.
>
>> I've read that Solr indexes can go into the millions if not billions
>> of documents; however, at what point does the index size become
>> impractical?  I know this is a bit open-ended, but does Solr have a
>> limit to the number of documents that can be in a single index?
>
> Depends on query composition and document size.  But for web docs,
> about 10m seems practical.
>
>> Has anyone looked into either of these other search engines?
>> http://mg4j.dsi.unimi.it/
>> http://www.egothor.org/performance.shtml
>> Are there any other search engines that would be better suited, such
>> as Fast or Autonomy?
>
> I haven't, but it should be possible to build a system based on those
> engines.  For a system this size, the distributed architecture will be
> more important than the underlying index engine (though it sure helps
> to use an engine as optimized as Lucene).
>
> -Mike
>
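
Mike's two suggestions above (search the daily index in parallel with the existing indices, and move to a 2-level hierarchy as the shard count grows) can be sketched as a generic fan-out and merge. This is an illustration under stated assumptions, not Solr's distributed search implementation: every function name here is hypothetical, and a "shard" is modeled as any callable taking `(query, k)` and returning `(score, doc_id)` pairs sorted by descending score.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def merge_top_k(result_lists, k):
    """Merge per-shard hit lists, each already sorted by descending score."""
    # heapq.merge needs ascending keys, so negate the score.
    merged = heapq.merge(*result_lists, key=lambda hit: -hit[0])
    return list(islice(merged, k))


def search_parallel(search_fns, query, k):
    """Query every shard concurrently and keep the k best hits overall."""
    with ThreadPoolExecutor(max_workers=max(1, len(search_fns))) as pool:
        partials = list(pool.map(lambda fn: fn(query, k), search_fns))
    return merge_top_k(partials, k)


def make_aggregator(search_fns):
    """Wrap a group of shards so the group looks like a single shard.

    Nesting aggregators gives the 2-level hierarchy: a top-level node
    queries a few aggregators, each of which queries its own shards.
    """
    return lambda query, k: search_parallel(search_fns, query, k)


def fake_shard(hits):
    """Stand-in for a real index: returns its best k hits for any query."""
    ordered = sorted(hits, reverse=True)
    return lambda query, k: ordered[:k]


if __name__ == "__main__":
    static_a = fake_shard([(0.9, "a1"), (0.4, "a2")])
    static_b = fake_shard([(0.8, "b1"), (0.1, "b2")])
    daily = fake_shard([(0.95, "d1"), (0.3, "d2")])
    # Level 1 groups the static shards; level 2 adds the daily index.
    top = make_aggregator([make_aggregator([static_a, static_b]), daily])
    print(top("any query", 3))
```

Each level only needs the top k hits from the level below, so the amount of data merged at the root stays proportional to k times the number of aggregators rather than the total number of shards.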

