lucene-solr-user mailing list archives

From Mark Miller <markrmil...@gmail.com>
Subject Re: SolrCloud Performance - Indexing
Date Tue, 27 Nov 2012 18:56:14 GMT
Yup, DIH is not optimal for SolrCloud yet. I made a few JIRA issues a short while ago that
may help.

I've seen people use it with SolrCloud in the past though - and it wasn't so slow…(though
I'm sure slower than a single node).

Search me...

- Mark
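
Mark's earlier advice, quoted further down (index from many clients or threads, and to many nodes rather than one), could be sketched roughly as follows. This is only an illustration under stated assumptions, not a DIH replacement and not anything posted in the thread: the helper names (`chunked`, `post_batch`, `index_concurrently`), the batch size and worker count are hypothetical, and the sketch assumes each node exposes an update URL that accepts a JSON array of documents.

```python
import itertools
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def chunked(docs, size):
    """Yield successive lists of at most `size` documents."""
    it = iter(docs)
    while True:
        batch = list(itertools.islice(it, size))
        if not batch:
            return
        yield batch


def post_batch(update_url, batch):
    """POST one batch as a JSON array (the endpoint shape is an assumption)."""
    req = urllib.request.Request(
        update_url,
        data=json.dumps(batch).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def index_concurrently(docs, update_urls, batch_size=500, workers=8,
                       send=post_batch):
    """Send batches from several threads, rotating over the given nodes."""
    targets = itertools.cycle(update_urls)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each batch is assigned the next node in round-robin order.
        futures = [pool.submit(send, next(targets), batch)
                   for batch in chunked(docs, batch_size)]
        # Collect results in submission order; a failed send raises here.
        return [f.result() for f in futures]
```

A caller would pass its own document source and node URLs, for example `index_concurrently(rows, ["http://node1:8983/solr/collection1/update", "http://node2:8983/solr/collection1/update"])`, where the host names and collection name are placeholders. Note that the sketch queues every batch up front, so a very large import would want a bounded queue instead.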

On Nov 27, 2012, at 1:24 PM, Mikhail Khludnev <mkhludnev@griddynamics.com> wrote:

> It sounds like DataImportHandler will not be really performant with
> SolrCloud. From what I see it should essentially work - it sends docs to
> the chain, which should distribute them via DistributedUpdateProcessor. But
> it works synchronously - no multithreading in DIH since 4.0!
> Does anyone have experience with, or ideas for, fast data acquisition with
> DIH & SolrCloud?
> Excuse me for thread hijacking.
> 
> 
> On Tue, Nov 27, 2012 at 8:10 PM, Mark Miller <markrmiller@gmail.com> wrote:
> 
>> To get the best speed out of SolrCloud you have to index from many clients
>> (or threads). Even better is if you index to many nodes rather than one.
>> 
>> Using a single thread against a single instance with replicas will be a
>> fair amount slower with cloud than if you just used one node.
>> 
>> - Mark
>> 
>> On Nov 27, 2012, at 12:02 AM, deniz <denizdurmus87@gmail.com> wrote:
>> 
>>> As I am somewhat confused, I want to check whether anyone else has the
>>> same confusion about SolrCloud.
>>> 
>>> I have set up an environment with 3 Solr instances and 2 ZooKeepers, and
>>> tried to index some documents from a MySQL db. The total number of docs is
>>> around 3.5M. Before indexing I was expecting a somewhat longer time for
>>> cloud, as it does replication between nodes, but I am somewhat disappointed
>>> after seeing that indexing took 4 to 5 times longer than indexing on a
>>> single Solr instance. On a single Solr instance I am able to index those
>>> docs in around 17 mins, while with cloud it took around 60 minutes. And as
>>> a possible production environment will have more instances and machines
>>> available for the cloud, I can't imagine the indexing time. In addition to
>>> the initial indexing time, we will be updating our indexes frequently,
>>> which makes me sceptical about SolrCloud.
>>> 
>>> So in a possible production environment with SolrCloud, in case there is a
>>> serious failure on some nodes, the sync operation on the cloud will take a
>>> long time. In that case, reindexing everything on a single instance would
>>> take less than 17 mins, which is a reasonable amount of time for a crash.
>>> So does it make sense to use SolrCloud even though indexing takes much
>>> longer than on a single instance? Or would a traditional master-slave
>>> structure be better for this case?
>>> 
>>> I am aware that cloud provides load balancing and some other features that
>>> are largely concerned with searching rather than indexing, but for a
>>> frequently updated system, is it still useful to set up a cloud environment?
>>> 
>>> And are there any workarounds for indexing speed on cloud, other than the
>>> known ones for Solr?
>>> 
>>> 
>>> 
>>> -----
>>> Smart but doesn't work... If he worked, he'd do it...
>>> --
>>> View this message in context:
>>> http://lucene.472066.n3.nabble.com/SolrCloud-Performance-Indexing-tp4022549.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>> 
>> 
> 
> 
> -- 
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
> 
> <http://www.griddynamics.com>
> <mkhludnev@griddynamics.com>

