lucene-solr-user mailing list archives

From Markus Jelsma <>
Subject RE: Re: Solr and Nutch/Droids - to use or not to use?
Date Wed, 16 Jun 2010 19:31:49 GMT
Nutch does not, at this moment, support any form of consistent hashing to select an appropriate
shard. It would be nice if someone could file an issue in Nutch's Jira to add sharding support,
perhaps someone with a better understanding of and more experience with Solr's distributed
search than I have at the moment. I can't point Nutch's developers to the right piece of documentation
on this one ;)
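
To make the idea concrete, here is a minimal sketch of what selecting a shard by consistent hashing could look like on the indexing side. This is not Nutch or Solr code: the `ShardRing` class, the shard URLs and the choice of MD5 are all assumptions for illustration only.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # MD5 is used only as a stable, well-distributed hash, not for security.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class ShardRing:
    """Hypothetical consistent-hash ring mapping document keys to shards."""

    def __init__(self, shards, replicas=100):
        # Each shard gets `replicas` virtual points on the ring to even out load.
        self._ring = sorted(
            (_hash(f"{shard}#{i}"), shard)
            for shard in shards
            for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def shard_for(self, doc_key: str) -> str:
        # Pick the first ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self._points, _hash(doc_key)) % len(self._ring)
        return self._ring[idx][1]


# Hypothetical shard URLs, for illustration only.
SHARDS = [
    "http://solr1:8983/solr",
    "http://solr2:8983/solr",
    "http://solr3:8983/solr",
]

ring = ShardRing(SHARDS)
print(ring.shard_for("http://example.com/page.html"))
```

The point of a ring over a plain `hash(url) % n` is that adding or removing a shard only moves the documents that belonged to that shard, so most of the index would not need to be re-fed.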
-----Original message-----
From: Otis Gospodnetic <>
Sent: Wed 16-06-2010 21:03
Subject: Re: Solr and Nutch/Droids - to use or not to use?

Hi Mitch,

Solr can do distributed search, so it can definitely handle indices that can't fit on a single
server, by sharding them. What I think *might* be the case is that the Nutch indexer that sends
docs to Solr is not capable of sending documents to multiple Solr cores/shards. If
that is the case, I think you need to move this to the Nutch user/dev list and see how to
feed multiple Solr indices/cores/shards with Nutch data.
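
For the query side, a rough sketch of a distributed search request: Solr's distributed search is driven by the `shards` request parameter, a comma-separated list of shard addresses given without the `http://` scheme. The host names and the `distributed_query_url` helper below are made up for illustration.

```python
from urllib.parse import urlencode


def distributed_query_url(coordinator, shards, query, rows=10):
    # The node receiving the request fans the query out to every entry in
    # `shards` and merges the results.
    params = {"q": query, "rows": rows, "shards": ",".join(shards)}
    return f"http://{coordinator}/select?{urlencode(params)}"


# Hypothetical shard addresses, for illustration only.
QUERY_SHARDS = ["solr1:8983/solr", "solr2:8983/solr"]

print(distributed_query_url(QUERY_SHARDS[0], QUERY_SHARDS, "title:nutch"))
```

Any of the shards can act as the coordinator here; the open question in this thread is only how the documents get into the separate shards in the first place.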

Sematext :: Solr - Lucene - Nutch
Lucene ecosystem search

----- Original Message ----
> From: MitchK <>
> To:
> Sent: Wed, June 16, 2010 2:27:16 PM
> Subject: Re: Solr and Nutch/Droids - to use or not to use?
> Thanks, that really helps to find the right beginning for such a journey. :-)

> > * Use Solr, not Nutch's search webapp

> As far as I have read, Solr can't scale if the index gets too large for
> one server.

> > The setup explained here has one significant caveat you also need to keep
> > in mind: scale. You cannot use this kind of setup with vertical scale
> > (collection size) that goes beyond one Solr box. The horizontal scaling
> > (query throughput) is still possible with the standard Solr replication
> > tools.

> Is this still the case?
> Furthermore, as far as I have understood the blog post "Nutch and Solr",
> they index the whole stuff with Nutch and then reindex it to Solr, which
> sounds like a lot of redundant work.

> Lucid, Sematext and the Nutch wiki are the only information sources where I
> can find talks about Nutch and Solr, but no one seems to talk about these
> facts, except this one blog post.

> If you say this is wrong or contingent on the shown setup, can you tell me
> how to avoid these problems?

> A lot of questions, but it's such an exciting topic...

> Hopefully you can answer some of them.

> Again, thank you for the feedback, Otis.

> - Mitch