lucene-solr-user mailing list archives

From Lord Khan Han <khanuniver...@gmail.com>
Subject Re: SOLR Index Speed
Date Fri, 30 Sep 2011 12:40:20 GMT
Any ideas?

On Thu, Sep 29, 2011 at 1:53 PM, Lord Khan Han <khanuniverse1@gmail.com> wrote:

> Hi,
>
> The no-op run completed in 20 minutes. The only line we commented out was
> "solr.addBean(doc)". We tried SUSS (StreamingUpdateSolrServer) as a drop-in
> replacement for CommonsHttpSolrServer, but its behavior was weird: we saw
> updates taking tens of thousands of seconds, and they kept going for a very
> long time after we had finished sending to Solr. We thought this was because
> we are indexing POJOs as documents. BTW, SOLR-1565 and SOLR-2755 say that
> SUSS does not support binary payloads.
>
>
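> import org.apache.solr.client.solrj.impl.BinaryRequestWriter;
> import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
>
> CommonsHttpSolrServer solr = new CommonsHttpSolrServer(url);
>
> // send updates in the binary javabin format instead of XML
> solr.setRequestWriter(new BinaryRequestWriter());
>
> ...
>
> // doc is a SolrJ-annotated POJO
> solr.addBean(doc);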
>
>
> Any thoughts on what may be taking so long? Before moving to MapReduce, we
> indexed to localhost in 2-3 hours using the same code base.
>
> On Tue, Sep 27, 2011 at 8:55 PM, Otis Gospodnetic <otis_gospodnetic@yahoo.com> wrote:
>
>> Hello,
>>
>> By the way, should you need help with Hadoop+Solr, please feel free to get
>> in touch with us at Sematext (see below) - we happen to work with Hadoop and
>> Solr on a daily basis and have successfully implemented parallel indexing
>> into Solr with/from Hadoop.
>>
>> Otis
>> ----
>> Sematext :: http://sematext.com/ :: Solr - Lucene - Hadoop - HBase
>> Lucene ecosystem search :: http://search-lucene.com/
>>
>>
>> ------------------------------
>> *From:* Otis Gospodnetic <otis_gospodnetic@yahoo.com>
>> *To:* "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
>> *Sent:* Tuesday, September 27, 2011 1:37 PM
>>
>> *Subject:* Re: SOLR Index Speed
>>
>> Hi,
>>
>> No need to use reply-all and CC me directly, I'm on the list :)
>>
>> It sounds like the problem is not Solr but the Hadoop side. For example,
>> what if you change your reducer to do a no-op instead of calling Solr? Does
>> it go beyond 500-700 docs/minute?
>>
>> Otis
>>
>>
>>
>> >________________________________
>> >From: Lord Khan Han <khanuniverse1@gmail.com>
>> >To: solr-user@lucene.apache.org; Otis Gospodnetic <otis_gospodnetic@yahoo.com>
>> >Sent: Tuesday, September 27, 2011 4:42 AM
>> >Subject: Re: SOLR Index Speed
>> >
>> >Our producer: the Hadoop mappers prepare the docs for submission, and the
>> >reducers submit them directly over HTTP with SolrJ. We now have 32 reducers,
>> >but the indexing speed is still 500-700 docs per minute. Submission comes
>> >from a Hadoop cluster, so submit speed is not a problem. We can't make use
>> >of the full resources of the Solr indexing machine.
>> >
>> >I gave Solr a 12 GB heap, and the machine is not swapping.
>> >
>> >I couldn't figure out what the problem is, if there is one.
>> >
>> >PS: We commit at the end of the submission.
>> >
>> >
>> >On Tue, Sep 27, 2011 at 11:37 AM, Lord Khan Han <khanuniverse1@gmail.com> wrote:
>> >
>> >> Sorry :) it is not 500 docs per sec (that is what I wish, I think). It
>> >> is 500 docs per MINUTE.
>> >>
>> >>
>> >>
>> >> On Tue, Sep 27, 2011 at 7:14 AM, Otis Gospodnetic <otis_gospodnetic@yahoo.com> wrote:
>> >>
>> >>> Hello,
>> >>>
>> >>> > PS: Solr stream indexing is not an option because we need to submit
>> >>> > javabin...
>> >>>
>> >>>
>> >>> If you are referring to StreamingUpdateSolrServer, then the above
>> >>> statement makes no sense and you should give SUSS a try.
>> >>>
>> >>> Are you sure your 16 reducers produce more than 500 docs/second?
>> >>> I think somebody already suggested increasing the number of reducers to ~32.
>> >>> What happens to your CPU load and indexing speed then?
>> >>>
>> >>>
>> >>> Otis
>> >>> ----
>> >>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>> >>> Lucene ecosystem search :: http://search-lucene.com/
>> >>>
>> >>>
>> >>> >________________________________
>> >>> >From: Lord Khan Han <khanuniverse1@gmail.com>
>> >>> >To: solr-user@lucene.apache.org
>> >>> >Sent: Monday, September 26, 2011 7:09 AM
>> >>> >Subject: SOLR Index Speed
>> >>> >
>> >>> >Hi,
>> >>> >
>> >>> >We have 500K web documents and are using Solr (trunk) to index them. We
>> >>> >have a special analyzer that is a bit CPU-heavy.
>> >>> >Our machine config:
>> >>> >
>> >>> >32 x CPU
>> >>> >32 GB RAM
>> >>> >SAS disks
>> >>> >
>> >>> >We are sending documents from 16 reduce clients (from Hadoop) to the
>> >>> >standalone Solr server. The problem is that we can't get faster than
>> >>> >500 docs per sec. 500K documents took 7-8 hours to index :(
>> >>> >
>> >>> >While indexing, the Solr server's CPU load is around 5-6 (32 max), which
>> >>> >means just under 20% of the total CPU power. We have plenty of RAM...
>> >>> >
>> >>> >I turned off autocommit and set the RAM buffer (ramBufferSizeMB) to 8198.
>> >>> >There is no I/O wait.
>> >>> >
>> >>> >How can I make it faster?
>> >>> >
>> >>> >PS: Solr stream indexing is not an option because we need to submit
>> >>> >javabin...
>> >>> >
>> >>> >Thanks.
>> >>> >
>> >>> >
>> >>> >
>> >>>
>> >>
>> >>
>> >
>> >
>> >
>>
>>
>
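
Otis's suggestion amounts to timing the same reducer loop twice: once with the Solr call and once with it replaced by a no-op. A minimal stdlib-only Java sketch of that comparison follows. `WebDoc`, `NoOpBenchmark`, and `docsPerMinute` are hypothetical names, and the document builder only stands in for the real preparation work; in an actual reducer the sink would wrap `solr.addBean(doc)`, which throws checked exceptions and so would need a try/catch inside the lambda.

```java
import java.util.function.Consumer;
import java.util.function.IntFunction;

// Hypothetical stand-in for the SolrJ-annotated POJO built by the reducer.
final class WebDoc {
    final String id;
    final String body;

    WebDoc(String id, String body) {
        this.id = id;
        this.body = body;
    }
}

final class NoOpBenchmark {
    // Builds `count` documents, hands each one to `sink`, and returns the
    // observed throughput in documents per minute (build time + sink time).
    static double docsPerMinute(int count, IntFunction<WebDoc> build, Consumer<WebDoc> sink) {
        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            sink.accept(build.apply(i));
        }
        long elapsedNanos = Math.max(1L, System.nanoTime() - start);
        return count * 60e9 / elapsedNanos;
    }

    public static void main(String[] args) {
        // Hypothetical builder standing in for the real document preparation.
        IntFunction<WebDoc> build = i -> new WebDoc("doc-" + i, "body of document " + i);

        // No-op run: the sink ignores the document instead of calling solr.addBean(doc).
        double noOp = docsPerMinute(100_000, build, doc -> { });
        System.out.printf("no-op sink: %.0f docs/minute%n", noOp);

        // A real run would pass a sink wrapping solr.addBean(doc) and compare the two rates.
    }
}
```

Running the loop once with the no-op sink and once with a Solr-backed sink on the same input separates document-preparation time from submission time. The Sep 29 message reports that the no-op run finished in 20 minutes, which suggests the bottleneck lies in the submission path rather than on the Hadoop side.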

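The figures quoted in the thread can be cross-checked with a little arithmetic. The hypothetical `IndexRates` helper below (stdlib only) converts between docs/minute and total hours for the 500K-document corpus, and treats load average divided by core count as a rough CPU-utilization figure, as the original message does.

```java
// Hypothetical helper for cross-checking the throughput figures in the thread.
final class IndexRates {
    // Hours needed to index `docs` documents at `docsPerMinute`.
    static double hoursAt(long docs, double docsPerMinute) {
        return docs / docsPerMinute / 60.0;
    }

    // Documents per minute implied by indexing `docs` documents in `hours`.
    static double docsPerMinute(long docs, double hours) {
        return docs / (hours * 60.0);
    }

    // Rough share of the machine in use: load average divided by core count.
    static double cpuShare(double loadAverage, int cores) {
        return loadAverage / cores;
    }

    public static void main(String[] args) {
        long corpus = 500_000L;
        System.out.printf("500 docs/min -> %.1f h%n", hoursAt(corpus, 500));
        System.out.printf("700 docs/min -> %.1f h%n", hoursAt(corpus, 700));
        System.out.printf("7-8 h -> %.0f-%.0f docs/min%n",
                docsPerMinute(corpus, 8), docsPerMinute(corpus, 7));
        System.out.printf("load 5-6 on 32 cores -> %.1f%%-%.1f%%%n",
                100 * cpuShare(5, 32), 100 * cpuShare(6, 32));
    }
}
```

For 500K documents, 500 and 700 docs/minute correspond to about 16.7 and 11.9 hours, while 7-8 hours corresponds to roughly 1,040-1,190 docs/minute, so the reported rate and the reported duration are only loosely consistent. A load of 5-6 on 32 cores works out to about 16-19%.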
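
Otis recommends StreamingUpdateSolrServer (SUSS). In SolrJ 3.x its constructor takes the server URL, a queue size, and a thread count, and it sends queued documents from several background threads. The stdlib-only sketch below illustrates that general queue-plus-worker-threads pattern under stated assumptions: `QueuedSender`, `add`, and `finish` are hypothetical names, this is not SUSS's actual implementation, and in practice the `send` callback would wrap a SolrJ call and handle its checked exceptions.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical sender: a bounded queue drained by several worker threads.
// It is NOT the StreamingUpdateSolrServer implementation.
final class QueuedSender<T> {
    // Sentinel telling a worker thread that no more documents will arrive.
    private static final Object POISON = new Object();

    private final BlockingQueue<Object> queue;
    private final ExecutorService workers;
    private final int threadCount;

    QueuedSender(int queueSize, int threadCount, Consumer<T> send) {
        this.queue = new ArrayBlockingQueue<>(queueSize);
        this.threadCount = threadCount;
        this.workers = Executors.newFixedThreadPool(threadCount);
        for (int t = 0; t < threadCount; t++) {
            workers.execute(() -> drain(send));
        }
    }

    // Each worker takes documents and sends them until it sees POISON.
    @SuppressWarnings("unchecked")
    private void drain(Consumer<T> send) {
        try {
            while (true) {
                Object item = queue.take();
                if (item == POISON) {
                    return;
                }
                send.accept((T) item);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Blocks while the queue is full, so producers cannot run far ahead of the senders.
    void add(T doc) throws InterruptedException {
        queue.put(doc);
    }

    // Queues one POISON per worker (behind all pending documents) and waits for the workers.
    void finish() throws InterruptedException {
        for (int t = 0; t < threadCount; t++) {
            queue.put(POISON);
        }
        workers.shutdown();
        if (!workers.awaitTermination(1, TimeUnit.HOURS)) {
            throw new IllegalStateException("senders did not finish in time");
        }
    }
}

final class QueuedSenderDemo {
    public static void main(String[] args) throws InterruptedException {
        AtomicInteger sent = new AtomicInteger();
        // Hypothetical sink that only counts; a real one would wrap a SolrJ call.
        QueuedSender<String> sender = new QueuedSender<>(1_000, 4, doc -> sent.incrementAndGet());
        for (int i = 0; i < 10_000; i++) {
            sender.add("doc-" + i);
        }
        sender.finish();
        System.out.println("sent " + sent.get() + " documents");
    }
}
```

Because `add` returns as soon as a document is queued, the producer can finish well before the workers have sent everything; that may be one reason updates appeared to keep running after sending was complete in the SUSS experiment described in the Sep 29 message. The queue size bounds memory use, and the thread count controls how many requests are in flight at once.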