lucene-solr-user mailing list archives

From Dennis Gearon <gear...@sbcglobal.net>
Subject Re:Re: How to speed up solr search speed
Date Fri, 16 Jul 2010 19:20:26 GMT
Isn't it always one of these four? (from most likely to least likely, generally)

Memory (as a ceiling limit)
Disk Speed
WebServer and its code
CPU.

Memory and Disk are related, as swapping occurs between them. As long as memory is high enough,
it becomes:

Disk Speed
WebServer and its code
CPU

If the WebServer is configured to be as fast as possible, only THEN does the CPU come into play.
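
To see whether a box is actually dipping into swap (the memory/disk link described above), here
is a minimal sketch, assuming a Linux host whose /proc/meminfo exposes the usual SwapTotal and
SwapFree fields. The helper name swap_used_kb and the sample numbers are made up for
illustration.

```python
import re

def swap_used_kb(meminfo_text):
    """Return swap in use (kB), given text in /proc/meminfo format."""
    # Collect "Name:   12345" pairs; lines that do not match are skipped.
    fields = dict(re.findall(r"^(\w+):\s+(\d+)", meminfo_text, flags=re.MULTILINE))
    return int(fields["SwapTotal"]) - int(fields["SwapFree"])

# Sample values are invented, not measurements from the thread.
sample = """MemTotal:        6291456 kB
MemFree:         2621440 kB
SwapTotal:       2097152 kB
SwapFree:        1572864 kB
"""
print(swap_used_kb(sample), "kB of swap in use")
```

On a real machine you would pass in open("/proc/meminfo").read(). A figure that keeps growing
while searches run is a hint that memory is the ceiling, though some swap in use is not proof by
itself.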

So normally:

1/ Put enough memory in so it doesn't swap
2/ Buy the fastest damn disk/diskArrays/SolidState/HyperDrive RamDisk/RAIDed HyperDrive RamDisk
that you can afford.
3/ Tune your webserver code.
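
As a rough check on step 1, compare the size of the index with the RAM left over for the OS file
cache. This is only a sketch using figures from the quoted thread below (an index of about 15G,
2 servers with 6G each, 38 jetty instances per server at up to 60M each). The even split of the
index across servers, the 1G set aside for the OS, and the function name are my assumptions.

```python
def fits_in_page_cache(index_gb, ram_gb, jvm_heap_gb, os_reserve_gb=1.0):
    """Return (fits, cache_gb): can the whole index stay cached in RAM?"""
    # Whatever the JVMs and the OS do not use is what the file cache can take.
    cache_gb = ram_gb - jvm_heap_gb - os_reserve_gb
    return index_gb <= cache_gb, cache_gb

# Figures from the quoted thread; the even split is an assumption.
index_per_server_gb = 15.0 / 2      # ~15G index spread over 2 servers
heap_gb = 38 * 60 / 1024            # 38 jetty instances at ~60M each
fits, cache_gb = fits_in_page_cache(index_per_server_gb, ram_gb=6.0, jvm_heap_gb=heap_gb)
print(f"cache available: {cache_gb:.1f}G, index share: {index_per_server_gb:.1f}G, fits: {fits}")
```

If the index share is larger than the cache that is left, searches will keep going back to
disk, which is when step 2 starts to matter.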

One moderate *LAPTOP* with 8-16 GB of RAM (with a 64-bit OS :-), and a single external SATA
HyperDrive 64 GB RamDrive is SCREAMING, way beyond most single-server boxes you'll pay to
get hosting on. The processor almost doesn't matter. Imagine what it's like with an array
of those things.

That shows how much RAM and disk slow things down.

Get going that fast and it's the Ethernet connection that slows things down. Good gaming boards
now come with dual Ethernet ports as stock, with software preconfigured to send requests over
both and receive on both. If one dies, it falls back to the other.
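
On the point in the quoted message below, that a distributed search is only as fast as its
slowest core: a toy model, not Solr code. The helper name is hypothetical, the latencies are the
ones from Scott's example, and the time needed to merge results is ignored.

```python
def distributed_latency_ms(core_latencies_ms):
    """A fan-out query finishes only when the slowest core has answered."""
    return max(core_latencies_ms)

# 69 cores answering in 1ms plus one that needs 50s, as in Scott's example.
latencies = [1] * 69 + [50_000]
print(distributed_latency_ms(latencies), "ms")
```

So speeding up the cores that are already fast buys nothing. The few cores stuck waiting on the
disk are the ones to fix.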

Dennis Gearon

Signature Warning
----------------
EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Fri, 7/16/10, marship <marship@126.com> wrote:

> From: marship <marship@126.com>
> Subject: Re:Re: How to speed up solr search speed
> To: solr-user@lucene.apache.org
> Date: Friday, July 16, 2010, 11:26 AM
> Hi. Peter. 
> 
>  Thanks for replying.
> 
> 
> >Hi Scott!
> >
> >> I am aware these cores on same server are interfering with each other.
> >
> >That's not good. Try to use only one core per CPU. With more per CPU you
> >won't have any benefits over the single-core version, I think.
> 
>  I only have 2 servers, each with an 8-core CPU. Each server has 6G memory.
> So I have 16 CPU cores in total. But I have 70 solr cores, so I have to
> spread them over my 2 servers. Based on my observation, even while a search
> is processing, the CPU usage is not high. The memory usage is not high
> either. Each solr (jetty) instance only consumes 40M-60M memory. My server
> always has 2-3G memory available.
> >
> >> can solr use more memory to avoid disk operation conflicts?
> >
> >Yes, only the memory you have on the machine of course. Are you using
> >tomcat or jetty?
> >
> 
> I am using jetty.
> >> For my case, I don't think solr can work as fast as 100-200ms on average.
> >
> >We have indices with a lot of entries, not as large as yours but in the
> >range of X million, and have response times under 100ms.
> >What about testing only one core with 5-10 million docs? If the response
> >time isn't any better, maybe you need a different field config, or
> >something else is wrong?
> 
> For the moment, I really don't know. I tried to use java -server -jar
> start.jar to start jetty/solr. I saw that when solr starts, a search on
> some cores for a simple keyword like "design" sometimes takes 70s, while
> of course some take only 0-15ms. From my perspective, I do believe these
> cores delay each other when they access the hard disk, so some cores fall
> behind. The bad news for me is that the speed of solr distributed search
> is decided by the slowest one.
> 
> 
> >
> >> So should I add it, or is the default (without it) ok?
> >
> >Without is also okay -> solr uses default.
> >With 75 million docs it should be around 20 000, but I guess something
> >else is wrong: maybe caching or field definition. Could you post the
> >latter one?
> >
> 
> Sorry. What are you asking me to post?
> 
>  
> 
> 
> >Regards,
> >Peter.
> >
> >> Hi. Peter.
> >> I think I am not using faceting, highlighting ... I read about them
> >> but don't know how to work with them. I am using the default "example"
> >> and just changed the indexed fields.
> >> For my case, I don't think solr can work as fast as 100-200ms on
> >> average. I tried some keywords on only a single solr instance. It
> >> sometimes takes more than 20s. I just input 4 keywords. I agree it
> >> depends on the keywords. But the issue is that it doesn't work
> >> consistently.
> >>
> >> When 37 instances on the same server work at the same time (when a
> >> distributed search starts), it gets worse. I saw some solr cores
> >> execute very fast: 0ms, ~40ms, ~200ms. But more solr cores took
> >> ~2500ms, ~3500ms, ~6700ms, and about 5-10 solr cores needed more than
> >> 17s. I have 70 cores running, and the search speed depends on the
> >> SLOWEST one. Even if 69 cores run at 1ms, if the last one needs 50s,
> >> then the distributed search takes 50s.
> >> I am aware these cores on same server are interfering with each other.
> >> As I have lots of free memory, I want to know: given that,
> >> can solr use more memory to avoid disk operation conflicts?
> >>
> >> Thanks.
> >> Regards.
> >> Scott
> >>
> >> On 2010-07-15 17:19:57, "Peter Karich" <peathal@yahoo.de> wrote:
> >>> What do your queries look like? Do you use faceting, highlighting, ... ?
> >>> Did you try to customize the cache?
> >>> Setting the HashDocSet to "0.005 of all documents" improves our
> >>> search speed a lot.
> >>> Did you optimize the index?
> >>>
> >>> 500ms seems to be slow for an 'average' search. I am not an expert
> >>> but without highlighting it should be faster than 100ms or at least 200ms
> >>>
> >>> Regards,
> >>> Peter.
> >>>
> >>>
> >>>> Hi.
> >>>> Thanks for replying.
> >>>> My documents have many different fields (about 30 fields, 10 different
> >>>> types of documents, but these are not the point) and I have to search
> >>>> over several fields.
> >>>> I was putting all 76M documents into several lucene indexes and using
> >>>> the default lucene.net ParaSearch to search over these indexes. That
> >>>> was slow, more than 20s.
> >>>> Then someone suggested I merge all our indexes into one huge index;
> >>>> he thought lucene can handle 76M documents in one index easily.
> >>>> Then I merged all the documents into a single huge one (which took me
> >>>> 3 days). At that time, the index folder was about 15G (I don't store
> >>>> info in the index, just index it). Actually the search was still very
> >>>> slow, more than 20s too, and looked slower than using several indexes.
> >>>> Then I came to solr. The reason I put 1M into each core is that I found
> >>>> when a core has 1M documents, the search speed is fast, ranging from
> >>>> 0-500ms, which is acceptable. I don't know how many documents it is
> >>>> proper to save in one core.
> >>>> The problem is, even if I put 2M documents into each core, I then
> >>>> have only 36 cores at the moment. But when our documents double in
> >>>> the future, the same issue will arise again. So I don't think saving
> >>>> 1M in each core is the issue.
> >>>> The issue is that I put too many cores on one server. I don't have
> >>>> an extra server to spread the solr cores over. So we have to improve
> >>>> solr search speed some other way.
> >>>> Any suggestion?
> >>>>
> >>>> Regards.
> >>>> Scott
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On 2010-07-15 15:24:08, "Fornoville, Tom" <Tom.Fornoville@truvo.com> wrote:
> >>>>
> >>>>> Is there any reason why you have to limit each instance to only 1M
> >>>>> documents?
> >>>>> If you could put more documents in the same core I think it would
> >>>>> dramatically improve your response times.
> >>>>>
> >>>>> -----Original Message-----
> >>>>> From: marship [mailto:marship@126.com]
> >>>>> Sent: Thursday, 15 July 2010 6:23
> >>>>> To: solr-user
> >>>>> Subject: How to speed up solr search speed
> >>>>>
> >>>>> Hi. All.
> >>>>> I have a problem with distributed solr search. The issue is that
> >>>>> I have 76M documents spread over 76 solr instances; each instance
> >>>>> handles 1M documents.
> >>>>> Previously I put all 76 instances on a single server, and when I tested
> >>>>> I found that each search takes a long time, mostly 10-20s, to
> >>>>> finish.
> >>>>> Now I have split these instances across 2 servers, each one with 38
> >>>>> instances. The search speed is about 5-10s each time.
> >>>>> 10s is a bit unacceptable for me. Based on my observation, the slowness
> >>>>> is caused by disk operations, as all these instances are on the same
> >>>>> server. Because when I test each single instance, it is quite fast,
> >>>>> always ~400ms. When I use distributed search, I found some instances
> >>>>> say they need 7000+ms.
> >>>>> Our server has plenty of free memory. I am wondering whether there is
> >>>>> a way to make solr use more memory instead of the on-disk index, like
> >>>>> loading all indexes into memory, so it can speed up?
> >>>>>
> >>>>> Any help is welcome.
> >>>>> Thanks.
> >>>>> Regards.
> >>>>> Scott
> >>>>>
> >
> 