lucene-solr-user mailing list archives

From marship <mars...@126.com>
Subject Re:Re: Re: How to speed up solr search speed
Date Sat, 17 Jul 2010 14:01:08 GMT
Hi Geert-Jan,
   Thanks for replying.
   I know Solr has a query cache and that it speeds up a search from the second time on. But
when I talk about search speed, I don't mean the speed of the cache. When a user
searches on our site, I don't want the first search to cost 10s and all following ones to cost 0s. That
is unacceptable. I want the first search to be as fast as it can be, so all my speed tests only
count the first time.
   As for fq: yes, I need it. We have 5 different types. For a general search, the user doesn't need
to specify which type to search over. But sometimes he needs to search over one type only, e.g. type:product,
and that's when I use "fq"; I believe I understand it correctly. Until today
I was always testing against simple searches such as "design". Before today,
even the simple search speed was not acceptable, so I didn't care how fast "fq" would be. Today,
as the simple search speed is acceptable, I moved on to check "fq", and it looks like it is sometimes
much slower than the simple search (slower meaning it takes more than 2s, maybe 10s).
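
A minimal sketch of how the first-time cost with and without "fq" could be compared, assuming Python 3 on the client. The base URL and the type:product filter are taken from the examples in this thread; the function names are made up for illustration. QTime is read from the XML response header that Solr returns.

```python
import re
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL, copied from the example requests in this thread.
SOLR_SELECT = "http://localhost:7550/solr/select/"


def build_select_url(q, fq=None, rows=10, base=SOLR_SELECT):
    """Build a select URL; the fq parameter is added only when given."""
    params = [("q", q), ("version", "2.2"), ("start", "0"),
              ("rows", str(rows)), ("indent", "on")]
    if fq is not None:
        params.append(("fq", fq))
    return base + "?" + urlencode(params)


def parse_qtime(xml_text):
    """Extract the QTime value (milliseconds) from a Solr XML response."""
    match = re.search(r'<int name="QTime">(\d+)</int>', xml_text)
    if match is None:
        raise ValueError("no QTime element in response")
    return int(match.group(1))


def first_hit_qtime(q, fq=None):
    """Send one query to a live server and return the QTime it reports."""
    with urlopen(build_select_url(q, fq)) as resp:
        return parse_qtime(resp.read().decode("utf-8"))
```

To measure only first-time cost, call first_hit_qtime("design") and first_hit_qtime("principle", fq="type:product") with keywords that have not been searched since the last restart; otherwise the caches hide the cost being measured.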

>The only thing that helps you here would be a big solr querycache, depending
>on how often queries are repeated.
I don't agree. I don't really care about the speed of the cache, as I know it is always super fast. What
I want is for Solr to consume as much memory as it can to pre-load the Lucene index (maybe
50% or even 100% of it). Then, when it has to search for a keyword for the first time,
it is fast. (I haven't got an answer to this question yet.)
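
One way to approximate this pre-loading, sketched under the assumption that the index files are read through the operating system's file cache (so free RAM outside the JVM heap, rather than -Xmx, is what would hold them). The function name and the index directory passed to it are hypothetical, and whether the files stay cached depends on the OS and on how much memory is free.

```python
import os


def warm_page_cache(index_dir, chunk_size=1 << 20):
    """Read every file under index_dir once so the OS may keep it in its file cache.

    Returns the total number of bytes read.
    """
    total = 0
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            with open(os.path.join(root, name), "rb") as fh:
                while True:
                    chunk = fh.read(chunk_size)
                    if not chunk:
                        break
                    total += len(chunk)
    return total
```

Running it once per core's index directory after a restart (for example warm_page_cache("/path/to/core/data/index"), a made-up path) would be the idea; comparing first-time QTime before and after shows whether it helps on a given machine.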

Thanks.
Regards.




On 2010-07-17 19:30:26, "Geert-Jan Brits" <gbrits@gmail.com> wrote:
>>My query string is always simple like "design", "principle of design",
>"tom"
>>EG:
>>URL:
>http://localhost:7550/solr/select/?q=design&version=2.2&start=0&rows=10&indent=on
>
>IMO, indeed with these types of simple searches caching (and thus RAM usage)
>can not be fully exploited, i.e. there isn't really anything to cache (no
>sort-ordering, faceting (Lucene fieldcache), no documentsets, faceting (Solr
>filtercache))
>
>The only thing that helps you here would be a big solr querycache, depending
>on how often queries are repeated.
>Just execute the same query twice, the second time you should see a fast
>response (say < 20ms) that's the querycache (and thus RAM)  working for
>you.
>
>>Now the issue I found is search with "fq" argument looks slow down the
>search.
>
>This doesn't align with your previous statement that you only use search
>with a q-param (e.g:
>http://localhost:7550/solr/select/?q=design&version=2.2&start=0&rows=10&indent=on
>)
>For your own sake, explain what you're trying to do, otherwise we really are
>guessing in the dark.
>
>Anyway, the FQ-param lets you cache (using the Solr-filtercache) individual
>documentsets that can be used to efficiently intersect your resultset.
>Also, the first time, caches still have to be warmed (i.e. the fq-query has to be
>executed and its results saved to cache, since there isn't anything there yet).
>Only on the second time would you start seeing improvements.
>
>For instance:
>http://localhost:7550/solr/select/?q=design&fq=doctype:pdf&version=2.2&start=0&rows=10&indent=on
>
>would only show documents containing "design" when the doctype=pdf (Again this is
>just an example here where I'm just assuming that you have defined a field
>'doctype')
>since the nr of values of documenttype would be pretty low and would be used
>independently of other queries, this would be an excellent candidate for the
>FQ-param.
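
The caching idea described above can be illustrated with a toy model. This is not Solr's actual implementation, only a sketch of "cache one document set per fq string, then intersect it with the query's hits"; the class, function, and sample data are all made up.

```python
class ToyFilterCache:
    """Toy model of a filter cache: one cached document-id set per fq string."""

    def __init__(self, compute_docset):
        # compute_docset(fq) stands in for the expensive index lookup.
        self._compute = compute_docset
        self._cache = {}
        self.misses = 0

    def docset(self, fq):
        if fq not in self._cache:
            # First use of this fq: pay the full cost and remember the result.
            self.misses += 1
            self._cache[fq] = frozenset(self._compute(fq))
        return self._cache[fq]


def search(query_hits, fq, cache):
    """Intersect the main query's hits with the filter's docset, keeping hit order."""
    allowed = cache.docset(fq)
    return [doc_id for doc_id in query_hits if doc_id in allowed]


# Made-up documents: id -> doctype.
DOCTYPES = {1: "pdf", 2: "html", 3: "pdf", 4: "doc"}


def lookup(fq):
    """Return ids of documents whose doctype matches an fq like 'doctype:pdf'."""
    return [d for d, t in DOCTYPES.items() if "doctype:" + t == fq]
```

With cache = ToyFilterCache(lookup), the first search(..., "doctype:pdf", cache) call computes the set and every later call with the same fq reuses it, which matches the "slow the first time, fast afterwards" behaviour described above.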
>
>http://wiki.apache.org/solr/CommonQueryParameters#fq
>This was a longer reply than I intended. Really think about your use-cases
>first, then present some real examples of what you want to achieve and then
>we can help you in a more useful manner.
>
>Cheers,
>Geert-Jan
>
>2010/7/17 marship <marship@126.com>
>
>> Hi. Peter and All.
>> I merged my indexes today. Now each index stores 10M documents, and I only
>> have 10 solr cores.
>> And I used
>>
>> java -Xmx1g -jar -server start.jar
>> to start the jetty server.
>>
>> At first I deployed them all on one server. The search speed was about 3s.
>> Then I noticed from the cmd output that when a search starts, 4 of the 10 cores' QTimes only cost
>> about 10ms-500ms. The other 5 cost more, up to 2-3s. Then I put 6 on the web
>> server and 4 on another (DB, high load most of the time). Then the search time went
>> down to about 1s most of the time.
>> Now most searches take about 1s. That's great.
>>
>> I watched the jetty output in the cmd window on the web server. Now when each
>> search starts, I see 2 of the 6 cost 60ms-80ms. The other 4 cost 170ms -
>> 700ms.  I do believe the bottleneck is still the hard disk. But at least
>> the search speed at the moment is acceptable. Maybe I should try memdisk to
>> see if that helps.
>>
>>
>> And for -Xmx1g, actually I only see jetty consume about 150M memory,
>> considering that the index is now 10x bigger. I don't think that works. I googled that
>> -Xmx is to enlarge the heap size. Not sure whether that can help search.  I still
>> have 3.5G memory free on the server.
>>
>> Now the issue I found is search with "fq" argument looks slow down the
>> search.
>>
>> Thanks All for your help and suggestions.
>> Thanks.
>> Regards.
>> Scott
>>
>>
>> On 2010-07-17 03:36:19, "Peter Karich" <peathal@yahoo.de> wrote:
>> >> > Each solr(jetty) instance on consume 40M-60M memory.
>> >
>> >> java -Xmx1024M -jar start.jar
>> >
>> >That's a good suggestion!
>> >Please, double check that you are using the -server version of the jvm
>> >and the latest 1.6.0_20 or so.
>> >
>> >Additionally you can start jvisualvm (shipped with the jdk) and hook
>> >into jetty/tomcat easily to see the current CPU and memory load.
>> >
>> >> But I have 70 solr cores
>> >
>> >if you ask me: I would reduce them to 10-15 or even less and increase
>> >the RAM.
>> >try out tomcat too
>> >
>> >> solr distributed search's speed is decided by the slowest one.
>> >
>> >so, try to reduce the cores
>> >
>> >Regards,
>> >Peter.
>> >
>> >> you mentioned that you have a lot of mem free, but your jetty containers
>> >> are only using between 40-60M of memory.
>> >>
>> >> probably stating the obvious, but have you increased the -Xmx param, like
>> >> for instance:
>> >> java -Xmx1024M -jar start.jar
>> >>
>> >> that way you're configuring the container to use a maximum of 1024 MB of
>> >> RAM instead of the default, which is much lower (I'm not sure what exactly, but
>> >> it could well be 64MB for non -server, aligning with what you're seeing)
>> >>
>> >> Geert-Jan
>> >>
>> >> 2010/7/16 marship <marship@126.com>
>> >>
>> >>
>> >>> Hi Tom Burton-West.
>> >>>
>> >>>  Sorry, it looks like my email ISP filtered out your replies. I checked the web
>> >>> version of the mailing list and saw your reply.
>> >>>
>> >>>  My query string is always simple like "design", "principle of design",
>> >>> "tom"
>> >>>
>> >>>
>> >>>
>> >>> EG:
>> >>>
>> >>> URL:
>> >>>
>> >>> http://localhost:7550/solr/select/?q=design&version=2.2&start=0&rows=10&indent=on
>> >>>
>> >>> Response:
>> >>>
>> >>> <response>
>> >>> <lst name="responseHeader">
>> >>> <int name="status">0</int>
>> >>> <int name="QTime">16</int>
>> >>> <lst name="params">
>> >>> <str name="indent">on</str>
>> >>> <str name="start">0</str>
>> >>> <str name="q">design</str>
>> >>> <str name="version">2.2</str>
>> >>> <str name="rows">10</str>
>> >>> </lst>
>> >>> </lst>
>> >>> <result name="response" numFound="5981" start="0">
>> >>> <doc>
>> >>> <str name="id">product_208619</str>
>> >>> </doc>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> EG:
>> >>>
>> >>> http://localhost:7550/solr/select/?q=Principle&version=2.2&start=0&rows=10&indent=on
>> >>>
>> >>> <response>
>> >>> <lst name="responseHeader">
>> >>> <int name="status">0</int>
>> >>> <int name="QTime">94</int>
>> >>> <lst name="params">
>> >>> <str name="indent">on</str>
>> >>> <str name="start">0</str>
>> >>> <str name="q">Principle</str>
>> >>> <str name="version">2.2</str>
>> >>> <str name="rows">10</str>
>> >>> </lst>
>> >>> </lst>
>> >>> <result name="response" numFound="104" start="0">
>> >>> <doc>
>> >>> <str name="id">product_56926</str>
>> >>> </doc>
>> >>>
>> >>>
>> >>>
>> >>> As I am querying a single core and the other cores are not being queried at the
>> >>> same time, the QTime looks good.
>> >>>
>> >>> But when I query the distributed node: (For this case, 6422ms is still
>> >>> not a bad one. Many cost ~20s)
>> >>>
>> >>> URL:
>> >>>
>> >>> http://localhost:7499/solr/select/?q=the+first+world+war&version=2.2&start=0&rows=10&indent=on&debugQuery=true
>> >>>
>> >>> Response:
>> >>>
>> >>> <response>
>> >>> <lst name="responseHeader">
>> >>> <int name="status">0</int>
>> >>> <int name="QTime">6422</int>
>> >>> <lst name="params">
>> >>> <str name="debugQuery">true</str>
>> >>> <str name="indent">on</str>
>> >>> <str name="start">0</str>
>> >>> <str name="q">the first world war</str>
>> >>> <str name="version">2.2</str>
>> >>> <str name="rows">10</str>
>> >>> </lst>
>> >>> </lst>
>> >>> <result name="response" numFound="4231" start="0">
>> >>>
>> >>>
>> >>>
>> >>> Actually I am thinking about and testing a solution: I believe the bottleneck
>> >>> is the hard disk, and all our indexes add up to about 10-15G. What if I just
>> >>> add another 16G of memory to my server, then use "MemDisk" to map a memory disk
>> >>> and put all my indexes into it? Then each time solr/jetty needs to load the
>> >>> index from the hard disk, it is loading from memory. This should give solr the
>> >>> most throughput and avoid the hard disk access delay. I am testing ....
>> >>>
>> >>> But if there is a way to make solr make better use of our limited resources, to
>> >>> avoid adding new ones, that would be great.
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>
>> >
>> >
>> >--
>> >http://karussell.wordpress.com/
>> >
>>