lucene-solr-user mailing list archives

From Geert-Jan Brits <gbr...@gmail.com>
Subject Re: Re: How to speed up solr search speed
Date Sat, 17 Jul 2010 11:30:26 GMT
>My query string is always simple like "design", "principle of design", "tom"
>EG:
>URL:
>http://localhost:7550/solr/select/?q=design&version=2.2&start=0&rows=10&indent=on

IMO, with simple searches like these, caching (and thus RAM) can't be fully
exploited, because there isn't really anything to cache: no sorting or
faceting (which use the Lucene fieldcache) and no document sets or
filter-based faceting (which use the Solr filtercache).

The only thing that would help you here is a big Solr query cache
(queryResultCache in solrconfig.xml), depending on how often queries are
repeated.
Just execute the same query twice: the second time you should see a fast
response (say < 20 ms). That's the query cache (and thus RAM) working for
you.
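
A minimal sketch of that check in Python, assuming the instance from the
example URL is reachable. The names parse_qtime, fetch_qtime and
compare_twice are made up for this illustration; the only thing taken from
Solr is the QTime value in the responseHeader of the XML response.

```python
import urllib.request
import xml.etree.ElementTree as ET


def parse_qtime(xml_text):
    """Return the QTime (in ms) reported in a Solr XML response."""
    root = ET.fromstring(xml_text)
    node = root.find("./lst[@name='responseHeader']/int[@name='QTime']")
    if node is None:
        raise ValueError("no QTime found in response")
    return int(node.text)


def fetch_qtime(url):
    """Run one query against Solr and return the QTime it reports."""
    with urllib.request.urlopen(url) as resp:
        return parse_qtime(resp.read().decode("utf-8"))


def compare_twice(url, fetch=fetch_qtime):
    """Run the same query twice and return (first_qtime, second_qtime)."""
    first = fetch(url)
    second = fetch(url)
    return first, second
```

Calling compare_twice with the select URL above should show whether the
second, repeated request is served noticeably faster than the first.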

>Now the issue I found is search with "fq" argument looks slow down the search.

This doesn't align with your previous statement that you only search with a
q-param (e.g.
http://localhost:7550/solr/select/?q=design&version=2.2&start=0&rows=10&indent=on
).
For your own sake, explain what you're trying to do, otherwise we really are
guessing in the dark.

Anyway, the fq-param lets you cache (using the Solr filtercache) individual
document sets that can then be intersected efficiently with your result set.
The cache has to be warmed first: the first time, the fq-query is executed
and its results are saved to the cache, since there isn't anything there
yet. Only from the second time on would you start seeing improvements.
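
To make the idea concrete, here is a toy model in Python. It is not how
Solr implements its filtercache; ToyFilterCache and search are hypothetical
names, and document sets are plain sets of ids. It only shows the pattern:
the first request with a given fq pays for computing the document set,
later requests reuse it and just intersect.

```python
class ToyFilterCache:
    """Toy model of a filter cache: maps an fq string to a set of doc ids."""

    def __init__(self, compute_docset):
        self._compute = compute_docset  # slow path: evaluates the filter
        self._cache = {}
        self.misses = 0                 # how often the slow path was taken

    def docset(self, fq):
        """Return the cached document set for fq, computing it on a miss."""
        if fq not in self._cache:
            self.misses += 1
            self._cache[fq] = frozenset(self._compute(fq))
        return self._cache[fq]


def search(q_hits, fq, cache):
    """Intersect the hits for the main query with the document set for fq."""
    return set(q_hits) & cache.docset(fq)
```

Two searches with different q values but the same fq take the slow path
only once, which is why a second request is needed before the benefit shows.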

For instance:
http://localhost:7550/solr/select/?q=design&fq=doctype:pdf&version=2.2&start=0&rows=10&indent=on

would only show documents containing "design" where doctype=pdf. (Again,
this is just an example; I'm assuming you have defined a field 'doctype'.)
Since the number of distinct document type values would be pretty low and
the filter would be used independently of other queries, this would be an
excellent candidate for the fq-param.
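
A request like this can also be assembled programmatically. The sketch
below is only an illustration: build_query_url is a made-up helper, the base
URL is the one from the example, and the 'doctype' field is still just an
assumption. Note that urlencode escapes the colon in the fq value as %3A,
which Solr decodes back to doctype:pdf.

```python
from urllib.parse import urlencode

# Base URL from the example above; adjust host and port to your setup.
BASE = "http://localhost:7550/solr/select/"


def build_query_url(q, fq=None, start=0, rows=10):
    """Build a select URL with the main query q and an optional filter fq."""
    params = [("q", q)]
    if fq:
        params.append(("fq", fq))  # filter query, cached separately from q
    params += [("version", "2.2"), ("start", start),
               ("rows", rows), ("indent", "on")]
    return BASE + "?" + urlencode(params)
```

Keeping q and fq as separate arguments mirrors the point of the fq-param:
the free-text part changes per request while the filter stays reusable.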

http://wiki.apache.org/solr/CommonQueryParameters#fq
This was a longer reply than I intended. Really think about your use cases
first, then present some real examples of what you want to achieve, and then
we can help you in a more useful manner.

Cheers,
Geert-Jan

2010/7/17 marship <marship@126.com>

> Hi Peter and all.
> I merged my indexes today. Each index now stores 10M documents, and I only
> have 10 solr cores.
> And I used
>
> java -Xmx1g -jar -server start.jar
> to start the jetty server.
>
> At first I deployed them all on one server. A search took about 3s.
> Then I noticed from the cmd output that when a search starts, the QTime of
> 4 of the 10 is only about 10ms-500ms. The remaining 5 cost more, up to 2-3s.
> Then I put 6 on the web server and 4 on another one (DB, high load most of
> the time). The search time then went down to about 1s most of the time.
> Now most searches take about 1s. That's great.
>
> I watched the jetty output in the cmd window on the web server. Now when
> each search starts, I see 2 of 6 cost 60ms-80ms. The other 4 cost 170ms -
> 700ms. I do believe the bottleneck is still the hard disk. But at least
> the search speed at the moment is acceptable. Maybe I should try memdisk to
> see if that helps.
>
>
> As for -Xmx1g, I actually only see jetty consume about 150M of memory,
> even though the index is now 10x bigger, so I don't think that works. I
> googled that -Xmx enlarges the heap size. Not sure whether that can help
> search. I still have 3.5G of memory free on the server.
>
> Now the issue I found is that searching with the "fq" argument seems to
> slow down the search.
>
> Thanks All for your help and suggestions.
> Thanks.
> Regards.
> Scott
>
>
> On 2010-07-17 03:36:19, "Peter Karich" <peathal@yahoo.de> wrote:
> >> > Each solr(jetty) instance only consumes 40M-60M memory.
> >
> >> java -Xmx1024M -jar start.jar
> >
> >That's a good suggestion!
> >Please, double check that you are using the -server version of the jvm
> >and the latest 1.6.0_20 or so.
> >
> >Additionally you can start jvisualvm (shipped with the jdk) and hook
> >into jetty/tomcat easily to see the current CPU and memory load.
> >
> >> But I have 70 solr cores
> >
> >if you ask me: I would reduce them to 10-15 or even less and increase
> >the RAM.
> >try out tomcat too
> >
> >> solr distributed search's speed is decided by the slowest one.
> >
> >so, try to reduce the cores
> >
> >Regards,
> >Peter.
> >
> >> you mentioned that you have a lot of mem free, but your jetty containers
> >> are only using between 40-60 MB of memory.
> >>
> >> probably stating the obvious, but have you increased the -Xmx param, like
> >> for instance:
> >> java -Xmx1024M -jar start.jar
> >>
> >> that way you're configuring the container to use a maximum of 1024 MB of
> >> RAM instead of the default, which is much lower (I'm not sure what exactly,
> >> but it could well be 64MB for non -server, aligning with what you're seeing)
> >>
> >> Geert-Jan
> >>
> >> 2010/7/16 marship <marship@126.com>
> >>
> >>
> >>> Hi Tom Burton-West.
> >>>
> >>>  Sorry, it looks like my email ISP filtered out your replies. I checked
> >>> the web version of the mailing list and saw your reply.
> >>>
> >>>  My query string is always simple like "design", "principle of design",
> >>> "tom"
> >>>
> >>>
> >>>
> >>> EG:
> >>>
> >>> URL:
> >>>
> http://localhost:7550/solr/select/?q=design&version=2.2&start=0&rows=10&indent=on
> >>>
> >>> Response:
> >>>
> >>> <response>
> >>> <lst name="responseHeader">
> >>> <int name="status">0</int>
> >>> <int name="QTime">16</int>
> >>> <lst name="params">
> >>> <str name="indent">on</str>
> >>> <str name="start">0</str>
> >>> <str name="q">design</str>
> >>> <str name="version">2.2</str>
> >>> <str name="rows">10</str>
> >>> </lst>
> >>> </lst>
> >>> <result name="response" numFound="5981" start="0">
> >>> <doc>
> >>> <str name="id">product_208619</str>
> >>> </doc>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> EG:
> >>>
> http://localhost:7550/solr/select/?q=Principle&version=2.2&start=0&rows=10&indent=on
> >>>
> >>> <response>
> >>> <lst name="responseHeader">
> >>> <int name="status">0</int>
> >>> <int name="QTime">94</int>
> >>> <lst name="params">
> >>> <str name="indent">on</str>
> >>> <str name="start">0</str>
> >>> <str name="q">Principle</str>
> >>> <str name="version">2.2</str>
> >>> <str name="rows">10</str>
> >>> </lst>
> >>> </lst>
> >>> <result name="response" numFound="104" start="0">
> >>> <doc>
> >>> <str name="id">product_56926</str>
> >>> </doc>
> >>>
> >>>
> >>>
> >>> As I am querying a single core and the other cores are not being queried
> >>> at the same time, the QTime looks good.
> >>>
> >>> But when I query the distributed node (in this case, 6422ms is still not
> >>> a bad one; many cost ~20s):
> >>>
> >>> URL:
> >>>
> http://localhost:7499/solr/select/?q=the+first+world+war&version=2.2&start=0&rows=10&indent=on&debugQuery=true
> >>>
> >>> Response:
> >>>
> >>> <response>
> >>> <lst name="responseHeader">
> >>> <int name="status">0</int>
> >>> <int name="QTime">6422</int>
> >>> <lst name="params">
> >>> <str name="debugQuery">true</str>
> >>> <str name="indent">on</str>
> >>> <str name="start">0</str>
> >>> <str name="q">the first world war</str>
> >>> <str name="version">2.2</str>
> >>> <str name="rows">10</str>
> >>> </lst>
> >>> </lst>
> >>> <result name="response" numFound="4231" start="0">
> >>>
> >>>
> >>>
> >>> Actually I am thinking about and testing a solution: I believe the
> >>> bottleneck is the hard disk, and all our indexes add up to about 10-15G.
> >>> What if I just add another 16G of memory to my server, then use "MemDisk"
> >>> to map a memory disk and put all my indexes onto it? Then each time
> >>> solr/jetty needs to load the index from the hard disk, it is loading from
> >>> memory. This should give solr the most throughput and avoid the hard disk
> >>> access delay. I am testing ....
> >>>
> >>> But if there is a way to make solr make better use of our limited
> >>> resources and avoid adding new ones, that would be great.
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>
> >
> >
> >--
> >http://karussell.wordpress.com/
> >
>
