lucene-solr-user mailing list archives

From Raghuveer Kancherla <raghuveer.kanche...@aplopio.com>
Subject Re: Retrieving large num of docs
Date Tue, 01 Dec 2009 09:47:56 GMT
Hi Hoss/Andrew,
I think I solved the problem of retrieving 300 docs per request for now. The
problem was that I was storing 2 moderately large multivalued text fields
even though I was not retrieving them at search time. I reindexed all my
data without storing these fields. Now the response time (the time for Solr to
return the HTTP response) is very close to the QTime Solr shows in the
logs.
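
A minimal sketch of the comparison described above, for anyone who wants to check the same gap. It assumes the response is requested in JSON format (wt=json rather than the wt=python used in the quoted request) and that the wall-clock time is measured separately, for example with "time -p curl". The function name and the sample body are hypothetical and only for illustration. QTime is reported in milliseconds and, as far as I understand, does not cover the time Solr spends writing the response.

```python
import json


def overhead_seconds(wall_seconds, response_body):
    """Return the part of the wall-clock time not covered by Solr's QTime.

    wall_seconds  -- total request time measured by the client, in seconds
                     (e.g. the "real" value printed by time -p)
    response_body -- Solr response text in JSON format (assumes wt=json)
    """
    header = json.loads(response_body)["responseHeader"]
    qtime_seconds = header["QTime"] / 1000.0  # QTime is in milliseconds
    return wall_seconds - qtime_seconds


# Hypothetical response body, trimmed to the header. The numbers are the ones
# from the curl run quoted below (real 3.49, QTime=600).
sample_body = (
    '{"responseHeader": {"status": 0, "QTime": 600},'
    ' "response": {"numFound": 0, "docs": []}}'
)
gap = overhead_seconds(3.49, sample_body)
print("time spent outside the query itself: %.2f s" % gap)
```

A large gap points at work done after the query itself (loading stored fields, writing the response, the network, or the client), which is where the time was going in my case.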

Thanks for all the help,
Raghu


On Mon, Nov 30, 2009 at 11:37 AM, Raghuveer Kancherla <raghuveer.kancherla@aplopio.com> wrote:

> Thanks Hoss,
> In my previous mail, I was measuring the system time difference between
> sending an HTTP request and receiving a response. This was being run on a
> (different) client machine.
>
> Like you suggested, I tried to time the response on the server itself as
> follows:
>
> $ /usr/bin/time -p curl -sS -o solr.out "http://localhost:1212/solr/select/?rows=300&q=%28ResumeAllText%3A%28%28%28%22java+j2ee%22+%28java+j2ee%29%29%29%5E4%29%5E1.0%29&start=0&wt=python"
> real 3.49
> user 0.00
> sys 0.00
>
> The query time in the Solr log shows QTime=600 (milliseconds).
> The size of solr.out is 843 kB.
>
> As you've mentioned, Solr shouldn't give these kinds of numbers for 300
> docs, and we're quite perplexed as to what's going on.
>
> Thanks,
> Raghu
>
>
>
>
> On Mon, Nov 30, 2009 at 6:00 AM, Chris Hostetter <hossman_lucene@fucit.org> wrote:
>
>>
>> : I am using Solr 1.4 for searching through half a million documents. The
>> : problem is, I want to retrieve nearly 200 documents for each search query.
>> : The query time in Solr logs is showing 0.02 seconds and I am fairly happy
>> : with that. However Solr is taking a long time (4 to 5 secs) to return the
>> : results (I think it is because of the number of docs I am requesting). I
>> : tried returning only the ids (unique key) without any other stored fields,
>> : but it is not helping me improve the response times (time to return the
>> : ids of matching documents).
>>
>> What exactly does your request URL look like, and how exactly are you
>> timing the total response time?
>>
>> 200 isn't a very big number for the rows param -- people who want to get
>> 100K documents back in their response at a time may have problems, but 200
>> is not that big.
>>
>> so like i said: how exactly are you timing things?
>>
>> My guess: it's more likely that network overhead or the performance of
>> your client code (reading the data off the wire) is causing your timing
>> code to seem slow, than it is that Solr is taking 5 seconds to write out
>> those document IDs.
>>
>> I suspect if you try hitting the same exact URL using curl via localhost,
>> you'll see the total response time be a lot less than 5 seconds.
>>
>> Here's an example of a query that asks solr to return *every* field from
>> 500 documents, in the XML format.  And these are not small documents...
>>
>> $ /usr/bin/time -p curl -sS -o /tmp/solr.out "http://localhost:5051/solr/select/?q=doctype:product&version=2.2&start=0&rows=500&indent=on"
>> real 0.07
>> user 0.00
>> sys 0.00
>> [chrish@c18-ssa-so-dfll-qry1 ~]$ du -sh /tmp/solr.out
>> 1.6M    /tmp/solr.out
>>
>> ...that's 1.6 MB of 500 Solr documents with all of their fields in
>> verbose XML format (including indenting) fetched in 70ms.
>>
>> If it's taking 5 seconds for you to get just the ids of 200 docs, you've
>> got a problem somewhere and i'm 99% certain it's not in Solr.
>>
>> what does a similar "time curl" command for your URL look like when you
>> run it on your solr server?
>>
>>
>> -Hoss
>>
>>
>
