lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Performance question on Spatial Search
Date Tue, 30 Jul 2013 12:13:34 GMT
bq:  i've added {!cache=false}

Ahh, OK. Forget my comments on warming then; they're irrelevant. Heap probably
isn't relevant either since, as you say, you don't see pressure there.

What puzzles me then is why you're spending all your time in
copyToByteArray(long,Object,long,long). I _suppose_ (and I'm really reaching
here, I don't know the code) that you could be in a spot where you're swapping
out from heap memory to virtual memory with MMapDirectory and back. But
that's just grasping at straws.

Let us know what you find out, we should understand this...

Erick
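
A rough way to get a baseline for the copy cost discussed above is to time how fast bytes can be copied out of an already-cached memory-mapped file into heap objects. The sketch below is only a Python analogue of that access pattern, not Lucene's Java code path; the function name, the 4096-byte block size, and the 8 MB scratch file are assumptions made for illustration. If a warm copy like this runs far faster than the profile suggests, the time is more likely going to page faults than to the copy itself.

```python
import mmap
import os
import tempfile
import time


def copy_blocks_from_mmap(path, block_size=4096):
    """Copy a memory-mapped file into heap `bytes` objects, block by block.

    Returns (bytes_copied, elapsed_seconds). The block size is an arbitrary
    illustrative value, not the size Lucene uses for term blocks.
    """
    copied = 0
    start = time.perf_counter()
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        for offset in range(0, len(mm), block_size):
            # Slicing an mmap object copies the bytes out of the mapping.
            block = mm[offset:offset + block_size]
            copied += len(block)
    return copied, time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical scratch file standing in for a mapped index file.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(8 * 1024 * 1024))
    try:
        n, secs = copy_blocks_from_mmap(tmp.name)
        print(f"copied {n} bytes in {secs:.4f} s")
    finally:
        os.unlink(tmp.name)
```

Pointing the function at a file that is not yet in the page cache and comparing the first and second runs gives a crude split between fault-in time and copy time.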

On Mon, Jul 29, 2013 at 10:41 PM, Steven Bower <smb-apache@alcyon.net> wrote:
> @Erick it is a lot of HW, but I'm basically trying to create a "best case
> scenario" to take HW out of the question. Will try increasing heap size
> tomorrow.. I haven't seen it get close to the max heap size yet.. but it's
> worth trying...
>
> Note that these queries look something like:
>
> q=*:*
> fq=[date range]
> fq=geo query
>
> on the fq for the geo query i've added {!cache=false} to prevent it from
> ending up in the filter cache.. once it's in filter cache queries come back
> in 10-20ms. For my use case i need the first unique geo search query to
> come back in a more reasonable time so I am currently ignoring the cache.
>
> @Bill will look into that, I'm not certain it will support the particular
> queries that are being executed but I'll investigate..
>
> steve
>
>
> On Mon, Jul 29, 2013 at 6:25 PM, Erick Erickson <erickerickson@gmail.com> wrote:
>
>> This is very strange. I'd expect slow queries on
>> the first few queries while these caches were
>> warmed, but after that I'd expect things to
>> be quite fast.
>>
>> For a 12G index and 256G RAM, you have on the
>> surface a LOT of hardware to throw at this problem.
>> You can _try_ giving the JVM, say, 18G but that
>> really shouldn't be a big issue, your index files
>> should be memory-mapped.
>>
>> Let's try the crude thing first and give the JVM
>> more memory.
>>
>> FWIW
>> Erick
>>
>> On Mon, Jul 29, 2013 at 4:45 PM, Steven Bower <smb-apache@alcyon.net>
>> wrote:
>> > I've been doing some performance analysis of a spatial search use case I'm
>> > implementing in Solr 4.3.0. Basically I'm seeing search times a lot higher
>> > than I'd like them to be and I'm hoping people may have some suggestions
>> > for how to optimize further.
>> >
>> > Here are the specs of what I'm doing now:
>> >
>> > Machine:
>> > - 16 cores @ 2.8 GHz
>> > - 256 GB RAM
>> > - 1 TB (RAID 1+0 on 10 SSDs)
>> >
>> > Content:
>> > - 45M docs (not very big only a few fields with no large textual content)
>> > - 1 geo field (using config below)
>> > - index is 12 GB
>> > - 1 shard
>> > - Using MMapDirectory
>> >
>> > Field config:
>> >
>> > <fieldType name="geo" class="solr.SpatialRecursivePrefixTreeFieldType"
>> > distErrPct="0.025" maxDistErr="0.00045"
>> > spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
>> > units="degrees"/>
>> >
>> > <field  name="geopoint" indexed="true" multiValued="false"
>> > required="false" stored="true" type="geo"/>
>> >
>> >
>> > What I've figured out so far:
>> >
>> > - Most of my time (98%) is being spent in
>> > java.nio.Bits.copyToByteArray(long,Object,long,long) which is being
>> > driven by BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.loadBlock()
>> > which from what I gather is basically reading terms from the .tim file
>> > in blocks
>> >
>> > - I moved from Java 1.6 to 1.7 based upon what I read here:
>> > http://blog.vlad1.com/2011/10/05/looking-at-java-nio-buffer-performance/
>> > and it definitely had some positive impact (i haven't been able to
>> > measure this independently yet)
>> >
>> > - I changed maxDistErr from 0.000009 (which is 1 m precision per the docs)
>> > to 0.00045 (50 m precision) ..
>> >
>> > - It looks to me that the .tim files are being memory mapped fully (i.e.
>> > they show up in pmap output); the virtual size of the JVM is ~18 GB
>> > (heap is 6 GB)
>> >
>> > - I've optimized the index but this doesn't have a dramatic impact on
>> > performance
>> >
>> > Changing the precision and the JVM upgrade yielded a drop from ~18s
>> > avg query time to ~9s avg query time.. This is fantastic but I want to
>> > get this down into the 1-2 second range.
>> >
>> > At this point it seems that I am basically bottlenecked on
>> > copying memory out of the mapped .tim file, which leads me to think
>> > that the only solution to my problem would be to read less data or
>> > somehow read it more efficiently..
>> >
>> > If anyone has any suggestions of where to go with this I'd love to know
>> >
>> >
>> > thanks,
>> >
>> > steve
>>
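
The request shape quoted above (q=*:* plus one date-range filter and one geo filter, with {!cache=false} on the geo filter) can be sketched as a list of HTTP parameters. This is a minimal illustration only: the helper name and the two filter strings in the usage example are placeholders, since the thread does not show the actual filter queries.

```python
from urllib.parse import urlencode


def build_params(date_fq, geo_fq, cache_geo=False):
    """Build Solr request parameters for a match-all query with two filters.

    When cache_geo is False, the geo filter is prefixed with the
    {!cache=false} local param so it is kept out of the filter cache.
    """
    geo = geo_fq if cache_geo else "{!cache=false}" + geo_fq
    # A list of pairs (not a dict) because the fq parameter repeats.
    return [("q", "*:*"), ("fq", date_fq), ("fq", geo)]


if __name__ == "__main__":
    # Both filter strings are hypothetical placeholders, not from the thread.
    params = build_params(
        "date:[2013-01-01T00:00:00Z TO 2013-07-01T00:00:00Z]",
        'geopoint:"Intersects(-74.093 41.042 -69.347 44.558)"',
    )
    print(urlencode(params))
```

Passing cache_geo=True yields the cached variant, which is the case the thread reports returning in 10-20 ms once the filter is in the cache.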

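The two maxDistErr values quoted in the original message are given in degrees, and the stated 1 m and 50 m precisions can be sanity-checked with a great-circle arc-length calculation. The sketch below assumes a spherical Earth with a mean radius of 6,371,008.8 m; the constant and the function name are assumptions for this estimate, not anything taken from Solr.

```python
import math

# Mean Earth radius in meters; a spherical-Earth assumption for this estimate.
EARTH_RADIUS_M = 6_371_008.8


def degrees_to_meters(deg):
    """Length in meters of a great-circle arc spanning `deg` degrees."""
    return math.radians(deg) * EARTH_RADIUS_M


if __name__ == "__main__":
    for max_dist_err in (0.000009, 0.00045):
        meters = degrees_to_meters(max_dist_err)
        print(f"maxDistErr={max_dist_err} degrees is about {meters:.1f} m")
```

Under that assumption one degree is about 111.2 km, so 0.000009 degrees comes out near 1 m and 0.00045 degrees near 50 m, matching the figures in the thread.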