lucene-solr-user mailing list archives

From Bill Bell <billnb...@gmail.com>
Subject Re: Performance question on Spatial Search
Date Tue, 30 Jul 2013 00:42:34 GMT
Can you compare with the old geo handler as a baseline?

Bill Bell
Sent from mobile
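
A baseline comparison like the one Bill suggests could be timed with a small script along these lines. This is only a sketch under assumptions: it takes "the old geo handler" to mean a `solr.LatLonType` field, and it assumes the same points are indexed into a second, hypothetical field (called `geopoint_ll` here) next to the existing `geopoint` field. The base URL and core name are placeholders, and `d` is assumed to be interpreted as kilometers by `{!geofilt}`. `cache=false` is set so repeated runs are not answered from the filter cache.

```python
import time
import urllib.parse
import urllib.request


def build_geofilt_url(base_url, field, lat, lon, dist_km, rows=0):
    """Build a /select URL that applies an uncached {!geofilt} filter on `field`."""
    params = {
        "q": "*:*",
        # cache=false keeps Solr's filterCache from hiding the real query cost.
        "fq": f"{{!geofilt cache=false sfield={field} pt={lat},{lon} d={dist_km}}}",
        "rows": rows,
        "wt": "json",
    }
    return base_url + "/select?" + urllib.parse.urlencode(params)


def http_fetch(url):
    """Fetch a URL and return the raw response body."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def average_seconds(url, runs=5, fetch=http_fetch):
    """Return the mean wall-clock time per request after one warm-up call."""
    fetch(url)  # warm-up request, not timed
    start = time.perf_counter()
    for _ in range(runs):
        fetch(url)
    return (time.perf_counter() - start) / runs
```

To compare the two, build one URL per field with the same point and distance, for example `build_geofilt_url("http://localhost:8983/solr/collection1", "geopoint", 45.15, -93.85, 5)` and the same call with `"geopoint_ll"`, then pass each to `average_seconds`. Wall-clock time measured from the client includes network and response overhead, so the `QTime` value in the Solr response is worth checking as well.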


On Jul 29, 2013, at 4:25 PM, Erick Erickson <erickerickson@gmail.com> wrote:

> This is very strange. I'd expect slow queries on
> the first few queries while these caches were
> warmed, but after that I'd expect things to
> be quite fast.
> 
> For a 12G index and 256G RAM, you have on the
> surface a LOT of hardware to throw at this problem.
> You can _try_ giving the JVM, say, 18G, but that
> really shouldn't be a big issue; your index files
> should be MMapped.
> 
> Let's try the crude thing first and give the JVM
> more memory.
> 
> FWIW
> Erick
> 
> On Mon, Jul 29, 2013 at 4:45 PM, Steven Bower <smb-apache@alcyon.net> wrote:
>> I've been doing some performance analysis of a spatial search use case I'm
>> implementing in Solr 4.3.0. Basically I'm seeing search times a lot higher
>> than I'd like them to be and I'm hoping people may have some suggestions
>> for how to optimize further.
>> 
>> Here are the specs of what I'm doing now:
>> 
>> Machine:
>> - 16 cores @ 2.8 GHz
>> - 256 GB RAM
>> - 1 TB (RAID 1+0 on 10 SSDs)
>> 
>> Content:
>> - 45M docs (not very big; only a few fields, with no large textual content)
>> - 1 geo field (using config below)
>> - index is 12gb
>> - 1 shard
>> - Using MMapDirectory
>> 
>> Field config:
>> 
>> <fieldType name="geo" class="solr.SpatialRecursivePrefixTreeFieldType"
>> distErrPct="0.025" maxDistErr="0.00045"
>> spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
>> units="degrees"/>
>> 
>> <field  name="geopoint" indexed="true" multiValued="false"
>> required="false" stored="true" type="geo"/>
>> 
>> 
>> What I've figured out so far:
>> 
>> - Most of my time (98%) is being spent in
>> java.nio.Bits.copyToByteArray(long,Object,long,long) which is being
>> driven by BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.loadBlock()
>> which from what I gather is basically reading terms from the .tim file
>> in blocks
>> 
>> - I moved from Java 1.6 to 1.7 based upon what I read here:
>> http://blog.vlad1.com/2011/10/05/looking-at-java-nio-buffer-performance/
>> and it definitely had some positive impact (I haven't been able to
>> measure this independently yet)
>> 
>> - I changed maxDistErr from 0.000009 (which is about 1 m precision)
>> to 0.00045 (about 50 m precision)
>> 
>> - It looks to me like the .tim files are being memory mapped fully (i.e.
>> they show up in pmap output); the virtual size of the JVM is ~18 GB
>> (heap is 6 GB)
>> 
>> - I've optimized the index but this doesn't have a dramatic impact on
>> performance
>> 
>> Changing the precision and the JVM upgrade yielded a drop from ~18s
>> avg query time to ~9s avg query time. This is fantastic but I want to
>> get this down into the 1-2 second range.
>> 
>> At this point it seems that I am basically bottlenecked on
>> copying memory out of the mapped .tim file, which leads me to think
>> that the only solution to my problem would be to read less data or
>> somehow read it more efficiently.
>> 
>> If anyone has any suggestions of where to go with this, I'd love to know.
>> 
>> 
>> thanks,
>> 
>> steve
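
The two maxDistErr values Steven quotes can be sanity-checked with a quick conversion from degrees of arc to meters. This is a minimal sketch that assumes a spherical Earth with the mean radius below, so the results are approximate; the function name is made up for illustration.

```python
import math

# Mean Earth radius in km. Assumes a spherical model, so results are approximate.
EARTH_RADIUS_KM = 6371.0087714


def degrees_to_meters(deg):
    """Convert an angular distance along a great circle from degrees to meters."""
    return math.radians(deg) * EARTH_RADIUS_KM * 1000.0


# The old and new maxDistErr settings mentioned in the message.
for max_dist_err in (0.000009, 0.00045):
    meters = degrees_to_meters(max_dist_err)
    print(f"maxDistErr={max_dist_err} deg ~= {meters:.1f} m")
```

Under this model the two settings come out at roughly 1 m and 50 m, which matches the figures given in the message.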
