lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Performance question on Spatial Search
Date Mon, 29 Jul 2013 22:25:30 GMT
This is very strange. I'd expect the first few
queries to be slow while the caches were being
warmed, but after that I'd expect things to
be quite fast.

For a 12G index and 256G RAM, you have on the
surface a LOT of hardware to throw at this problem.
You can _try_ giving the JVM, say, 18G, but that
really shouldn't be a big issue: your index files
should be memory-mapped.

Let's try the crude thing first and give the JVM
more memory.

FWIW
Erick

On Mon, Jul 29, 2013 at 4:45 PM, Steven Bower <smb-apache@alcyon.net> wrote:
> I've been doing some performance analysis of a spatial search use case I'm
> implementing in Solr 4.3.0. Basically I'm seeing search times a lot higher
> than I'd like them to be and I'm hoping people may have some suggestions
> for how to optimize further.
>
> Here are the specs of what I'm doing now:
>
> Machine:
> - 16 cores @ 2.8 GHz
> - 256 GB RAM
> - 1 TB (RAID 1+0 on 10 SSDs)
>
> Content:
> - 45M docs (not very big; only a few fields, with no large textual content)
> - 1 geo field (using config below)
> - index is 12 GB
> - 1 shard
> - Using MMapDirectory
>
> Field config:
>
> <fieldType name="geo" class="solr.SpatialRecursivePrefixTreeFieldType"
> distErrPct="0.025" maxDistErr="0.00045"
> spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
> units="degrees"/>
>
> <field  name="geopoint" indexed="true" multiValued="false"
> required="false" stored="true" type="geo"/>
>
>
> What I've figured out so far:
>
> - Most of my time (98%) is being spent in
> java.nio.Bits.copyToByteArray(long,Object,long,long) which is being
> driven by BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.loadBlock()
> which from what I gather is basically reading terms from the .tim file
> in blocks
>
> - I moved from Java 1.6 to 1.7 based upon what I read here:
> http://blog.vlad1.com/2011/10/05/looking-at-java-nio-buffer-performance/
> and it definitely had some positive impact (I haven't been able to
> measure this independently yet)
>
> - I changed maxDistErr from 0.000009 (which is about 1 m precision per doc)
> to 0.00045 (about 50 m precision)
>
> - It looks to me like the .tim files are being memory mapped fully (i.e.
> they show up in pmap output); the virtual size of the JVM is ~18 GB
> (heap is 6 GB)
>
> - I've optimized the index but this doesn't have a dramatic impact on
> performance
>
> Changing the precision and the JVM upgrade yielded a drop from ~18s
> avg query time to ~9s avg query time. This is fantastic but I want to
> get this down into the 1-2 second range.
>
> At this point it seems that I am bottlenecked on
> copying memory out of the mapped .tim files, which leads me to think
> that the only solution to my problem would be to read less data or
> somehow read it more efficiently.
>
> If anyone has any suggestions of where to go with this I'd love to know.
>
>
> thanks,
>
> steve
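
As a sanity check on the maxDistErr values quoted above (with units="degrees"), the sketch below converts degrees of arc to meters. It assumes a spherical Earth with a mean radius of 6,371,008.8 m; the function names and the constant are illustrative and are not part of Solr or Spatial4j, which may use slightly different figures internally.

```python
import math

# Assumption: spherical Earth, mean radius in meters (illustrative constant).
EARTH_MEAN_RADIUS_M = 6_371_008.8


def degrees_to_meters(deg: float) -> float:
    """Great-circle arc length, in meters, spanned by `deg` degrees."""
    return math.radians(deg) * EARTH_MEAN_RADIUS_M


def meters_to_degrees(meters: float) -> float:
    """Inverse conversion: degrees of arc spanned by `meters`."""
    return math.degrees(meters / EARTH_MEAN_RADIUS_M)


if __name__ == "__main__":
    # The two maxDistErr settings mentioned in the message.
    for deg in (0.000009, 0.00045):
        print(f"maxDistErr={deg} deg is about {degrees_to_meters(deg):.2f} m")
```

Under these assumptions the two settings come out at roughly 1 m and roughly 50 m, which agrees with the figures given in the message. The conversion applies to distance along a great circle, so east-west spans measured in degrees of longitude cover less ground away from the equator.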
