lucene-solr-user mailing list archives

From Shawn Heisey
Subject Re: How large is your solr index?
Date Thu, 08 Jan 2015 00:37:41 GMT
On 1/7/2015 2:26 PM, Joseph Obernberger wrote:
> Thank you Toke - yes - the data is indexed throughout the day.  We are
> handling very few searches - probably 50 a day; this is an R&D system.
> Our HDFS cache, I believe, is too small at 10GBytes per shard.  This
> comes out to 20GBytes of HDFS cache per physical machine plus about
> 10G each for the 2 JVMs running the shards.  Each of those machines is
> also running other services which leaves very little RAM available for
> FS cache.
> Current parameters for running each shard are:
> JAVA_OPTS="-XX:MaxDirectMemorySize=10g -XX:+UseLargePages
> -XX:NewRatio=3 -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90
> -XX:MaxTenuringThreshold=8 -XX:+UseConcMarkSweepGC
> -XX:+CMSScavengeBeforeRemark -XX:PretenureSizeThreshold=64m
> -XX:CMSFullGCsBeforeCompaction=1 -XX:+UseCMSInitiatingOccupancyOnly
> -XX:CMSInitiatingOccupancyFraction=70 -XX:CMSTriggerPermRatio=80
> -XX:CMSMaxAbortablePrecleanTime=6000 -XX:+CMSParallelRemarkEnabled
> -XX:+ParallelRefProcEnabled -XX:+AggressiveOpts
> -XX:ParallelGCThreads=7 -Xmx10752m"
> I'd love to try SSDs, but don't have the budget at present to go that
> route.  I'd really like to get the HDFS option to work well as it
> reduces system complexity.  It seems to me that if our HDFS cluster
> has lots/enough spindles, performance should be relatively good, as
> long as the OS can actually do some caching.  We will be adding more
> HDFS nodes in the future, increasing spindle count and reducing the
> amount of data stored into Solr.  When we redo our Solr Cloud, we will
> only run one shard per box, and supply more HDFS cache.
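
As a rough check on the figures quoted above, the memory already
committed to the two Solr JVMs on each machine can be worked out from
the quoted flags.  This is only a sketch under some assumptions: that
the HDFS block cache is allocated from direct memory (so
-XX:MaxDirectMemorySize bounds it), that both shards run with the same
JAVA_OPTS, and that JVM overhead such as thread stacks and permgen can
be ignored.  The helper names are made up for illustration.

```python
import re

# JVM size suffixes are binary multiples.
_UNITS = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def parse_size(text):
    """Convert a JVM size such as '10g' or '10752m' to bytes."""
    match = re.fullmatch(r"(\d+)([kmg]?)", text.lower())
    if match is None:
        raise ValueError(f"unrecognised JVM size: {text!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit]


def jvm_memory_bytes(java_opts):
    """Return (max heap, max direct memory) in bytes for one JVM."""
    heap = direct = 0
    for opt in java_opts.split():
        if opt.startswith("-Xmx"):
            heap = parse_size(opt[len("-Xmx"):])
        elif opt.startswith("-XX:MaxDirectMemorySize="):
            direct = parse_size(opt.split("=", 1)[1])
    return heap, direct


# Only the two sizing flags from the quoted JAVA_OPTS matter here.
JAVA_OPTS = "-XX:MaxDirectMemorySize=10g -Xmx10752m"
SHARDS_PER_MACHINE = 2  # two shard JVMs per box, as described above

GIB = 1024**3
heap, direct = jvm_memory_bytes(JAVA_OPTS)
per_machine_gib = SHARDS_PER_MACHINE * (heap + direct) / GIB

print(f"heap per JVM:          {heap / GIB:.1f} GiB")
print(f"direct memory per JVM: {direct / GIB:.1f} GiB")
print(f"committed per machine: {per_machine_gib:.1f} GiB")
```

Whatever RAM is left after this total and the other services on the
box is all the OS has for its own disk cache.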

I can't say much about HDFS, because I've never used it.  What I can
say is that you want enough memory that the index data can be fully
cached in RAM on the Solr machine itself.  If the caching happens on
the HDFS servers and the data then has to cross the network to reach
Solr, the network becomes your bottleneck ... a gigabit LAN tops out
at roughly 125 MB/s, which is far slower than local RAM and tends to
be slower than modern high-capacity disks, too.
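
To put rough numbers on that comparison, here is a back-of-the-envelope
sketch.  Only the gigabit figure is exact (1 Gbit/s divided by 8, before
any protocol overhead); the disk and RAM throughputs are assumed
ballpark values chosen for illustration, not measurements from any
particular hardware.

```python
# Theoretical ceiling of a gigabit link: 1e9 bits/s -> 125 MB/s.
GIGABIT_LAN_MB_S = 1_000_000_000 / 8 / 1_000_000

# Assumed, illustrative figures only; real hardware varies widely.
ASSUMED_DISK_MB_S = 150.0    # sequential read, one high-capacity disk
ASSUMED_RAM_MB_S = 10_000.0  # conservative local memory bandwidth


def seconds_to_read(size_gb, throughput_mb_s):
    """Seconds needed to stream size_gb (decimal GB) at a given MB/s."""
    return size_gb * 1000 / throughput_mb_s


CACHE_GB = 10  # same order of magnitude as one shard's cache

for name, rate in [
    ("gigabit LAN", GIGABIT_LAN_MB_S),
    ("local disk (assumed)", ASSUMED_DISK_MB_S),
    ("local RAM (assumed)", ASSUMED_RAM_MB_S),
]:
    secs = seconds_to_read(CACHE_GB, rate)
    print(f"{name:22s} {rate:8.0f} MB/s  {secs:6.1f} s")
```

Even with generous assumptions for the network, data served from a
remote cache cannot arrive faster than the link allows, which is why
the cache needs to be local to Solr.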

When it comes to GC options, I do have recent and relevant experience. 
Your GC options look a lot like the CMS options that I have been
advising for quite a while ... but recently I have been getting better
results with G1 and some specific tuning options on the latest Java

