lucene-solr-user mailing list archives

From Shawn Heisey <apa...@elyograg.org>
Subject Re: Optimal FS block size for "small" documents in Solr?
Date Sat, 30 May 2015 16:13:14 GMT
On 5/30/2015 2:51 AM, Gili Nachum wrote:
> Hi, What would be an optimal FS block size to use?
> 
> Using Solr 4.7.2, I have a RAID-5 of SSD drives currently configured with
> a 128KB block size.
> Can I expect better indexing/query time performance with a smaller block
> size (say 8K)?
> Considering my documents are almost always smaller than 8K.
> I assume all stored fields would fit into one block, which is good, but what
> would Lucene prefer for reading a long posting list and other data
> structures?

Generally speaking, RAID levels that use striping should be configured
with the largest stripe (block) size the controller allows, which for
most modern RAID controllers is 1MB or 2MB.  When you make the stripe
size very small, reading and writing even small files requires hitting
all the disks.  With large stripes, random access is more likely to have
one read hit one disk while another read hits a different disk, so the
two can proceed in parallel.
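
To make the stripe-size argument concrete, here is a rough sketch that
counts how many disks a single read touches.  It assumes a simplified
layout where stripe units are placed round-robin across the data disks
and parity placement is ignored; the function name, the 64KB read size,
and the 4-disk array are made up for illustration.

```python
def disks_touched(offset, length, stripe_unit, data_disks):
    """Count the distinct data disks hit by one contiguous read.

    Simplified model (an assumption, not how any specific controller
    works): stripe units are laid out round-robin across `data_disks`
    disks and parity placement is ignored.
    """
    if length <= 0:
        return 0
    first_unit = offset // stripe_unit
    last_unit = (offset + length - 1) // stripe_unit
    units = last_unit - first_unit + 1
    # Consecutive units land on consecutive disks, so the count is
    # capped by the number of data disks.
    return min(units, data_disks)


KB = 1024
for stripe_unit in (8 * KB, 128 * KB, 1024 * KB):
    n = disks_touched(offset=0, length=64 * KB,
                      stripe_unit=stripe_unit, data_disks=4)
    print(f"stripe unit {stripe_unit // KB} KB: 64 KB read touches {n} disk(s)")
```

Under this model a small stripe unit makes one modest read occupy every
disk, while a large one leaves the other disks free for other requests.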

For Lucene/Solr, there might be benefits to smaller block sizes, but I
believe that they might cause more problems than they solve.

There are some additional things to think about:

If your server has its memory appropriately sized, then you will have
enough RAM to let your operating system cache your index entirely.  For
queries, you will only rarely be hitting the disk ... so disk speed and
layout don't matter much at all, and you will only need to be concerned
about *write* speed for indexing.
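
A quick way to sanity-check the "index fits in the OS cache" condition
is to compare the on-disk index size with the RAM left over after the
Java heap.  This is only a sketch: the helper names and the example
numbers are hypothetical, and it treats all memory not claimed by the
heap or other processes as available to the page cache, which is an
approximation.

```python
import os


def index_size_bytes(index_dir):
    """Sum the sizes of all regular files under index_dir."""
    total = 0
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            path = os.path.join(root, name)
            if os.path.isfile(path):
                total += os.path.getsize(path)
    return total


def fits_in_page_cache(index_bytes, total_ram_bytes, jvm_heap_bytes,
                       other_bytes=0):
    """Rough check: is there enough RAM outside the heap for the index?

    Assumption: whatever is not used by the JVM heap or other processes
    is available to the operating system's disk cache.
    """
    available = total_ram_bytes - jvm_heap_bytes - other_bytes
    return index_bytes <= available


GB = 1024 ** 3
# Hypothetical numbers: 40 GB index, 64 GB RAM, 8 GB heap, 2 GB for the rest.
print(fits_in_page_cache(40 * GB, 64 * GB, 8 * GB, other_bytes=2 * GB))
```

In practice you would pass the result of index_size_bytes() for your
own index directory as the first argument.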

RAID levels 3 through 6 (and any derivatives like level 50) are
*horrible* if there is much write activity -- for a Solr install,
that means indexing, and to a slightly lesser extent, logging.

When you write to a RAID5 array, you slow *everything* down.  Even
*reads* that happen at the same time as writes are strongly affected by
those writes.  It is the nature of RAID5.  If your system is entirely
read-only, then RAID5 is awesome ... but RAID10 is better.  RAID10 *is*
initially more expensive than RAID5 ... but the performance and
reliability benefits are completely worth the additional expense.
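
The write cost can be estimated with the commonly cited "write penalty"
rule of thumb: a small random write on RAID5 takes four disk I/Os (read
old data, read old parity, write new data, write new parity), while
RAID10 takes two (one per mirror).  The sketch below applies that rule;
the disk count and per-disk IOPS are invented for illustration, and
controller caches or full-stripe writes will change real numbers.

```python
RAID5_WRITE_PENALTY = 4   # read data, read parity, write data, write parity
RAID10_WRITE_PENALTY = 2  # one write to each side of the mirror


def random_write_iops(disks, iops_per_disk, write_penalty):
    """Estimate array-level random-write IOPS from a write penalty.

    Rule-of-thumb model only; it ignores caching, full-stripe writes,
    and controller overhead.
    """
    return disks * iops_per_disk / write_penalty


# Hypothetical array: 8 drives, 10,000 random-write IOPS each.
raid5 = random_write_iops(8, 10_000, RAID5_WRITE_PENALTY)
raid10 = random_write_iops(8, 10_000, RAID10_WRITE_PENALTY)
print(f"RAID5:  {raid5:.0f} IOPS")
print(f"RAID10: {raid10:.0f} IOPS")
```

With the same drives, this model gives RAID10 twice the random-write
throughput of RAID5, at the cost of less usable capacity.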

Additional reading material below.  I do highly recommend reading at
least the first link:

http://www.miracleas.com/BAARF/RAID5_versus_RAID10.txt
http://www.baarf.com/

The RAID10 stripe size should be at least 1MB if your controller
supports blocks that large.

Thanks,
Shawn

