lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Solr on HDFS vs local storage - Benchmarking
Date Wed, 22 Nov 2017 17:41:47 GMT
In my experience, for relatively static indexes the performance is
roughly similar. Once the data has been read from whatever data source
it lives on, it's in memory, and where it came from is (largely)
secondary in importance.

In cases where there's a lot of I/O I expect HDFS to be slower. This
fits Hendrik's observation: "We now had a pattern with lots of small
updates and commits and that seems to be quite a bit slower". He's
merging segments and (presumably) autowarming frequently, which implies
lots of I/O, and HDFS adds an extra layer.

Personally I'd use whichever is most convenient and see if the
performance is "good enough". I wouldn't recommend _installing_ HDFS
just to use it with Solr; why add another complication? If you need
the redundancy, add replicas. If you already have the HDFS
infrastructure in place and using HDFS is easier than local storage,
feel free....

Best,
Erick


On Wed, Nov 22, 2017 at 8:06 AM, Greenhorn Techie
<greenhorntechie@gmail.com> wrote:
> Hendrik,
>
> Thanks for your response.
>
> Regarding "But this seems to greatly depend on how your setup looks like
> and what actions you perform." May I know which factors influence this
> and what considerations should be taken into account?
>
> Thanks
>
> On Wed, 22 Nov 2017 at 14:16 Hendrik Haddorp <hendrik.haddorp@gmx.net>
> wrote:
>
>> We did some testing and the performance was strangely even better with
>> HDFS than with the local file system. But this seems to greatly
>> depend on how your setup looks like and what actions you perform. We now
>> had a pattern with lots of small updates and commits and that seems to be
>> quite a bit slower. We are about to do performance testing on that now.
>>
>> The reason we switched to HDFS was largely connected to us using Docker
>> and Marathon/Mesos. With HDFS the data is in a shared file system and
>> thus it is possible to move the replica to a different instance on a
>> different host.
>>
>> regards,
>> Hendrik
>>
>> On 22.11.2017 14:59, Greenhorn Techie wrote:
>> > Hi,
>> >
>> > Good Afternoon!!
>> >
>> > While the discussion around issues related to "Solr on HDFS" is live, I
>> > would like to understand if anyone has done any performance benchmarking
>> > of Solr indexing and search on HDFS versus the local file system.
>> >
>> > Also, from experience, what would the community folks suggest? Solr on
>> > local file system or Solr on HDFS? Has anyone done a comparative study of
>> > these choices?
>> >
>> > Thanks
>> >
>>
>>
