lucene-solr-user mailing list archives

From Hendrik Haddorp
Subject Re: Solr on HDFS vs local storage - Benchmarking
Date Wed, 22 Nov 2017 19:31:01 GMT
We actually use no autowarming. Our collections are pretty small and
query performance has not really been a problem so far. We use lots
of collections, and most Solr caches seem to be per core rather than
global, so caching is also a problem for us. I have to test the HDFS
cache some more, as that should work across collections.

We also had an HDFS setup already, so it looked like a good option to
avoid losing data. Earlier we had a few cases where we lost the
machines, so HDFS looked safer for that.

I would expect HDFS performance to also be quite good if you have
lots of document adds and infrequent commits. Frequent adds with
commits, which is likely not a good pattern in general anyway, do look
quite a bit slower than on local storage so far. We didn't see that in
our earlier tests, which were more query focused, which is why I said
it largely depends on what you are doing.
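
As a rough illustration of the "many adds, few commits" pattern, here is
a minimal Python sketch that plans batched update requests followed by a
single explicit commit. It only builds the requests and sends nothing.
The base URL, the collection name "example" and the batch size are
assumptions for illustration, not values from this thread; the /update
handler and its commit=true parameter are standard Solr.

```python
import json


def batched(docs, size):
    """Yield successive lists of at most `size` documents."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def plan_update_requests(base_url, collection, docs, batch_size=500):
    """Describe the HTTP POSTs for indexing `docs` with one final commit.

    Each entry is a dict with the target URL, query parameters and body.
    Adds are sent without a commit; only the last request asks Solr to
    commit, so segment flushes and searcher reopens happen once.
    """
    url = f"{base_url}/{collection}/update"
    requests = []
    for batch in batched(docs, batch_size):
        # A JSON array of documents is an "add" for Solr's update handler.
        requests.append({"url": url, "params": {}, "body": json.dumps(batch)})
    # A single explicit commit after all adds (no body needed).
    requests.append({"url": url, "params": {"commit": "true"}, "body": None})
    return requests


if __name__ == "__main__":
    # Hypothetical documents and endpoint, for illustration only.
    docs = [{"id": str(i)} for i in range(1200)]
    plan = plan_update_requests("http://localhost:8983/solr", "example", docs)
    for req in plan:
        print(req["url"], req["params"])
```

Whether this helps on HDFS in a given setup would still need to be
measured; the sketch only shows how to avoid committing on every add.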


On 22.11.2017 18:41, Erick Erickson wrote:
> In my experience, for relatively static indexes the performance is
> roughly similar. Once the data is read from whatever data source it's
> in memory, where the data came from is (largely) secondary in
> importance.
> In cases where there's a lot of I/O I expect HDFS to be slower; this
> fits Hendrik's observation: "We now had a pattern with lots of small
> updates and commits and that seems to be quite a bit slower". He's
> merging segments and (presumably) autowarming frequently, implying
> lots of I/O, and HDFS adds an extra layer.
> Personally I'd use whichever is most convenient and see if the
> performance was "good enough". I wouldn't recommend _installing_ HDFS
> just to use it with Solr, why add another complication? If you need
> the redundancy add replicas. If you already have the HDFS
> infrastructure in place and using HDFS is easier than local storage,
> feel free....
> Best,
> Erick
> On Wed, Nov 22, 2017 at 8:06 AM, Greenhorn Techie wrote:
>> Hendrik,
>> Thanks for your response.
>> Regarding "But this seems to greatly depend on how your setup looks like
>> and what actions you perform": may I know which factors influence this
>> and what considerations should be taken into account?
>> Thanks
>> On Wed, 22 Nov 2017 at 14:16 Hendrik Haddorp wrote:
>>> We did some testing and the performance was strangely even better with
>>> HDFS than with the local file system. But this seems to greatly
>>> depend on what your setup looks like and what actions you perform. We now
>>> had a pattern with lots of small updates and commits and that seems to be
>>> quite a bit slower. We are about to do performance testing on that now.
>>> The reason we switched to HDFS was largely connected to us using Docker
>>> and Marathon/Mesos. With HDFS the data is in a shared file system and
>>> thus it is possible to move the replica to a different instance on a
>>> different host.
>>> regards,
>>> Hendrik
>>> On 22.11.2017 14:59, Greenhorn Techie wrote:
>>>> Hi,
>>>> Good Afternoon!!
>>>> While the discussion around issues related to "Solr on HDFS" is live, I
>>>> would like to understand if anyone has done any performance benchmarking
>>>> for both Solr indexing and search between HDFS vs local file system.
>>>> Also, from experience, what would the community folks suggest? Solr on
>>>> local file system or Solr on HDFS? Has anyone done a comparative study of
>>>> these choices?
>>>> Thanks
