lucene-solr-user mailing list archives

From Joaquim Pedro Carvalho de Oliveira <>
Subject dataDir path duplication after recovery on HDFS+S3
Date Wed, 01 Aug 2018 18:10:15 GMT
Hello all, 

We're running Solr 7.3.1 on Docker and trying to store the index on Ceph Storage
using HDFS with the Hadoop-AWS S3A filesystem client. Currently, we start 2 Solr instances and 3

When Solr is started, we create a test collection with 2 shards and a replication factor of
2. Everything works fine and the Ceph Buckets are populated correctly. We can see files in
Ceph like:

    testcollection/core_node8/data/index/_0.fdt 111 2018-08-01T14:45:18.038Z 
    testcollection/core_node8/data/index/_0.fdx 83 2018-08-01T14:45:16.604Z 
    testcollection/core_node8/data/index/_0.fnm 427 2018-08-01T14:45:22.738Z 
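For reference, a test collection like the one above (2 shards, replication factor 2) would normally be created through the Collections API `CREATE` action. The sketch below only builds the request URL. The base URL `http://localhost:8983/solr` and the helper name `create_collection_url` are assumptions for illustration, and any configset or HDFS-specific parameters we may have passed are not shown.

```python
from urllib.parse import urlencode


def create_collection_url(base: str, name: str, num_shards: int,
                          replication_factor: int) -> str:
    """Hypothetical helper: build a Collections API CREATE request URL."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,                  # 2 shards, as in this setup
        "replicationFactor": replication_factor,  # 2 replicas per shard
    }
    return f"{base.rstrip('/')}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    # Assumed base URL; adjust to the actual Solr node.
    print(create_collection_url("http://localhost:8983/solr", "testcollection", 2, 2))
```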

However, when we restart one of the containers, the recovery process apparently duplicates
the "dataDir" configuration, and we start to see additional files like: 

    111 2018-08-01T14:54:39.361Z
    83 2018-08-01T14:54:32.669Z
    427 2018-08-01T14:54:58.761Z

Where "s3a:/bucketname" is the "solr.hdfs.home" value configured in

We also noticed that before the restart, the file did not have the "dataDir"
property configured. After the restart, the container has this property set to "s3a:/bucketname/testcollection/core_node8/data".
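One possible mechanism, offered only as a hypothesis and not taken from Solr's source: if whatever rebuilds the data path after recovery always treats "dataDir" as relative to "solr.hdfs.home", an already-absolute "dataDir" would receive the home prefix a second time. The functions `naive_join` and `guarded_join` below are hypothetical and exist only to illustrate the difference, using the values from this message.

```python
def naive_join(hdfs_home: str, data_dir: str) -> str:
    """Hypothetical: always treats data_dir as relative to hdfs_home."""
    return hdfs_home.rstrip("/") + "/" + data_dir.lstrip("/")


def guarded_join(hdfs_home: str, data_dir: str) -> str:
    """Hypothetical: leaves data_dir unchanged if it is already under hdfs_home."""
    home = hdfs_home.rstrip("/")
    if data_dir == home or data_dir.startswith(home + "/"):
        return data_dir
    return home + "/" + data_dir.lstrip("/")


HOME = "s3a:/bucketname"                                   # solr.hdfs.home
DATA_DIR = "s3a:/bucketname/testcollection/core_node8/data"  # dataDir after restart

if __name__ == "__main__":
    print(naive_join(HOME, DATA_DIR))    # home prefix appears twice
    print(guarded_join(HOME, DATA_DIR))  # path left as-is
```

If something like the naive variant were applied on every restart, the prefix would be added again each time, which would match the repeated duplication we see.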

Is this behaviour expected, given that the index files are duplicated again on
every restart? What could be causing this?

Thanks for your help,

Joaquim Oliveira


"This message from SERVIÇO FEDERAL DE PROCESSAMENTO DE DADOS (SERPRO) -- a government company
established under Brazilian law (5.615/70) -- is directed exclusively to its addressee and
may contain confidential data, protected under professional secrecy rules. Its unauthorized
use is illegal and may subject the transgressor to the law's penalties. If you're not the
addressee, please send it back, elucidating the failure."
