manifoldcf-user mailing list archives

From Graeme Seaton
Subject Re: Apache Manifoldcf High Availability requirements
Date Wed, 16 Apr 2014 15:34:47 GMT
One additional clarification: we have switched off Repository History 
tracking within the database. If this is enabled then, depending on the 
frequency of recrawls etc., your database could be significantly larger.



On 16/04/14 16:24, Graeme Seaton wrote:
> Hi Lalit,
> Like all of these things it depends ;-)
>> 1.I am using PostgreSQL DB with tomcat 7 hosting MCF.
> We have the same configuration
>> 2.How much DB size should be considered for such scenarios as we have 
>> documents in magnitude of TBs.
> As an example, our test corpus of (currently) 4 million documents 
> takes up about 4GB in PostgreSQL when fully vacuumed.  This should 
> only be used as a very rough guide.
>> 3.Does PostgreSQL run on VMs.
> We are running PostgreSQL within KVM VMs, with a single master 
> replicated to 3 other backup nodes (probably OTT, but we are aiming to 
> replicate the configuration of each of the machines in our cluster as 
> much as possible).
>> 4.What would be the ideal clustering approach: having two different 
>> MCF servers managed by Zookeeper with each having its own DB which 
>> are in sync with each other managed by a set of two load balancers or 
>> two different MCF instances having a common clustered (active/passive) 
>> DB instance managed by set of two load balancers.
> We are running ManifoldCF on each of the nodes in the cluster. The 
> ZooKeeper locking allows us to crawl from each of them successfully.
>> 7.Which of these approaches would yield better results?
> IMHO - the biggest limiting factor will be the database but it really 
> depends on your usage.
>> 8.Is there any definitive guide for high availability of MCF?
> Not yet - I'm experimenting with various options/approaches at the 
> moment.  HA tends not to lend itself to a one-size-fits-all 
> approach - at some point I'm sure there will be a 'Best Practices' 
> guide.  Feel free to keep asking questions in the interim.
> Regards,
> Graeme
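
For a rough feel of what the "4 million documents, about 4GB" figure 
quoted above implies, here is a minimal Python sketch that simply scales 
that one data point linearly. The function names and the linear-scaling 
assumption are illustrative only and not part of ManifoldCF; the figure 
applies to a fully vacuumed database with Repository History tracking 
switched off, so treat the output as a lower-end guess at best.

```python
# Back-of-envelope sizing from the single data point in this thread:
# roughly 4 million documents occupy about 4GB of PostgreSQL storage.
# ASSUMPTION: size grows linearly with document count (not verified).

REF_DOCS = 4_000_000   # documents in the test corpus (from the thread)
REF_SIZE_GB = 4.0      # database size when fully vacuumed (from the thread)


def bytes_per_document(ref_docs=REF_DOCS, ref_size_gb=REF_SIZE_GB):
    """Average database bytes per crawled document, taking 1GB = 1024**3 bytes."""
    return ref_size_gb * 1024 ** 3 / ref_docs


def estimate_db_size_gb(doc_count, ref_docs=REF_DOCS, ref_size_gb=REF_SIZE_GB):
    """Scale the reference data point linearly to doc_count documents."""
    if doc_count < 0:
        raise ValueError("doc_count must be non-negative")
    return doc_count * ref_size_gb / ref_docs


if __name__ == "__main__":
    print(f"{bytes_per_document():.0f} bytes per document")
    print(f"{estimate_db_size_gb(100_000_000):.1f} GB for 100 million documents")
```

Note that the estimate is driven by the number of documents, not by their 
total size on disk, so a corpus measured in TBs needs to be converted to 
a document count first.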
