manifoldcf-user mailing list archives

From Graeme Seaton <graeme.sea...@exonar.com>
Subject Re: Apache Manifoldcf High Availability requirements
Date Wed, 16 Apr 2014 15:24:45 GMT
Hi Lalit,

Like all of these things, it depends ;-)
>
> 1. I am using a PostgreSQL DB with Tomcat 7 hosting MCF.
>
We have the same configuration
>
> 2. How large a DB should be planned for in such scenarios, given that we
> have documents on the order of TBs?
>
As an example, our test corpus of (currently) 4 million documents takes up
about 4GB in PostgreSQL when fully vacuumed.  This should only be used as a
very rough guide.
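To make that rough guide concrete, here is a minimal Python sketch of a linear extrapolation. It assumes (our assumption, not something we have measured beyond the one corpus) that database size grows roughly in proportion to the number of documents, and it reads "4GB" as 4 GiB. The function name estimate_db_bytes is hypothetical.

```python
# Rough PostgreSQL sizing extrapolation for ManifoldCF (illustrative only).
# Assumption: DB size scales linearly with document count, calibrated on a
# test corpus of ~4 million documents occupying ~4 GiB when fully vacuumed.

SAMPLE_DOCS = 4_000_000
SAMPLE_BYTES = 4 * 1024**3  # "4GB" interpreted as 4 GiB


def estimate_db_bytes(doc_count: int) -> int:
    """Hypothetical helper: linearly extrapolate DB size from the sample corpus."""
    if doc_count < 0:
        raise ValueError("doc_count must be non-negative")
    # Integer arithmetic: bytes per document times number of documents.
    return doc_count * SAMPLE_BYTES // SAMPLE_DOCS


if __name__ == "__main__":
    for docs in (4_000_000, 40_000_000, 100_000_000):
        gib = estimate_db_bytes(docs) / 1024**3
        print(f"{docs:>12,} documents -> ~{gib:,.0f} GiB")
```

The estimate is keyed on document count rather than on the total volume of content being crawled, so a corpus measured in TBs is not directly comparable. Vacuum state and your own mix of jobs and connectors can push real numbers well away from a straight-line guess.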
>
> 3. Does PostgreSQL run on VMs?
>
We are running PostgreSQL within KVM VMs, with a single master
replicated to 3 other backup nodes (probably over the top, but we are
aiming to replicate the configuration of each of the machines in our
cluster as much as possible).
>
> 4. What would be the ideal clustering approach: two separate MCF
> servers coordinated by ZooKeeper, each with its own DB kept in sync
> with the other, behind a pair of load balancers; or two MCF instances
> sharing a common clustered (active/passive) DB instance, behind a pair
> of load balancers?
>
We are running ManifoldCF on each of the nodes in the cluster.  The
ZooKeeper locking allows us to crawl from each of them successfully.
>
> 7. Which of these approaches would yield better results?
>
IMHO, the biggest limiting factor will be the database, but it really
depends on your usage.
>
> 8. Is there any definitive guide for high availability of MCF?
>
Not yet; I'm currently experimenting with various options and
approaches.  HA does not lend itself to a one-size-fits-all approach, but
at some point I'm sure there will be a 'Best Practices' guide.  Feel
free to keep asking questions in the interim.

Regards,

Graeme


