manifoldcf-user mailing list archives

From lalit jangra <lalit.j.jan...@gmail.com>
Subject Re: Apache Manifoldcf High Availability requirements
Date Wed, 16 Apr 2014 13:53:08 GMT
Thanks Karl,

I also want to know how to size disks for such a setup. I assume most of
the disk space will be taken by the DB, which is PostgreSQL here. What size
should I start with, and what should the expansion policy be, keeping in
mind that I have a minimum of 10 million documents at the start and that
similar volumes will be added each year?
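
To make the question concrete, here is a rough sketch of the arithmetic I
have in mind. The document counts are the ones stated above. The
per-document footprint and the headroom factor are purely hypothetical
placeholders, not measured or documented figures for ManifoldCF or
PostgreSQL, and would have to be replaced with numbers taken from a trial
crawl.

```python
def projected_documents(initial_docs: int, docs_per_year: int, years: int) -> int:
    """Total documents tracked after `years` years of linear growth."""
    return initial_docs + docs_per_year * years


def projected_db_bytes(total_docs: int, bytes_per_doc: int, headroom: float = 2.0) -> int:
    """Rough disk estimate: per-document footprint scaled by a headroom factor."""
    return int(total_docs * bytes_per_doc * headroom)


if __name__ == "__main__":
    INITIAL_DOCS = 10_000_000   # from the question: minimum at start
    DOCS_PER_YEAR = 10_000_000  # from the question: similar volume each year
    BYTES_PER_DOC = 2_000       # HYPOTHETICAL placeholder; measure on a trial crawl
    HEADROOM = 2.0              # HYPOTHETICAL allowance for indexes, bloat, WAL

    for year in range(6):
        docs = projected_documents(INITIAL_DOCS, DOCS_PER_YEAR, year)
        size_gb = projected_db_bytes(docs, BYTES_PER_DOC, HEADROOM) / 1e9
        print(f"year {year}: {docs:,} documents, roughly {size_gb:.0f} GB")
```

Only the shape of the calculation matters here: if the crawl database
grows roughly linearly with the number of tracked documents, the expansion
policy follows from the yearly increment once a real per-document figure
is known.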


On Wed, Apr 16, 2014 at 12:51 PM, Karl Wright <daddywri@gmail.com> wrote:

> Hi Lalit,
>
> ManifoldCF when operating in a clustered scenario will not work with
> separate DB instances, even if they are synched.  You can only operate it
> under conditions where transactional integrity is maintained, which would
> be a single common clustered DB instance.
>
> I'll let others talk to your other points.
>
> (Graeme, are you following this?)
>
> Karl
>
>
>
> On Wed, Apr 16, 2014 at 7:40 AM, lalit jangra <lalit.j.jangra@gmail.com> wrote:
>
>> Hi,
>>
>>
>>
>> I am using MCF to crawl multiple sources holding around 10-15 million
>> documents initially, with similar volumes added each year, and I want it
>> to be clustered in high availability mode. I have some questions about
>> this.
>>
>> 1. I am using a PostgreSQL DB, with Tomcat 7 hosting MCF.
>>
>> 2. How much DB size should be planned for such a scenario, given that
>> our documents total in the order of TBs?
>>
>> 3. Does PostgreSQL run on VMs?
>>
>> 4. What would be the ideal clustering approach: two different MCF
>> servers managed by ZooKeeper, each with its own DB kept in sync with the
>> other, behind a set of two load balancers; or two different MCF
>> instances sharing a common clustered (active/passive) DB instance,
>> behind a set of two load balancers?
>>
>> 5. If I use the first approach (two MCF servers managed by ZooKeeper,
>> each with its own DB), I have the extra task of keeping both DB
>> instances in sync.
>>
>> 6. If I use the second approach (two MCF instances sharing a common
>> clustered active/passive DB instance), I have a set of clustered DBs.
>>
>> 7. Which of these approaches would yield better results?
>>
>> 8. Is there any definitive guide to high availability for MCF?
>>
>> Regards,
>>
>> Lalit.
>>
>>
>>
>
>


-- 
Regards,
Lalit Jangra.
