spark-user mailing list archives

From Jörn Franke <jornfra...@gmail.com>
Subject Re: Possible DR solution
Date Sat, 12 Nov 2016 11:17:28 GMT
What is wrong with good old batch transfer for moving data from one cluster
to another? I assume your use case is only business continuity in case of
disasters such as data-center loss, which are unlikely (though they do
happen) and where you could afford to lose a day (or an hour) of data,
depending on your requirements.
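
For illustration, here is a minimal batch-copy sketch against the Hadoop
FileSystem API (the NameNode URIs and paths are made up; in practice you
would rather run hadoop distcp, which does the same with MapReduce
parallelism):

import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}

object NightlyDrCopy {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    // Hypothetical NameNode URIs; substitute your own clusters.
    val srcFs = FileSystem.get(new URI("hdfs://primary-nn:8020"), conf)
    val dstFs = FileSystem.get(new URI("hdfs://dr-nn:8020"), conf)
    // Scheduled nightly, this yields a recovery point of at most one day.
    FileUtil.copy(srcFs, new Path("/data/daily"),
      dstFs, new Path("/data/daily"),
      false /* do not delete the source */, conf)
  }
}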

Nevertheless, I assume he is referring to the Hadoop storage policies:
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html
but these still only work within a single cluster.
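
For reference, setting such a policy is a one-liner against the HDFS client
API, e.g. from a spark-shell (a sketch for Hadoop 2.6+; the path is
hypothetical, and existing blocks migrate only after the mover is run):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.hdfs.DistributedFileSystem

// Tier cold data onto ARCHIVE media -- still within the same cluster.
val fs = FileSystem.get(new Configuration()).asInstanceOf[DistributedFileSystem]
fs.setStoragePolicy(new Path("/data/old"), "COLD")
// Existing blocks move only after running: hdfs mover -p /data/old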

You could also develop a custom secondary file system, similar to the
Ignite cache file system, that sits on top of HDFS and, as soon as it
receives data, forwards it to another cluster while still handing it to
HDFS. Not knowing WanDisco, I assume this is roughly what it does. Given
the prices (and the fact that clusters tend to grow), you may want to
evaluate whether buying or building makes more sense. In any case, it also
requires an evaluation of network throughput, because that may become the
bottleneck somewhere (either within the cluster or, more likely, between
data centers).
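
A very rough sketch of that idea, assuming a FilterFileSystem that tees
every write to a second cluster (the class name and wiring are invented;
real code would also have to handle appends, renames, deletes and a failing
DR link):

import java.io.OutputStream
import org.apache.hadoop.fs.{FSDataOutputStream, FileSystem, FilterFileSystem, Path}
import org.apache.hadoop.fs.permission.FsPermission
import org.apache.hadoop.util.Progressable

// Wraps the primary HDFS and mirrors every newly created file to a DR cluster.
class MirroringFileSystem(primary: FileSystem, mirror: FileSystem)
    extends FilterFileSystem(primary) {

  override def create(f: Path, permission: FsPermission, overwrite: Boolean,
                      bufferSize: Int, replication: Short, blockSize: Long,
                      progress: Progressable): FSDataOutputStream = {
    val local  = super.create(f, permission, overwrite, bufferSize, replication, blockSize, progress)
    val remote = mirror.create(f, permission, overwrite, bufferSize, replication, blockSize, progress)
    // Tee: every byte written locally is also written to the DR cluster.
    new FSDataOutputStream(new OutputStream {
      override def write(b: Int): Unit = { local.write(b); remote.write(b) }
      override def write(b: Array[Byte], off: Int, len: Int): Unit = {
        local.write(b, off, len); remote.write(b, off, len)
      }
      override def flush(): Unit = { local.flush(); remote.flush() }
      override def close(): Unit = { local.close(); remote.close() }
    }, null)
  }
}

Note that this makes every write pay the inter-data-center round trip,
which is exactly where the network throughput concern above bites.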

As you mentioned, HBase & co. may need special consideration for the case
where data is still in memory and not yet persisted.

On Sat, Nov 12, 2016 at 12:04 PM, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:

> Thanks Vince.
>
> Can you provide more details on this please?
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> On 12 November 2016 at 09:52, vincent gromakowski <vincent.gromakowski@gmail.com> wrote:
>
>> An HDFS tiering policy with good tags should be similar.
>>
>> On 11 Nov 2016 at 11:19 PM, "Mich Talebzadeh" <mich.talebzadeh@gmail.com> wrote:
>>
>>> I really don't see why one would want to set up streaming replication,
>>> except in situations where functionality similar to transactional
>>> databases is required in big data.
>>>
>>> Dr Mich Talebzadeh
>>>
>>> On 11 November 2016 at 17:24, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:
>>>
>>>> I think it differs in that it starts streaming data through its own
>>>> port as soon as the first block lands, so the granularity is a block.
>>>>
>>>> However, think of it as Oracle GoldenGate replication or SAP
>>>> replication for databases. The only difference is that if there is
>>>> corruption in a block, HDFS will replicate it anyway, much like SRDF.
>>>>
>>>> Whereas with Oracle or SAP it is log-based replication, which stops
>>>> when it encounters corruption.
>>>>
>>>> Replication depends on the block, so it can replicate Hive metadata,
>>>> the fsimage etc., but it cannot replicate the HBase memstore if HBase
>>>> crashes.
>>>>
>>>> So that is the gist of it: streaming replication as opposed to
>>>> snapshots.
>>>>
>>>> It sounds familiar; think of it as log shipping in the old Oracle days
>>>> versus GoldenGate etc.
>>>>
>>>> HTH
>>>>
>>>> Dr Mich Talebzadeh
>>>>
>>>> On 11 November 2016 at 17:14, Deepak Sharma <deepakmca05@gmail.com>
>>>> wrote:
>>>>
>>>>> Reason being you can set up HDFS replication to some other cluster
>>>>> on your own.
>>>>>
>>>>> On Nov 11, 2016 22:42, "Mich Talebzadeh" <mich.talebzadeh@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Reason being?
>>>>>>
>>>>>> Dr Mich Talebzadeh
>>>>>>
>>>>>> On 11 November 2016 at 17:11, Deepak Sharma <deepakmca05@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> This is a waste of money, I guess.
>>>>>>>
>>>>>>> On Nov 11, 2016 22:41, "Mich Talebzadeh" <mich.talebzadeh@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> It starts at $4,000 per node per year, all inclusive.
>>>>>>>>
>>>>>>>> With a discount it can be halved, but that is per node, so if you
>>>>>>>> have 5 nodes in primary and 5 nodes in DR we are talking about
>>>>>>>> $40K already.
>>>>>>>>
>>>>>>>> HTH
>>>>>>>>
>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>
>>>>>>>> On 11 November 2016 at 16:43, Mudit Kumar <mkumar128@sapient.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Is it feasible cost-wise?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Mudit
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> *From:* Mich Talebzadeh [mailto:mich.talebzadeh@gmail.com]
>>>>>>>>> *Sent:* Friday, November 11, 2016 2:56 PM
>>>>>>>>> *To:* user @spark
>>>>>>>>> *Subject:* Possible DR solution
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Has anyone had experience of using WanDisco
>>>>>>>>> <https://www.wandisco.com/> block replication to create a
>>>>>>>>> fault-tolerant DR solution in Hadoop?
>>>>>>>>>
>>>>>>>>> The product claims that it starts replicating as soon as the
>>>>>>>>> first data block lands on HDFS, taking each block and sending it
>>>>>>>>> to the DR/replica site. The idea is that this is faster than
>>>>>>>>> traditional HDFS copy tools, which are normally batch oriented.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> It claims to replicate Hive metadata as well.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I wanted to gauge whether anyone has used it or a competing
>>>>>>>>> product. The claim is that they have no competitors!
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>>
>
