spark-user mailing list archives

From Jörn Franke <>
Subject Re: Near Real time analytics with Spark and tokenization
Date Sun, 15 Oct 2017 09:12:17 GMT
Can’t you cache the token vault in a caching solution such as Ignite? Lookups of single
tokens would then be really fast.
What volumes are we talking about?
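
As a rough sketch of the read-through caching idea: a plain Python dict stands in for the cache (Ignite or otherwise), and `RemoteVault` is a hypothetical stand-in for the remote HBase vault, not a real client API. Each distinct clear-text value costs one round trip to the vault, and repeats within or across micro-batches are served from the cache.

```python
import uuid
from typing import Dict


class RemoteVault:
    """Hypothetical stand-in for the remote token vault (e.g. an HBase table)."""

    def __init__(self) -> None:
        self._tokens: Dict[str, str] = {}
        self.round_trips = 0  # counts simulated network calls to the vault

    def get_or_create(self, clear_text: str) -> str:
        self.round_trips += 1
        if clear_text not in self._tokens:
            # Random token: it has no mathematical relation to the clear text.
            self._tokens[clear_text] = uuid.uuid4().hex
        return self._tokens[clear_text]


class CachedTokenizer:
    """Read-through cache in front of the vault (assumption: a dict models the cache)."""

    def __init__(self, vault: RemoteVault) -> None:
        self._vault = vault
        self._cache: Dict[str, str] = {}

    def tokenize(self, clear_text: str) -> str:
        token = self._cache.get(clear_text)
        if token is None:
            # Cache miss: pay the remote round trip once, then remember the result.
            token = self._vault.get_or_create(clear_text)
            self._cache[clear_text] = token
        return token


vault = RemoteVault()
tokenizer = CachedTokenizer(vault)
batch = ["ACC-1", "ACC-2", "ACC-1", "ACC-1"]  # made-up column values
tokens = [tokenizer.tokenize(value) for value in batch]
```

Whether it is acceptable to hold the clear-text-to-token mapping in a cache close to the processing cluster is a compliance question in its own right, given that the vault itself has to live on a separate cluster.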

I assume you are referring to PCI DSS, so security is probably an important aspect, and it may not be
that easy to achieve with vault-less tokenization. Also, with vault-less tokenization you
need to recalculate all tokens if the secret is compromised.
There may be other compliance requirements, which the users would need to weigh.
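
To illustrate the recalculation point, here is a minimal sketch that uses a keyed hash (HMAC-SHA256) as a simple stand-in for vault-less tokenization. This is an assumption made for illustration only: commercial products such as Protegrity use their own, typically reversible, schemes. The function name `vaultless_token`, the secrets and the sample values are all made up.

```python
import hashlib
import hmac


def vaultless_token(clear_text: str, secret: bytes, length: int = 16) -> str:
    """Derive a token from the clear text and a secret; nothing is stored anywhere."""
    digest = hmac.new(secret, clear_text.encode("utf-8"), hashlib.sha256).hexdigest()
    # Truncated for readability; a shorter token increases the chance of collisions.
    return digest[:length]


old_secret = b"example-secret-v1"  # assumed compromised
new_secret = b"example-secret-v2"  # replacement secret

values = ["ACC-1", "ACC-2"]  # made-up column values
old_tokens = {v: vaultless_token(v, old_secret) for v in values}
new_tokens = {v: vaultless_token(v, new_secret) for v in values}
```

The same input always maps to the same token under a given secret, so no remote lookup is needed at streaming time. Once the secret changes, however, every token changes with it, which means all previously stored tokens have to be recomputed.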

> On 15. Oct 2017, at 09:15, Mich Talebzadeh <> wrote:
> Hi,
> When doing micro-batch streaming of trade data we need to tokenize certain columns
> before the data lands in HBase, in a Lambda architecture.
> There are two ways of tokenizing data: vault-based, and vault-less using something like
> Protegrity tokenization.
> Vault-based tokenization requires the clear text and token values to be stored in a vault,
> say HBase, and crucially the vault cannot be on the same Hadoop cluster on which we are processing
> in real time. It could be on another Hadoop cluster dedicated to tokenization.
> This causes latency for real-time analytics when token values have to be calculated and
> then stored in the remote HBase vault.
> What is the general approach to this type of issue? It seems to be best to use vault-less
> Thanks
> Dr Mich Talebzadeh
> Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage
> or destruction of data or any other property which may arise from relying on this email's
> technical content is explicitly disclaimed. The author will in no case be liable for any monetary
> damages arising from such loss, damage or destruction.
