metron-dev mailing list archives

From Carolyn Duby <>
Subject Re: HCP in Cloud infrastructures such as AWS , GCP, AZURE
Date Mon, 22 Oct 2018 14:53:07 GMT

Hive 3.0 works well with block stores.  You can either add it to your Metron cluster or spin
up an ephemeral cluster with Cloudbreak:

1. Metron streams into HDFS in JSON.
2. Compact daily with Spark into ORC format and store in a block store (S3, ADLS, etc.).
3. Query the ORC data in the block store through external Hive 3.0 tables in HDP 3, using LLAP.
4. If querying the block store externally is too slow, try adding more LLAP cache or loading
the data into HDFS prior to analysis.
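Step 3 might look like the following DDL; this is only a sketch — the table name, columns, partition layout, and the s3a://metron-archive bucket are hypothetical, and the Spark compaction job from step 2 would write its ORC output under the same LOCATION:

```sql
-- Hypothetical Hive 3 external table over the ORC files that the
-- daily Spark compaction job (step 2) writes to the block store.
CREATE EXTERNAL TABLE metron_events_orc (
  `timestamp`     BIGINT,
  ip_src_addr     STRING,
  ip_dst_addr     STRING,
  original_string STRING
)
PARTITIONED BY (dt STRING)
STORED AS ORC
LOCATION 's3a://metron-archive/events/';

-- Register a day's partition after compaction, then query via LLAP.
ALTER TABLE metron_events_orc ADD IF NOT EXISTS PARTITION (dt = '2018-10-21');

SELECT ip_src_addr, COUNT(*) AS events
FROM metron_events_orc
WHERE dt = '2018-10-21'
GROUP BY ip_src_addr;
```

Because the table is EXTERNAL, dropping it never touches the ORC files in the bucket, so the same data can be shared by the persistent cluster and any ephemeral Cloudbreak clusters.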

If you are using the Metron Alerts UI, you will need Solr, which performs well only on fast disk.
To keep costs down, reduce the amount of data stored in Solr using the following techniques:
1. Only index the fields you might search on.
2. Only store the fields you will want to see in the Alerts UI.
3. Reduce the length of time you store data in Solr.
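For points 1 and 2, the per-field indexed/stored flags live in the collection's schema. A hypothetical managed-schema fragment (the field names are illustrative, not Metron's actual schema); point 3 can be handled by periodically deleting old documents or with Solr's DocExpirationUpdateProcessorFactory:

```xml
<!-- Hypothetical managed-schema fragment for a Metron sensor collection. -->
<!-- Searchable and shown in the Alerts UI: indexed and stored. -->
<field name="timestamp"   type="plong"  indexed="true"  stored="true"/>
<field name="ip_src_addr" type="string" indexed="true"  stored="true"/>
<!-- Shown in the UI but never searched on: stored only. -->
<field name="msg"         type="string" indexed="false" stored="true"/>
<!-- Everything else: neither indexed nor stored, to save disk. -->
<dynamicField name="*"    type="string" indexed="false" stored="false"/>
```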

Carolyn Duby
Solutions Engineer, Northeast


On 10/19/18, 7:18 AM, "deepak kumar" <> wrote:

>Hi All
>I have a quick question around HCP deployments in cloud infrastructure such as AWS.
>I am planning to run a persistent cluster for all event streaming, and
>then run transient clusters such as AWS EMR for batch loads on the
>data ingested by the persistent cluster.
>Has anyone tried this model?
>Since the data volume is going to be humongous, the cloud provider charges a lot of
>money for data I/O and storage.
>Keeping this in mind, what would be the best cloud deployment of the HCP
>components, assuming an ingest rate of 10 TB per day?
>Thanks in advance.