spot-user mailing list archives

From Pierre-Luc Dion <pdion...@apache.org>
Subject Re: Configure Spot-ingest
Date Mon, 19 Nov 2018 18:42:02 GMT
That's a good start, thanks!

I've been working on understanding spot_ingest, and I think the
documentation needs some clarification on how it works. I'm under the
impression that the ingest_worker listens to Kafka for a filename+path to
process (e.g. *.pcap) and then processes files that are already in Hadoop.
Is that correct?

Is there a way to have Kafka ingest raw data sent by any kind of device or
application, have the workers process it if needed, and store it in Hive?
Or is spot_ingest itself responsible for collecting data from the network
and applications?

At the moment I'm struggling with how to push data into Hive.

Thanks!

On Mon, Nov 19, 2018 at 12:24 PM Nate Smith <nathanael@apache.org> wrote:

> Hello,
>
> Later today I'll try to set up an example with some notes when I have the
> time.
> Is there a particular field you are unsure of?
>
> There are some links to synthetic data you can test ingestion with:
> https://issues.apache.org/jira/browse/SPOT-135
>
> NOTE: some of the links are currently down
>
> Here's a link to some flow data:
>
> https://s3-us-west-2.amazonaws.com/apachespot/public_data_sets/syed/syed_syn_flow.tgz
>
> - Nathanael
> On Sun, Nov 18, 2018 at 5:05 AM Pierre-Luc Dion <pdion891@apache.org>
> wrote:
>
>> Hi,
>>
>> I'm new to Spot and I'm setting up a dev environment while learning
>> Hadoop. I've been through the Spot documentation and I'm at the point
>> where I think I can start ingesting data. But the documentation is not
>> clear about ingestion: I'm not sure how I can send data from tshark, or
>> where to send it. Is there anything special needed to start spot-ingest?
>>
>> The pipelines section of ingest_conf.json is not clear on what it should
>> look like. Is it possible to have an example?
>>
>> Since I'm not even sure about the state of my install, is there some data
>> I can import? The link in [1] to a file on S3 [2] returns a permission
>> denied error.
>>
>> Thanks!
>>
>>
>> [1]
>> https://github.com/apache/incubator-spot/blob/master/spot-ml/DATA_SAMPLE.md
>> [2]
>> https://s3-us-west-2.amazonaws.com/apachespot/public_data_sets/dns_labeled_data/20170509_parquet.tar.gz
>>
>
