samoa-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ASF GitHub Bot (JIRA)" <>
Subject [jira] [Commented] (SAMOA-40) Add Kafka stream reader modules to consume data from Kafka framework
Date Wed, 09 Sep 2015 11:52:45 GMT


ASF GitHub Bot commented on SAMOA-40:

Github user gdfm commented on the pull request:
    @abifet could you have a look at the patch?
    I think we are getting there, but there are still a few things to fix.
    I'd like to get your opinion first.

> Add Kafka stream reader modules to consume data from Kafka framework
> --------------------------------------------------------------------
>                 Key: SAMOA-40
>                 URL:
>             Project: SAMOA
>          Issue Type: Task
>          Components: Infrastructure, SAMOA-API
>         Environment: OS X Version 10.10.3
>            Reporter: Vishal Karande
>            Priority: Minor
>              Labels: features
>   Original Estimate: 168h
>  Remaining Estimate: 168h
> Apache SAMOA is designed to process streaming data and develop streaming machine learning
> algorithm. Currently, SAMOA framework supports stream data read from Arff files only.
> Thus, while using SAMOA as a streaming machine learning component in real time use-cases,
> writing and reading data from files is slow and inefficient.
> A single Kafka broker can handle hundreds of megabytes of reads and writes per second

> from thousands of clients. The ability to read data directly from Apache Kafka into SAMOA
> not only improve performance but also make SAMOA pluggable to many real time machine
> learning use cases such as Internet of Things(IoT).
> Add code that enables SAMOA to read data from Apache Kafka as a stream data.
> Kafka stream reader supports following different options for streaming:
> a) Topic selection - Kafka topic to read data
> b) Partition selection - Kafka partition to read data
> c) Batching - Number of data instances read from Kafka in one read request to Kafka
> d) Configuration options - Kafka port number, seed information, time delay between two
read requests
> Components:
> KafkaReader - Consists for APIs to read data from Kafka
> KafkaStream - Stream source for SAMOA providing data read from Kafka
> Dependencies for Kafka are added in pom.xml for in samoa-api component. 

This message was sent by Atlassian JIRA

View raw message