GitHub user karande opened a pull request:
https://github.com/apache/incubator-samoa/pull/32
SAMOA-40: Add Kafka stream reader modules to consume data from Kafka framework
Apache SAMOA is designed to process streaming data and to develop
streaming machine learning algorithms. Currently, the SAMOA framework
can read stream data from ARFF files only. When SAMOA is used as a
streaming machine learning component in real-time use cases, writing
data to files and reading it back is slow and inefficient.
A single Kafka broker can handle hundreds of megabytes of reads and
writes per second from thousands of clients. The ability to read data
directly from Apache Kafka into SAMOA will not only improve performance
but also make SAMOA pluggable into many real-time machine learning use
cases, such as the Internet of Things (IoT).
GOAL:
Add code that enables SAMOA to read data from Apache Kafka as stream
data.
The Kafka stream reader supports the following streaming options:
a) Topic selection - Kafka topic to read data from
b) Partition selection - Kafka partition to read data from
c) Batching - Number of data instances read from Kafka in one read
request
d) Configuration options - Kafka port number, seed information, time
delay between two read requests
Components:
KafkaReader - Consists of APIs to read data from Kafka
KafkaStream - Stream source for SAMOA that provides the data read from Kafka
Dependencies for Kafka are added to pom.xml in the samoa-api component.
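To make the options above concrete, here is a minimal sketch of how
topic, partition, batch size and inter-request delay could fit together
in a batching reader. It is not the code from this patch: the names
KafkaReadOptions, MessageSource, BatchingReader and InMemorySource are
hypothetical, "seed information" is assumed to mean a list of seed
broker hosts, and an in-memory list stands in for the broker so the
example runs without a Kafka dependency.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical holder for the reader options listed in the description.
// Field names are illustrative and not taken from the patch.
final class KafkaReadOptions {
    final String topic;             // a) topic to read from
    final int partition;            // b) partition to read from
    final int batchSize;            // c) instances per read request
    final List<String> seedBrokers; // d) assumed meaning of "seed information"
    final int port;                 // d) broker port
    final long delayMillis;         // d) pause between two read requests

    KafkaReadOptions(String topic, int partition, int batchSize,
                     List<String> seedBrokers, int port, long delayMillis) {
        if (topic == null || topic.isEmpty()) {
            throw new IllegalArgumentException("topic must be set");
        }
        if (partition < 0 || batchSize <= 0 || delayMillis < 0) {
            throw new IllegalArgumentException("invalid partition, batch size or delay");
        }
        this.topic = topic;
        this.partition = partition;
        this.batchSize = batchSize;
        this.seedBrokers = new ArrayList<>(seedBrokers);
        this.port = port;
        this.delayMillis = delayMillis;
    }
}

// Stand-in for the broker: returns up to maxMessages starting at offset.
interface MessageSource {
    List<String> fetch(String topic, int partition, long offset, int maxMessages);
}

// In-memory source used only to keep the sketch runnable without Kafka.
final class InMemorySource implements MessageSource {
    private final List<String> messages;

    InMemorySource(List<String> messages) {
        this.messages = new ArrayList<>(messages);
    }

    @Override
    public List<String> fetch(String topic, int partition, long offset, int maxMessages) {
        int from = (int) Math.min(offset, (long) messages.size());
        int to = Math.min(from + maxMessages, messages.size());
        return new ArrayList<>(messages.subList(from, to));
    }
}

// One fetch per batch; sleeps delayMillis before every request after the first.
final class BatchingReader {
    private final KafkaReadOptions options;
    private final MessageSource source;
    private long nextOffset = 0L;
    private boolean firstRequest = true;

    BatchingReader(KafkaReadOptions options, MessageSource source) {
        this.options = options;
        this.source = source;
    }

    List<String> readBatch() {
        if (!firstRequest && options.delayMillis > 0) {
            try {
                Thread.sleep(options.delayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        firstRequest = false;
        List<String> batch = source.fetch(
                options.topic, options.partition, nextOffset, options.batchSize);
        nextOffset += batch.size(); // advance past the instances just read
        return batch;
    }
}

final class KafkaReaderSketch {
    public static void main(String[] args) {
        KafkaReadOptions options = new KafkaReadOptions(
                "samoa-instances", 0, 2, Arrays.asList("localhost"), 9092, 10L);
        BatchingReader reader = new BatchingReader(
                options, new InMemorySource(Arrays.asList("inst-1", "inst-2", "inst-3")));
        System.out.println(reader.readBatch()); // [inst-1, inst-2]
        System.out.println(reader.readBatch()); // [inst-3]
        System.out.println(reader.readBatch()); // []
    }
}
```

An empty batch here simply means no new data was available at the
current offset; a stream source built on such a reader would keep
polling rather than treat it as end of stream.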
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/karande/incubator-samoa master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-samoa/pull/32.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #32
----
commit 768306b2e832671f37fb0b1d1a009fcb07807ad3
Author: Vishal Karande <vishalmkarande@gmail.com>
Date: 2015-07-16T01:13:51Z
SAMOA-40: Add Kafka stream reader modules to consume data from Kafka framework
----