    [ https://issues.apache.org/jira/browse/SAMOA-40?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14630221#comment-14630221 ]

Sriharsha Chintalapani commented on SAMOA-40:
---------------------------------------------

[~azaroth] can you please take a look at this patch?
[~karande] you might want to amend the git commit to remove the description and put that description in the JIRA itself.

> Add Kafka stream reader modules to consume data from Kafka framework
> --------------------------------------------------------------------
>
>                 Key: SAMOA-40
>                 URL: https://issues.apache.org/jira/browse/SAMOA-40
>             Project: SAMOA
>          Issue Type: Task
>          Components: Infrastructure, SAMOA-API
>         Environment: OS X Version 10.10.3
>            Reporter: Vishal Karande
>            Priority: Minor
>              Labels: features
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Apache SAMOA is designed to process streaming data and develop streaming machine learning
> algorithms. Currently, the SAMOA framework supports stream data read from ARFF files only.
> Thus, when SAMOA is used as a streaming machine learning component in real-time use cases,
> writing and reading data from files is slow and inefficient.
> A single Kafka broker can handle hundreds of megabytes of reads and writes per second
> from thousands of clients. The ability to read data directly from Apache Kafka into SAMOA will
> not only improve performance but also make SAMOA pluggable into many real-time machine
> learning use cases such as the Internet of Things (IoT).
> GOAL:
> Add code that enables SAMOA to read data from Apache Kafka as stream data.
> The Kafka stream reader supports the following options for streaming:
> a) Topic selection - Kafka topic to read data from
> b) Partition selection - Kafka partition to read data from
> c) Batching - Number of data instances read from Kafka in one read request
> d) Configuration options - Kafka port number, seed information, time delay between two read requests
> Components:
> KafkaReader - Consists of APIs to read data from Kafka
> KafkaStream - Stream source for SAMOA providing data read from Kafka
> Dependencies for Kafka are added in pom.xml in the samoa-api component.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
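
For anyone reviewing the options a) to d) listed in the description, here is a rough sketch of how they could fit together. It is not the code from the patch: `KafkaReaderOptions` and `BatchingReader` are hypothetical names, reading "seed information" as a list of seed broker hosts is an assumption, and a plain `Iterator<String>` stands in for the actual Kafka fetch so that the sketch runs without the Kafka client on the classpath.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical holder for options a) to d); not the class from the patch.
final class KafkaReaderOptions {
    final String topic;             // a) topic to read from
    final int partition;            // b) partition to read from
    final int batchSize;            // c) instances per read request
    final List<String> seedBrokers; // d) seed information (assumed: broker hosts)
    final int port;                 // d) broker port
    final long delayMillis;         // d) pause between two read requests

    KafkaReaderOptions(String topic, int partition, int batchSize,
                       List<String> seedBrokers, int port, long delayMillis) {
        if (topic == null || topic.isEmpty()) {
            throw new IllegalArgumentException("topic must be set");
        }
        if (partition < 0) {
            throw new IllegalArgumentException("partition must be >= 0");
        }
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be > 0");
        }
        if (delayMillis < 0) {
            throw new IllegalArgumentException("delayMillis must be >= 0");
        }
        this.topic = topic;
        this.partition = partition;
        this.batchSize = batchSize;
        this.seedBrokers = new ArrayList<>(seedBrokers);
        this.port = port;
        this.delayMillis = delayMillis;
    }
}

// Stand-in for the reader: the Iterator plays the role of the Kafka partition.
final class BatchingReader {
    private final KafkaReaderOptions options;
    private final Iterator<String> source;

    BatchingReader(KafkaReaderOptions options, Iterator<String> source) {
        this.options = options;
        this.source = source;
    }

    // One "read request": at most batchSize messages, fewer if the source runs dry.
    List<String> readBatch() {
        List<String> batch = new ArrayList<>(options.batchSize);
        while (batch.size() < options.batchSize && source.hasNext()) {
            batch.add(source.next());
        }
        return batch;
    }

    // Repeats read requests until one comes back empty, pausing delayMillis
    // after each non-empty batch; stops early if the thread is interrupted.
    List<String> drain() {
        List<String> all = new ArrayList<>();
        List<String> batch = readBatch();
        while (!batch.isEmpty()) {
            all.addAll(batch);
            try {
                Thread.sleep(options.delayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            batch = readBatch();
        }
        return all;
    }
}
```

A real stream source would keep polling after an empty batch rather than stopping, since more data may arrive on the topic later; the sketch stops only to stay finite.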