From dev-return-705-apmail-samoa-dev-archive=samoa.apache.org@samoa.incubator.apache.org Thu Jul 16 01:31:00 2015
Mailing-List: contact dev-help@samoa.incubator.apache.org; run by ezmlm
From: karande
To: dev@samoa.incubator.apache.org
Reply-To: dev@samoa.incubator.apache.org
Subject: [GitHub] incubator-samoa pull request: SAMOA-40: Add Kafka stream reader mo...
Content-Type: text/plain
Date: Thu, 16 Jul 2015 01:30:42 +0000 (UTC)

GitHub user karande opened a pull request:

    https://github.com/apache/incubator-samoa/pull/32

SAMOA-40: Add Kafka stream reader modules to consume data from Kafka framework

Apache SAMOA is designed to process streaming data and to develop streaming machine learning algorithms. Currently, the SAMOA framework can read stream data only from ARFF files. As a result, when SAMOA is used as a streaming machine learning component in real-time use cases, data has to be written to and read from files, which is slow and inefficient. A single Kafka broker can handle hundreds of megabytes of reads and writes per second from thousands of clients. The ability to read data directly from Apache Kafka into SAMOA will not only improve performance but also make SAMOA pluggable into many real-time machine learning use cases, such as the Internet of Things (IoT).

GOAL: Add code that enables SAMOA to read data from Apache Kafka as stream data.
The Kafka stream reader supports the following options for streaming:
a) Topic selection - Kafka topic to read data from
b) Partition selection - Kafka partition to read data from
c) Batching - Number of data instances read from Kafka in one read request
d) Configuration options - Kafka port number, seed information, time delay between two read requests

Components:
KafkaReader - Consists of APIs to read data from Kafka
KafkaStream - Stream source for SAMOA providing data read from Kafka

Dependencies for Kafka are added to pom.xml in the samoa-api component.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/karande/incubator-samoa master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-samoa/pull/32.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #32

----
commit 768306b2e832671f37fb0b1d1a009fcb07807ad3
Author: Vishal Karande
Date:   2015-07-16T01:13:51Z

    SAMOA-40: Add Kafka stream reader modules to consume data from Kafka framework

----
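As a rough illustration of the reader options listed in the description (topic, partition, batch size, delay between reads), the sketch below shows one way they could be grouped and how batching might behave. It does not reproduce the pull request's code: the class names KafkaReaderOptions and BatchingReader, their fields, and the nextBatch method are hypothetical, and a plain Iterator stands in for the Kafka connection so the example runs without any Kafka dependency.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical holder for the reader options; not the PR's actual API. */
final class KafkaReaderOptions {
    final String topic;      // a) Kafka topic to read data from
    final int partition;     // b) Kafka partition to read data from
    final int batchSize;     // c) instances fetched per read request
    final long readDelayMs;  // d) delay between two read requests

    KafkaReaderOptions(String topic, int partition, int batchSize, long readDelayMs) {
        if (topic == null || topic.isEmpty()) {
            throw new IllegalArgumentException("topic must be set");
        }
        if (partition < 0) {
            throw new IllegalArgumentException("partition must be >= 0");
        }
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be > 0");
        }
        if (readDelayMs < 0) {
            throw new IllegalArgumentException("readDelayMs must be >= 0");
        }
        this.topic = topic;
        this.partition = partition;
        this.batchSize = batchSize;
        this.readDelayMs = readDelayMs;
    }
}

/** Hypothetical helper: one call models one read request of at most batchSize items. */
final class BatchingReader {
    static <T> List<T> nextBatch(Iterator<T> source, KafkaReaderOptions options) {
        List<T> batch = new ArrayList<>(options.batchSize);
        // Stop when the batch is full or the source has no more records.
        while (batch.size() < options.batchSize && source.hasNext()) {
            batch.add(source.next());
        }
        return batch;
    }
}
```

With a batch size of 2 and three records available, successive calls to nextBatch return two records, then one, then an empty list. A real stream source would wait readDelayMs between calls; that pause is left out here to keep the sketch deterministic.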