samoa-dev mailing list archives

From Gianmarco De Francisci Morales <g...@apache.org>
Subject Re: Samoa - Samza job execution
Date Sat, 11 Jul 2015 08:35:42 GMT
Hi Shekar,

At the moment we do not support JSON data.
The current readers support the ARFF format, which is essentially CSV with a header section that declares the attributes.
http://www.cs.waikato.ac.nz/ml/weka/arff.html
Adding support for JSON is doable, but the input would have to conform to a
very specific format.
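
As a stopgap until a JSON reader exists, flat JSON records could be converted to ARFF before running the job. What follows is only a minimal sketch, assuming one JSON object per line, numeric feature fields, and a single nominal class field. The function name `json_lines_to_arff` and the sample field names are hypothetical and not part of SAMOA.

```python
import json


def json_lines_to_arff(lines, relation, numeric_fields, class_field):
    """Convert flat JSON records (one per line) into ARFF text.

    Hypothetical helper, not part of SAMOA. Assumes every record has all
    the listed numeric fields plus one nominal class field.
    """
    records = [json.loads(line) for line in lines if line.strip()]

    # The nominal class values are collected from the data and sorted.
    labels = sorted({str(r[class_field]) for r in records})

    # Header section: relation name, then one declaration per attribute.
    out = ["@relation %s" % relation, ""]
    for name in numeric_fields:
        out.append("@attribute %s numeric" % name)
    out.append("@attribute %s {%s}" % (class_field, ",".join(labels)))

    # Data section: one comma-separated row per record.
    out += ["", "@data"]
    for r in records:
        values = [str(r[name]) for name in numeric_fields]
        values.append(str(r[class_field]))
        out.append(",".join(values))
    return "\n".join(out) + "\n"


# Made-up sample records, for illustration only.
sample = [
    '{"temp": 21.5, "humidity": 40, "label": "ok"}',
    '{"temp": 35.0, "humidity": 80, "label": "alert"}',
]
arff_text = json_lines_to_arff(sample, "sensors", ["temp", "humidity"], "label")
print(arff_text)
```

Nested JSON, missing fields, and string or date attributes are not handled here; those would need explicit rules for mapping onto ARFF attribute types.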

About Kafka, we support it as a transport via Samza, but we don't have a
reader for it right now.
Adding it would be very valuable. If you wanted to work on it I'd be happy
to help.
Have a look at org.apache.samoa.streams.fs.HDFSFileStreamSource,
and org.apache.samoa.streams.ArffFileStream for some examples.

Cheers,


--
Gianmarco

On 10 July 2015 at 01:18, Shekar Tippur <ctippur@gmail.com> wrote:

> Hello,
>
> I am trying to use Samoa/Samza combination to apply ML for a dataset I have
> in JSON format.
>
> This is the document I am following:
>
> https://samoa.incubator.apache.org/documentation/Executing-SAMOA-with-Apache-Samza.html
>
> Couple of questions:
> 1. How do I point the input event to a Stream/Topic in Kafka? The data is
> in JSON.
> 2. If I want to use historical data that is stored in a file, how do I
> point the job to read from a file and serialise as json?
>
> bin/samoa samza target/SAMOA-Samza-0.3.0-SNAPSHOT.jar
> "PrequentialEvaluation -l classifiers.ensemble.Bagging -s (??)"
>
> - Shekar
>
