samoa-dev mailing list archives

From Gianmarco De Francisci Morales <g...@gdfm.me>
Subject Re: Samoa - Samza job execution
Date Sun, 12 Jul 2015 07:20:35 GMT
Hi,

The only reason is that we inherited the format from MOA.
In practice, anything from which we can create an Instance would be
good enough.
For example, I'd like to support the VW (Vowpal Wabbit) and LIBSVM formats.
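
As a rough illustration of what "anything we can create an Instance from"
could mean for a sparse text format, here is a minimal Python sketch that
parses one LIBSVM-style line (`<label> <index>:<value> ...`) into a label and
a sparse feature map. The function name parse_libsvm_line is hypothetical and
this is not SAMOA code; mapping the result onto a SAMOA Instance is left out.

```python
from typing import Dict, Tuple


def parse_libsvm_line(line: str) -> Tuple[float, Dict[int, float]]:
    """Parse one LIBSVM-style line: '<label> <index>:<value> ...'."""
    tokens = line.split()
    if not tokens:
        raise ValueError("empty line")
    # The first token is the label; the rest are sparse index:value pairs.
    label = float(tokens[0])
    features: Dict[int, float] = {}
    for token in tokens[1:]:
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return label, features


label, features = parse_libsvm_line("1 3:0.5 7:1.25 12:-2")
print(label, features)
```

Only the indices that appear on the line are stored, so a reader built this
way would not need to know the total number of attributes in advance.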

One caveat is that some algorithms need the dataset's metadata up front in
order to preallocate some data structures.
I would like to remove this dependency in the future by making the
algorithms fully adaptive.
Though it's not as easy as it sounds :)
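
To make the metadata point concrete: in ARFF the header declares the
attributes before any data arrives, so a learner can size its structures up
front. The Python sketch below reads such a header and preallocates one
counter per class label. It assumes unquoted attribute names and that the
last attribute is a nominal class; read_arff_header is a hypothetical helper,
not part of SAMOA.

```python
from typing import List, Tuple


def read_arff_header(lines: List[str]) -> Tuple[str, List[Tuple[str, str]]]:
    """Return (relation name, [(attribute name, type spec), ...])."""
    relation = ""
    attributes: List[Tuple[str, str]] = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and ARFF comments
        keyword = line.split(None, 1)[0].lower()
        if keyword == "@relation":
            relation = line.split(None, 1)[1]
        elif keyword == "@attribute":
            _, name, type_spec = line.split(None, 2)
            attributes.append((name, type_spec))
        elif keyword == "@data":
            break  # header ends here; instances follow
    return relation, attributes


header = [
    "% toy example",
    "@relation weather",
    "@attribute temperature numeric",
    "@attribute humidity numeric",
    "@attribute play {yes,no}",
    "@data",
    "21.5,80,yes",
]
relation, attributes = read_arff_header(header)

# Assumption: the last attribute is the nominal class.
class_labels = attributes[-1][1].strip("{}").split(",")
class_counts = [0] * len(class_labels)  # preallocated from metadata alone
print(relation, len(attributes), class_labels, class_counts)
```

A format without such a header would force the learner to grow these
structures as new attributes or labels show up in the stream.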

Cheers,

--
Gianmarco

On 11 July 2015 at 16:46, Shekar Tippur <ctippur@gmail.com> wrote:

> Gianmarco
>
> Thanks for the response. Can you please specify the format? Can you please
> explain the reason for keeping it in a specific format?
> I would like to contribute to the Kafka enhancement. I will look into the
> code base you pointed out.
>
> Shekar
> On Jul 11, 2015 1:36 AM, "Gianmarco De Francisci Morales" <gdfm@apache.org>
> wrote:
>
> > Hi Shekar,
> >
> > At the moment we do not support JSON data.
> > The current readers support the ARFF format, which is CSV with a header.
> > http://www.cs.waikato.ac.nz/ml/weka/arff.html
> > Adding support for JSON is doable, but it should conform to a very
> > specific format.
> >
> > About Kafka, we support it as a transport via Samza, but we don't have a
> > reader for it right now.
> > Adding it would be very valuable. If you wanted to work on it I'd be
> > happy to help.
> > Have a look at org.apache.samoa.streams.fs.HDFSFileStreamSource,
> > and org.apache.samoa.streams.ArffFileStream for some examples.
> >
> > Cheers,
> >
> >
> > --
> > Gianmarco
> >
> > On 10 July 2015 at 01:18, Shekar Tippur <ctippur@gmail.com> wrote:
> >
> > > Hello,
> > >
> > > I am trying to use Samoa/Samza combination to apply ML for a dataset I
> > > have in JSON format.
> > >
> > > This is the document I am following:
> > >
> > > https://samoa.incubator.apache.org/documentation/Executing-SAMOA-with-Apache-Samza.html
> > >
> > > Couple of questions:
> > > 1. How do I point the input event to a Stream/Topic in Kafka? The data
> > > is in JSON.
> > > 2. If I want to use historical data that is stored in a file, how do I
> > > point the job to read from a file and serialise as json?
> > >
> > > bin/samoa samza target/SAMOA-Samza-0.3.0-SNAPSHOT.jar
> > > "PrequentialEvaluation -l classifiers.ensemble.Bagging -s (??)"
> > >
> > > - Shekar
> > >
> >
>
