samoa-dev mailing list archives

From Shekar Tippur <ctip...@gmail.com>
Subject Re: Samoa - Samza job execution
Date Wed, 02 Sep 2015 06:31:43 GMT
Gianmarco,

I would really like to take on adding JSON support to SAMOA. Can you please
point me to somewhere I can start?

- Shekar

On Sun, Jul 12, 2015 at 12:20 AM, Gianmarco De Francisci Morales <
gdfm@gdfm.me> wrote:

> Hi,
>
> The only reason is that we inherited the format from MOA.
> In practice, anything from which we can create an Instance would be
> good enough.
> For example, I'd like to support the VW and LIBSVM formats.
>
> One caveat is that some algorithms require knowledge of the dataset's
> metadata to preallocate some data structures.
> I would like to remove this dependency in the future by making the
> algorithms fully adaptive.
> Though it's not as easy as it sounds :)
>
> Cheers,
>
> --
> Gianmarco
>
> On 11 July 2015 at 16:46, Shekar Tippur <ctippur@gmail.com> wrote:
>
> > Gianmarco
> >
> > Thanks for the response. Can you please specify the format? Can you please
> > explain the reason for keeping it in a specific format?
> > I would like to contribute to the Kafka enhancement. I will look into the code
> > base you pointed out.
> >
> > Shekar
> > On Jul 11, 2015 1:36 AM, "Gianmarco De Francisci Morales" <gdfm@apache.org>
> > wrote:
> >
> > > Hi Shekar,
> > >
> > > At the moment we do not support JSON data.
> > > The current readers support the ARFF format, which is a CSV with a header.
> > > http://www.cs.waikato.ac.nz/ml/weka/arff.html
> > > Adding support for JSON is doable, but the data would have to conform to a
> > > very specific format.
> > >
> > > About Kafka, we support it as a transport via Samza, but we don't have a
> > > reader for it right now.
> > > Adding it would be very valuable. If you wanted to work on it I'd be happy
> > > to help.
> > > Have a look at org.apache.samoa.streams.fs.HDFSFileStreamSource,
> > > and org.apache.samoa.streams.ArffFileStream for some examples.
> > >
> > > Cheers,
> > >
> > >
> > > --
> > > Gianmarco
> > >
> > > On 10 July 2015 at 01:18, Shekar Tippur <ctippur@gmail.com> wrote:
> > >
> > > > Hello,
> > > >
> > > > I am trying to use the Samoa/Samza combination to apply ML to a dataset I
> > > > have in JSON format.
> > > >
> > > > This is the document I am following:
> > > >
> > > > https://samoa.incubator.apache.org/documentation/Executing-SAMOA-with-Apache-Samza.html
> > > >
> > > > A couple of questions:
> > > > 1. How do I point the input event to a stream/topic in Kafka? The data is
> > > > in JSON.
> > > > 2. If I want to use historical data that is stored in a file, how do I
> > > > point the job to read from a file and serialise as JSON?
> > > >
> > > > bin/samoa samza target/SAMOA-Samza-0.3.0-SNAPSHOT.jar
> > > > "PrequentialEvaluation -l classifiers.ensemble.Bagging -s (??)"
> > > >
> > > > - Shekar
> > > >
> > >
> >
>
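
Since the thread says the current readers only accept ARFF (a CSV body
preceded by a header declaring the attributes), one possible stopgap is to
convert JSON records to ARFF before handing the file to SAMOA. The sketch
below is not part of SAMOA; the function name json_lines_to_arff and the
sample field names are hypothetical. It assumes flat JSON objects, one per
line, all with the same keys, and values that need no ARFF quoting (no
spaces or commas). It scans all records first so the header can list every
nominal value up front, which reflects the metadata requirement mentioned
above.

```python
import json


def json_lines_to_arff(lines, relation="dataset"):
    """Convert flat JSON objects (one per line) into ARFF text.

    Assumes every record has the same keys and that no value needs quoting.
    """
    records = [json.loads(line) for line in lines if line.strip()]
    if not records:
        raise ValueError("no JSON records to convert")

    # Attribute order is taken from the first record.
    names = list(records[0].keys())

    header = ["@relation " + relation, ""]
    for name in names:
        values = [record[name] for record in records]
        is_numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in values
        )
        if is_numeric:
            header.append("@attribute " + name + " numeric")
        else:
            # Nominal attribute: ARFF declares the full set of values up front.
            labels = sorted({str(v) for v in values})
            header.append("@attribute " + name + " {" + ",".join(labels) + "}")

    rows = [",".join(str(record[name]) for name in names) for record in records]
    return "\n".join(header + ["", "@data"] + rows) + "\n"


if __name__ == "__main__":
    sample = [
        '{"temp": 21.5, "humidity": 40, "label": "ok"}',
        '{"temp": 30.1, "humidity": 55, "label": "hot"}',
    ]
    print(json_lines_to_arff(sample, relation="sensors"))
```

Because the header needs every nominal value before the first data row is
written, this only works as a batch step over a file. It does not address
reading JSON directly from a Kafka topic, which would still need a new
reader as discussed in the thread.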
