samoa-dev mailing list archives

From Gianmarco De Francisci Morales <>
Subject Re: [SAMOA-49] Integrating Apache Apex with Apache Samoa
Date Wed, 09 Mar 2016 07:14:09 GMT
Hi Bhupesh,

1) A task is a way of coordinating the execution of a learner.
An easy way to do what you ask is to flag your test instances as test-only
(isTraining returns false).
Alternatively, writing a new task gives you full control over how to
train and test the learner.
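A minimal sketch of the "flag as test-only" idea, assuming a hypothetical event class `InstanceEvent` and a hypothetical `TaggingSource` (neither is Samoa's actual API). In this sketch, ordinary stream instances are both tested and trained on, while held-out instances get `isTraining() == false`, so a learner would only classify them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical event carrying one instance through the topology.
// Not Samoa's real event class; only the two flags matter here.
final class InstanceEvent {
    final long id;
    final double[] features;
    final int label;                 // -1 when the true label is unknown
    private final boolean training;
    private final boolean testing;

    InstanceEvent(long id, double[] features, int label, boolean training, boolean testing) {
        this.id = id;
        this.features = features;
        this.label = label;
        this.training = training;
        this.testing = testing;
    }

    boolean isTraining() { return training; }
    boolean isTesting() { return testing; }
}

// Hypothetical source that emits labeled training instances first,
// then unlabeled held-out instances flagged as test-only.
final class TaggingSource {
    static List<InstanceEvent> tag(List<double[]> trainSet, int[] trainLabels, List<double[]> testSet) {
        List<InstanceEvent> events = new ArrayList<>();
        long id = 0;
        for (int i = 0; i < trainSet.size(); i++) {
            // Ordinary instance: classified first, then used for learning.
            events.add(new InstanceEvent(id++, trainSet.get(i), trainLabels[i], true, true));
        }
        for (double[] x : testSet) {
            // Test-only instance: isTraining() is false, so the model is not updated.
            events.add(new InstanceEvent(id++, x, -1, false, true));
        }
        return events;
    }
}
```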

2) Different algorithms have different models.
The model itself might be distributed across nodes.
So far, we don't have a unified API to "access" the model.

The way to use the trained model on test instances is simply to send them
to the model via the normal stream, labeled as test-only instances.
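To illustrate why no model-access API is needed, here is a minimal sketch of a learner processor whose model never leaves it: each event flowing through is classified if it is a testing event and learned from only if it is a training event. The class `CentroidLearnerProcessor` and its nearest-centroid model are hypothetical stand-ins chosen for brevity; they are not VHT or any Samoa class.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical learner processor: the model (per-class running sums) stays
// internal; callers only observe predictions for events that flow through.
final class CentroidLearnerProcessor {
    private final Map<Integer, double[]> sums = new HashMap<>();
    private final Map<Integer, Integer> counts = new HashMap<>();

    /** Returns a prediction for testing events, or null if there is none. */
    Integer process(double[] x, int label, boolean isTraining, boolean isTesting) {
        Integer prediction = null;
        if (isTesting) {
            prediction = predict(x);   // classify with the current model first
        }
        if (isTraining) {
            update(x, label);          // then learn (test-only events skip this)
        }
        return prediction;
    }

    // Nearest class centroid by squared Euclidean distance; null if no class seen yet.
    private Integer predict(double[] x) {
        Integer best = null;
        double bestDist = Double.POSITIVE_INFINITY;
        for (Map.Entry<Integer, double[]> e : sums.entrySet()) {
            int n = counts.get(e.getKey());
            double d = 0.0;
            for (int j = 0; j < x.length; j++) {
                double diff = x[j] - e.getValue()[j] / n;
                d += diff * diff;
            }
            if (d < bestDist) {
                bestDist = d;
                best = e.getKey();
            }
        }
        return best;
    }

    private void update(double[] x, int label) {
        double[] s = sums.computeIfAbsent(label, k -> new double[x.length]);
        for (int j = 0; j < x.length; j++) {
            s[j] += x[j];
        }
        counts.merge(label, 1, Integer::sum);
    }
}
```

A test-only event such as `process(x, -1, false, true)` returns a label but leaves the model unchanged, which is the behavior the test-only flag is meant to give.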

3) In PrequentialEvaluation, the classified instances are just sent to the
evaluator and then thrown away.
If you have your own task you could then write them to another sink (e.g.,
Kafka or Redis).
I guess we could also add an option to PrequentialEvaluation to dump the
classified stream of test instances on the file system (local or HDFS).
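A minimal sketch of the "dump the classified stream" option for the local file system only, assuming a hypothetical sink class `PredictionFileSink` that writes one `instanceId,predictedLabel` line per classified test instance. The CSV layout is an illustrative choice; writing to HDFS would need the Hadoop file system client instead of `java.nio.file`, which is not shown.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sink: records each classified test instance instead of
// discarding it after evaluation. Creates or truncates the target file.
final class PredictionFileSink implements AutoCloseable {
    private final BufferedWriter out;

    PredictionFileSink(Path file) throws IOException {
        this.out = Files.newBufferedWriter(file, StandardCharsets.UTF_8);
    }

    // One line per instance: "<instanceId>,<predictedLabel>".
    void accept(long instanceId, int predictedLabel) throws IOException {
        out.write(instanceId + "," + predictedLabel);
        out.newLine();
    }

    @Override
    public void close() throws IOException {
        out.close();   // flushes buffered lines to disk
    }
}
```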

4) I think Storm has the same requirements for bolts.
Does every serialized class need to have a default constructor?
I don't see a problem with it right now, but I might be overlooking
something.
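A minimal sketch of the kind of change proposed in point 4, assuming a hypothetical Samoa-style class `LearnerConfig`: the normal constructor stays, and a private no-arg constructor is added only for serialization. `NoArgCheck` is also hypothetical; it mimics with plain reflection the zero-argument instantiation that Kryo's default strategy relies on, rather than calling Kryo itself.

```java
import java.lang.reflect.Constructor;

// Hypothetical class with a normal constructor plus a private no-arg
// constructor that exists only so a serializer can create an empty
// instance and then fill in the fields.
final class LearnerConfig {
    private int parallelism;
    private String name;

    LearnerConfig(int parallelism, String name) {
        this.parallelism = parallelism;
        this.name = name;
    }

    // For serialization frameworks only; fields are populated afterwards.
    private LearnerConfig() {
    }

    int parallelism() { return parallelism; }
    String name() { return name; }
}

final class NoArgCheck {
    // Invokes the zero-argument constructor, making it accessible if private.
    // Throws NoSuchMethodException when the class has no such constructor.
    static <T> T instantiate(Class<T> type) throws ReflectiveOperationException {
        Constructor<T> ctor = type.getDeclaredConstructor();
        ctor.setAccessible(true);
        return ctor.newInstance();
    }
}
```

Keeping the no-arg constructor private limits its use to the serializer, so the rest of the code still has to go through the constructor that sets the fields.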


-- Gianmarco

On Mon, Mar 7, 2016 at 8:25 AM, Bhupesh Chawda <> wrote:

> Any comments / replies to the above queries?
> -Thanks.
> Bhupesh
> On Mon, Feb 29, 2016 at 6:56 PM, Bhupesh Chawda <> wrote:
> > Dear Community,
> >
> > We have been working on integrating the Apache Apex platform as an SPE for
> > Apache Samoa. I understand the integration process and the APIs that are
> > exposed by Apache Samoa for integration with other SPEs. I have a few
> > questions regarding the internals of Samoa which would help me complete
> > the integration and ultimately open a PR.
> >
> >    1. I see that all tasks implemented in Samoa are Evaluation tasks:
> >    PrequentialEvaluation or ClusteringEvaluation. What if we need to
> >    evaluate actual test instances (for example, instances which are not
> >    part of the training set)? Do we need to write another Task to
> >    evaluate test instances?
> >    2. If yes, then how do we access the trained model? I understand that
> >    streaming algorithms would not produce a one-time trained model. Given
> >    that, there should be some way of identifying the state of the current
> >    model. For example, in the VHT evaluation, we build a decision tree
> >    over time using the training instances. Now, if I have some actual
> >    test instances to classify using the VHT, what is the way to do that?
> >    3. How do we see the results of a classification task or a clustering
> >    task? Ideally I would like to see the class labels given to input
> >    instances for a classification task, or the cluster numbers given to
> >    input instances for a clustering task. I could not find any option to
> >    view such results.
> >    4. Apache Apex uses Kryo for serialization and hence it needs a class
> >    to have a default constructor. I noticed that many of the Samoa classes
> >    do not have default constructors and throw an exception when running
> >    on Apex. Would it be okay if the Apex integration PR adds these default
> >    constructors to the Samoa classes for the purpose of serialization?
> >
> > Thanks in anticipation!
> >
> > -Bhupesh
> > --
> > Regards,
> > Bhupesh Chawda
> >
> --
> Regards,
> Bhupesh Chawda
