samoa-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SAMOA-59) Add an Adapter for Apache Gearpump
Date Wed, 26 Oct 2016 09:45:59 GMT

    [ https://issues.apache.org/jira/browse/SAMOA-59?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15608004#comment-15608004 ]

ASF GitHub Bot commented on SAMOA-59:
-------------------------------------

GitHub user nicolas-kourtellis commented on the issue:

    https://github.com/apache/incubator-samoa/pull/54
  
    Hi @manuzhang,
    
    I managed to get the adapter working. Here are some notes that I would ask you to take into consideration:
    - There are some inherent difficulties in compiling Gearpump from source. It would be good to have a compiled version to use directly.
    - Given a compiled version (which I had, because @manuzhang provided one), I managed to compile and package SAMOA with Gearpump and run the package.
    - However, the adapter should be upgraded to the new version of SAMOA in incubation, which is 0.5.0. This should be fairly straightforward, and it will allow us to test the adapter with some more of the generators and ML methods added in the recent past.
    
    - Feedback when executing VHT:
    => The engine seems to continue executing the topology long after it has been created, used for the task, and finished. Is there any way to send a signal at the end of the execution to shut it down? (Note: the topology, not the engine itself.) It was occupying resources on my computer for no reason, at full CPU consumption. I found a manual way to kill it using the command "gear kill -appid X", where X is the id of the task, but I wonder if there is a more automatic way.
    => After I killed the jobs manually, the Java processes that were created for the execution (I assume they are the containers of the topologies) were still alive, just not consuming many resources. Shouldn't they have been terminated and removed? Is there a way to do that?
    => When I run new tasks, they just keep getting added to the engine (which is logical), even though I had killed the other ones earlier.
    => Multiple executions of the same experiment with the same seed for the random generator (parameter -r), which should yield the same random tree, give different accuracy. Is that expected?
    => Using different seeds for the random tree generator (r=1,...,5), the accuracy of VHT on local Gearpump is fairly low (average over 5 different seeds: 65.39% accuracy) in comparison to running the topology on local Storm (84.046% accuracy). Is there any explanation for such a large reduction in performance?



> Add an Adapter for Apache Gearpump
> ----------------------------------
>
>                 Key: SAMOA-59
>                 URL: https://issues.apache.org/jira/browse/SAMOA-59
>             Project: SAMOA
>          Issue Type: New Feature
>            Reporter: Yu Gong
>
> Gearpump (http://www.gearpump.io/) is a real-time big data streaming engine. It is inspired by recent advances in the Akka framework and a desire to improve on existing streaming frameworks. Gearpump is event/message based and features low-latency handling, high performance, exactly-once semantics, dynamic topology updates, Apache Storm compatibility, etc. It is now an Apache Incubator project (http://incubator.apache.org/projects/gearpump.html). A Gearpump adapter for SAMOA will translate Apache SAMOA topologies into Gearpump DAGs in order to run them on the Gearpump platform, so that users can run streaming machine learning algorithms built with Apache SAMOA on Gearpump.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
