samoa-dev mailing list archives

From senorcarbone <...@git.apache.org>
Subject [GitHub] incubator-samoa pull request: SAMOA-16: Add an adapter for Apache ...
Date Sat, 09 May 2015 13:23:12 GMT
Github user senorcarbone commented on the pull request:

    https://github.com/apache/incubator-samoa/pull/11#issuecomment-100484501
  
    Hello again @gdfm and @abifet,
    Over the last two days I did a lot of cross-profiling between Storm and Flink, running the
    same `VerticalHoeffdingTree` task under different configurations, and I think the results
    are quite interesting.
    
    It looks like the algorithm's performance (and accuracy) depends heavily on the ingestion
    speed of the local statistics processors. Paradoxically, the higher that speed, the slower
    the whole computation becomes over time: as attribute events are sent to the local
    statistics processors at a higher rate, the model aggregator gets more updates back.
    
    The average processing delay (measured as the number of flattened instances processed by
    the aggregator between sending a process event and receiving the respective local
    statistics) is ~2k instances for Flink and around 400k instances for Storm. Also, in Storm
    the aggregator continuously broadcasts ~100-200 attribute messages to the local processors
    on average, while in Flink it broadcasts ~2100 attribute messages, which I assume is due
    to the rate at which it gets results back. These numbers were collected locally on each
    component, and there was no message duplication.
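    To make the delay metric concrete, here is a minimal sketch of how such a counter could be
    kept inside the aggregator. The class and method names are hypothetical and are not part of
    SAMOA's API, and the actual instrumentation used for the numbers above may differ.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not SAMOA code): measures the processing delay as the
// number of instances the aggregator handles between sending a request for
// local statistics and receiving the matching result.
final class ProcessingDelayTracker {
    // Request id -> value of the instance counter when the request was sent.
    private final Map<Long, Long> sentAt = new HashMap<>();
    private long instancesSeen = 0;
    private long totalDelay = 0;
    private long completed = 0;

    // Call once for every instance the aggregator processes.
    void onInstance() {
        instancesSeen++;
    }

    // Call when a request for local statistics is sent out.
    void onRequestSent(long requestId) {
        sentAt.put(requestId, instancesSeen);
    }

    // Call when the local statistics for a request come back.
    void onResultReceived(long requestId) {
        Long start = sentAt.remove(requestId);
        if (start == null) {
            return; // unknown or duplicated result: ignore it
        }
        totalDelay += instancesSeen - start;
        completed++;
    }

    // Average delay in instances over all completed requests.
    double averageDelay() {
        return completed == 0 ? 0.0 : (double) totalDelay / completed;
    }
}
```

    Keying by request id means a duplicated result would be ignored rather than counted twice,
    so duplicates cannot inflate the average.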
    Since you worked on the algorithm, do you find this behavior reasonable?


