samoa-dev mailing list archives
From Shigeru Imai <im...@rpi.edu>
Subject Scalability of Vertical Hoeffding Tree
Date Mon, 01 May 2017 20:35:06 GMT
Hello,

I am testing the scalability of Vertical Hoeffding Tree on SAMOA-Storm, consuming streams from
Kafka. So far, I have tested up to 32 m4.large VMs on Amazon EC2, but throughput hardly improves
at all: Storm consumes the stream from Kafka at 30 Mbytes/sec with 1 VM, and this throughput
stays almost the same up to 32 VMs.

Here are the experimental settings:
* SAMOA: latest on github as of April 2017
* Storm: version 0.10.1
* Dataset: forest covertype (54 attributes, https://archive.ics.uci.edu/ml/datasets/Covertype)
* Kafka connector: implementation proposed for SAMOA-40 (https://github.com/apache/incubator-samoa/pull/32)
* Scaling policy: assign one core per LocalStatisticsProcessor
* Tested with Prequential Evaluation

I read the Vertical Hoeffding Tree paper from IEEE BigData 2016, but I could not find any
information on how the throughput of VHT scales as resources are added (the paper only reports
relative performance improvements over the standard Hoeffding tree).

Has anyone scaled VHT successfully, with or without Kafka? Are there any tips for achieving
high throughput with VHT?
I believe datasets with more attributes should give VHT better scalability, so I am thinking
of trying that next, but I would expect 54 attributes to scale at least a little.
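
To illustrate why I suspect the attribute count matters (this is just my own simplified sketch of round-robin attribute partitioning, with made-up names, not SAMOA's actual code): if VHT splits attributes across LocalStatisticsProcessors, then with 54 attributes and 32 processors each processor only sees one or two attributes, which caps the useful parallelism:

```java
import java.util.*;

// Hypothetical round-robin assignment of attribute indices to processors.
public class AttributePartition {
    static Map<Integer, List<Integer>> partition(int numAttributes, int numProcessors) {
        Map<Integer, List<Integer>> assignment = new HashMap<>();
        for (int p = 0; p < numProcessors; p++) {
            assignment.put(p, new ArrayList<>());
        }
        // Attribute a goes to processor (a mod numProcessors).
        for (int a = 0; a < numAttributes; a++) {
            assignment.get(a % numProcessors).add(a);
        }
        return assignment;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> m = partition(54, 32);
        // With 54 attributes over 32 processors, each processor holds
        // only 1 or 2 attributes, so extra VMs have little work to do.
        System.out.println(m.get(0));  // [0, 32]
        System.out.println(m.get(31)); // [31]
    }
}
```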

Also, I found the following 1-second sleep in StormEntranceProcessingItem.java. It looks
to me like this hinders high-throughput processing. Can we get rid of this sleep?
    public void nextTuple() {
      if (entranceProcessor.hasNext()) {
        Values value = newValues(entranceProcessor.nextEvent());
        collector.emit(outputStream.getOutputId(), value);
      } else {
        Utils.sleep(1000);
      }
      // StormTupleInfo tupleInfo = tupleInfoQueue.poll(50,
      // TimeUnit.MILLISECONDS);
      // if (tupleInfo != null) {
      //   Values value = newValues(tupleInfo.getContentEvent());
      //   collector.emit(tupleInfo.getStormStream().getOutputId(), value);
      // }
    }
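
One alternative I was considering (just a sketch of the idle-backoff pattern, not a tested patch against StormEntranceProcessingItem; `IdleBackoff` is a made-up helper, not a SAMOA or Storm class): replace the fixed 1-second sleep with a short, growing backoff, so the spout stays responsive while the source has data and only yields briefly when it is idle:

```java
// Sketch: exponential idle backoff instead of a fixed 1s sleep.
public class IdleBackoff {
    private long sleepMs = 0;
    private static final long MIN_SLEEP_MS = 1;
    private static final long MAX_SLEEP_MS = 100;

    // Call when the source had no event: back off a little more each time,
    // capped at MAX_SLEEP_MS. Returns the number of milliseconds to sleep.
    long onIdle() {
        sleepMs = (sleepMs == 0) ? MIN_SLEEP_MS : Math.min(sleepMs * 2, MAX_SLEEP_MS);
        return sleepMs;
    }

    // Call when an event was emitted: reset so the next idle sleep is short.
    void onEvent() {
        sleepMs = 0;
    }

    public static void main(String[] args) {
        IdleBackoff backoff = new IdleBackoff();
        System.out.println(backoff.onIdle()); // 1
        System.out.println(backoff.onIdle()); // 2
        backoff.onEvent();
        System.out.println(backoff.onIdle()); // 1
    }
}
```

In nextTuple(), the else branch would then call Utils.sleep(backoff.onIdle()) and the emit branch would call backoff.onEvent(), but I have not measured whether this alone recovers the lost throughput.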

Any suggestions would be appreciated.

Thank you,
Shigeru

-- 
Shigeru Imai  <imais@rpi.edu>
Ph.D. candidate
Worldwide Computing Laboratory
Department of Computer Science
Rensselaer Polytechnic Institute
110 8th Street, Troy, NY 12180, USA
http://wcl.cs.rpi.edu/