samoa-dev mailing list archives

From Gianmarco De Francisci Morales <g...@apache.org>
Subject Fwd: HoeffdingTree and VerticalHoeffdingTree Classifiers run on KDD Cup 99 Data Set
Date Mon, 07 Sep 2015 15:23:40 GMT
Forwarding to the @dev list.
--
Gianmarco

---------- Forwarded message ----------
From: Ercan Öztürk <e.ozturk111@gmail.com>
Date: 7 September 2015 at 16:57
Subject: HoeffdingTree and VerticalHoeffdingTree Classifiers run on KDD Cup
99 Data Set
To: gdfm@apache.org


Hi Mr. Morales and Mr. Bifet,

We are a group of undergraduate students from TOBB University. As a data
mining class project, we decided to run the HoeffdingTree classifier in MOA
and the VerticalHoeffdingTree classifier in SAMOA on the KDD Cup 99 data set
(we couldn't attach the data set to this mail due to the size limit of the
Apache mail server) and present the results in our project report.

We were able to run the HoeffdingTree algorithm on the KDD Cup 99 data set
(both kddcup_full.arff and kddcup_10_percent.arff). The VerticalHoeffdingTree
classifier also works fine on kddcup_10_percent.arff. However, when we try
to run the VerticalHoeffdingTree classifier on kddcup_full.arff, we get the
error shown below.

The command we used to run SAMOA locally:

bin/samoa local target/SAMOA-Local-0.3.0-SNAPSHOT.jar
"PrequentialEvaluation -i -1 -f 41920 -l
(com.yahoo.labs.samoa.learners.classifiers.trees.VerticalHoeffdingTree -p
4) -s (com.yahoo.labs.samoa.moa.streams.ArffFileStream -f kddcup_full.arff)"

The console output of samoa:

bin/samoa

Deploying to LOCAL

Command line string =  PrequentialEvaluation -i -1 -f 41920 -l
(com.yahoo.labs.samoa.learners.classifiers.trees.VerticalHoeffdingTree -p
4) -s (com.yahoo.labs.samoa.moa.streams.ArffFileStream -f kddcup_full.arff)

2015-09-01 22:22:16,160 [main] INFO  com.yahoo.labs.samoa.LocalDoTask (LocalDoTask.java:79) - Successfully instantiating com.yahoo.labs.samoa.tasks.PrequentialEvaluation
2015-09-01 22:22:17,741 [main] INFO  com.yahoo.labs.samoa.evaluation.EvaluatorProcessor (EvaluatorProcessor.java:86) - 1 seconds for 41920 instances
2015-09-01 22:22:17,760 [main] INFO  com.yahoo.labs.samoa.evaluation.EvaluatorProcessor (EvaluatorProcessor.java:172) - evaluation instances = 41,920
classified instances = 41,920
classifications correct (percent) = 99.988
Kappa Statistic (percent) = -0.002
Kappa Temporal Statistic (percent) = 28.571

Exception in thread "main" java.lang.NullPointerException
    at com.yahoo.labs.samoa.learners.classifiers.trees.ModelAggregatorProcessor.process(ModelAggregatorProcessor.java:145)
    at com.yahoo.labs.samoa.topology.impl.SimpleProcessingItem.processEvent(SimpleProcessingItem.java:84)
    at com.yahoo.labs.samoa.topology.impl.SimpleStream.put(SimpleStream.java:71)
    at com.yahoo.labs.samoa.topology.impl.SimpleStream.put(SimpleStream.java:60)
    at com.yahoo.labs.samoa.learners.classifiers.trees.FilterProcessor.process(FilterProcessor.java:95)
    at com.yahoo.labs.samoa.topology.impl.SimpleProcessingItem.processEvent(SimpleProcessingItem.java:84)
    at com.yahoo.labs.samoa.topology.impl.SimpleStream.put(SimpleStream.java:71)
    at com.yahoo.labs.samoa.topology.impl.SimpleStream.put(SimpleStream.java:60)
    at com.yahoo.labs.samoa.topology.LocalEntranceProcessingItem.injectNextEvent(LocalEntranceProcessingItem.java:46)
    at com.yahoo.labs.samoa.topology.LocalEntranceProcessingItem.startSendingEvents(LocalEntranceProcessingItem.java:66)
    at com.yahoo.labs.samoa.topology.impl.SimpleTopology.run(SimpleTopology.java:42)
    at com.yahoo.labs.samoa.topology.impl.SimpleEngine.submitTopology(SimpleEngine.java:33)
    at com.yahoo.labs.samoa.LocalDoTask.main(LocalDoTask.java:87)


We tracked the problem down to the first instance that triggers it, which
is on line 76426 of kddcup_full.arff. The instance is as follows:

1,tcp,smtp,SF,2252,331,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,7,0,0,0,0,1,0,1,5,216,1,0,0.2,0.01,0,0,0,0,normal

We haven't noticed any difference between the problematic instance and the
other instances. Could you point us to the root cause of the problem and
help us work out how to overcome it?
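
One simple structural check on the instance can be sketched as follows. KDD Cup 99 rows carry 41 attributes plus a class label, so a well-formed row should split into 42 comma-separated values. The class name InstanceFieldCheck and the helper fieldCount are hypothetical names used only for this illustration; they are not part of MOA or SAMOA, and the check says nothing about whether individual values are valid for their attributes.

```java
public class InstanceFieldCheck {
    // 41 attributes plus the class label in a KDD Cup 99 row.
    static final int EXPECTED_FIELDS = 42;

    // The instance quoted above, copied verbatim from the report.
    static final String ROW =
        "1,tcp,smtp,SF,2252,331,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,7,0,0,0,0,1,0,1,5,216,1,0,0.2,0.01,0,0,0,0,normal";

    // Hypothetical helper: number of comma-separated values in one data row.
    // The limit -1 keeps trailing empty values, so "a,b," counts as 3.
    static int fieldCount(String row) {
        return row.split(",", -1).length;
    }

    public static void main(String[] args) {
        int n = fieldCount(ROW);
        System.out.println("fields = " + n
            + ", matches expected = " + (n == EXPECTED_FIELDS));
    }
}
```

A row that passes this check has the expected number of values, which is consistent with the instance itself not being malformed.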

As a workaround, we added the following check to
ModelAggregatorProcessor.java:

if (leafNode == null)

        return false;

after the line

ActiveLearningNode leafNode = (ActiveLearningNode) foundNode.getNode();

With this change, the VerticalHoeffdingTree classifier also works fine on
kddcup_full.arff. Do you think this is an acceptable solution to the problem?
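
The effect of the guard can be sketched in isolation. The classes below are hypothetical stand-ins, not the SAMOA types of the same names; the sketch only assumes that getNode() can return null. In Java, casting a null reference never throws, so the failure only surfaces when the node is first used, and the guard turns that failure into a silent skip of the event.

```java
public class NullGuardSketch {
    // Hypothetical stand-ins for illustration only (not SAMOA classes).
    static class Node {}

    static class ActiveLearningNode extends Node {
        int seen = 0;
        void learn() { seen++; }
    }

    static class FoundNode {
        private final Node node;
        FoundNode(Node node) { this.node = node; }
        Node getNode() { return node; }
    }

    // Returns true if the event was used for learning, false if it was skipped.
    static boolean process(FoundNode foundNode) {
        // Casting null succeeds; leafNode is simply null afterwards.
        ActiveLearningNode leafNode = (ActiveLearningNode) foundNode.getNode();
        if (leafNode == null) {
            return false; // the guard: skip the event instead of dereferencing null
        }
        leafNode.learn();
        return true;
    }

    public static void main(String[] args) {
        System.out.println(process(new FoundNode(new ActiveLearningNode())));
        System.out.println(process(new FoundNode(null)));
    }
}
```

Under these assumptions, every event that reaches the guard with a null leaf is dropped without contributing to training, so the guard removes the crash but does not explain why the leaf is null in the first place.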


We were also wondering how we could inspect the contents of the learned
model, for example the nodes visited and the contents of each node.

Thanks for your help,


Respectfully,

Ercan Ozturk, Davut Deniz Yavuz, Gozde Boztepe, Sezin Gurkan
