samoa-dev mailing list archives

From Gianmarco De Francisci Morales <g...@apache.org>
Subject Re: HoeffdingTree and VerticalHoeffdingTree Classifiers run on KDD Cup 99 Data Set
Date Wed, 09 Sep 2015 12:11:08 GMT
Hi,

Thanks for reporting the bug.
I'm not sure what is causing the issue.
Are you using the master version of SAMOA?
My line 145 of ModelAggregatorProcessor is:
              this.sendToAttributeStream(abce[i]);

From what you say, it seems that the problem is a bit above, and leafNode is
null.
However, by construction there should always be a leaf node.

As a workaround your solution is fine, but I guess there is some other
underlying problem with the code, which might cause some loss in accuracy.
We should investigate this issue further; I'll open a ticket.
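
To get an idea of how many instances the workaround drops (and so how much
accuracy might be lost), the bare null check could be swapped for something
that counts and logs each miss. This is only a rough sketch: LeafGuard and
its methods are made-up names for illustration, not existing SAMOA code.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper (not part of SAMOA): guards against a missing leaf and counts the misses. */
class LeafGuard {
    private final AtomicLong skipped = new AtomicLong();

    /** Returns true if the leaf is usable; otherwise records the miss and returns false. */
    boolean check(Object leafNode, long instanceIndex) {
        if (leafNode != null) {
            return true;
        }
        long n = skipped.incrementAndGet();
        System.err.println("Skipping instance " + instanceIndex
                + ": leaf node is null (" + n + " skipped so far)");
        return false;
    }

    /** Number of instances skipped so far because no leaf was found. */
    long skippedCount() {
        return skipped.get();
    }
}
```

The caller would then return early when check(...) is false, as in your
workaround, and report skippedCount() at the end of the run.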

Regarding fetching the content of the model, we had some prototype model
dumper code (Arinto had started it), but I guess it's not working anymore.
See the describeSubtree() method in Node.java.
So unfortunately you need to do it yourself. However, the good thing is
that the tree model is in a single place in ModelAggregatorProcessor, so it
should be relatively easy to walk the tree, starting from the root node.
Do you want to dump the model to a text representation for human inspection?
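
For a text dump, a simple depth-first walk from the root would probably be
enough. The sketch below shows the idea on a stand-in node type. SketchNode
and TreeDumper are made-up names for illustration; SAMOA's own node classes
look different, so the child access would need to be adapted.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a tree node; SAMOA's own node classes differ. */
class SketchNode {
    final String label;
    final List<SketchNode> children = new ArrayList<>();

    SketchNode(String label) {
        this.label = label;
    }

    /** Adds a child and returns this node so calls can be chained. */
    SketchNode add(SketchNode child) {
        children.add(child);
        return this;
    }
}

/** Hypothetical dumper: writes one indented line per node. */
class TreeDumper {
    static String dump(SketchNode root) {
        StringBuilder out = new StringBuilder();
        walk(root, 0, out);
        return out.toString();
    }

    /** Pre-order, depth-first walk; indentation reflects the depth in the tree. */
    private static void walk(SketchNode node, int depth, StringBuilder out) {
        if (node == null) {
            return; // tolerate missing children instead of failing
        }
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(node.label).append('\n');
        for (SketchNode child : node.children) {
            walk(child, depth + 1, out);
        }
    }
}
```

Calling TreeDumper.dump(root) returns the whole tree as one string, with
two spaces of indentation per level.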

Cheers,


--
Gianmarco

On 7 September 2015 at 18:23, Gianmarco De Francisci Morales <
gdfm@apache.org> wrote:

> Forwarding to the @dev list.
> --
> Gianmarco
>
> ---------- Forwarded message ----------
> From: Ercan Öztürk <e.ozturk111@gmail.com>
> Date: 7 September 2015 at 16:57
> Subject: HoeffdingTree and VerticalHoeffdingTree Classifiers run on KDD
> Cup 99 Data Set
> To: gdfm@apache.org
>
>
> Hi Mr. Morales and Mr. Bifet,
>
> We are a couple of undergrad students from TOBB University. As a data
> mining class project, we decided to run the HoeffdingTree classifier in MOA
> and the VerticalHoeffdingTree classifier in SAMOA on the KDD Cup 99 data set
> (we couldn't attach the data set to this mail due to the size limitations of
> the Apache mail server) and present the results in our project report.
>
> We were able to run the HoeffdingTree algorithm on the KDD Cup 99 data set
> (both kddcup_full.arff and kddcup_10_percent.arff). The
> VerticalHoeffdingTree classifier also works fine on kddcup_10_percent.arff.
> However, when we try to run the VerticalHoeffdingTree classifier on
> kddcup_full.arff, we get the error shown below.
>
> The command we use to run SAMOA Local:
>
> bin/samoa local target/SAMOA-Local-0.3.0-SNAPSHOT.jar
> "PrequentialEvaluation -i -1 -f 41920 -l
> (com.yahoo.labs.samoa.learners.classifiers.trees.VerticalHoeffdingTree -p
> 4) -s (com.yahoo.labs.samoa.moa.streams.ArffFileStream -f kddcup_full.arff)"
>
> The console output of samoa:
>
> bin/samoa
>
> Deploying to LOCAL
>
> Command line string =  PrequentialEvaluation -i -1 -f 41920 -l
> (com.yahoo.labs.samoa.learners.classifiers.trees.VerticalHoeffdingTree -p
> 4) -s (com.yahoo.labs.samoa.moa.streams.ArffFileStream -f kddcup_full.arff)
>
> 2015-09-01 22:22:16,160 [main] INFO  com.yahoo.labs.samoa.LocalDoTask
> (LocalDoTask.java:79) - Successfully instantiating
> com.yahoo.labs.samoa.tasks.PrequentialEvaluation
>
> 2015-09-01 22:22:17,741 [main] INFO
>  com.yahoo.labs.samoa.evaluation.EvaluatorProcessor
> (EvaluatorProcessor.java:86) - 1 seconds for 41920 instances
>
> 2015-09-01 22:22:17,760 [main] INFO
>  com.yahoo.labs.samoa.evaluation.EvaluatorProcessor
> (EvaluatorProcessor.java:172) - evaluation instances = 41,920
>
> classified instances = 41,920
>
> classifications correct (percent) = 99.988
>
> Kappa Statistic (percent) = -0.002
>
> Kappa Temporal Statistic (percent) = 28.571
>
> Exception in thread "main" java.lang.NullPointerException
>     at com.yahoo.labs.samoa.learners.classifiers.trees.ModelAggregatorProcessor.process(ModelAggregatorProcessor.java:145)
>     at com.yahoo.labs.samoa.topology.impl.SimpleProcessingItem.processEvent(SimpleProcessingItem.java:84)
>     at com.yahoo.labs.samoa.topology.impl.SimpleStream.put(SimpleStream.java:71)
>     at com.yahoo.labs.samoa.topology.impl.SimpleStream.put(SimpleStream.java:60)
>     at com.yahoo.labs.samoa.learners.classifiers.trees.FilterProcessor.process(FilterProcessor.java:95)
>     at com.yahoo.labs.samoa.topology.impl.SimpleProcessingItem.processEvent(SimpleProcessingItem.java:84)
>     at com.yahoo.labs.samoa.topology.impl.SimpleStream.put(SimpleStream.java:71)
>     at com.yahoo.labs.samoa.topology.impl.SimpleStream.put(SimpleStream.java:60)
>     at com.yahoo.labs.samoa.topology.LocalEntranceProcessingItem.injectNextEvent(LocalEntranceProcessingItem.java:46)
>     at com.yahoo.labs.samoa.topology.LocalEntranceProcessingItem.startSendingEvents(LocalEntranceProcessingItem.java:66)
>     at com.yahoo.labs.samoa.topology.impl.SimpleTopology.run(SimpleTopology.java:42)
>     at com.yahoo.labs.samoa.topology.impl.SimpleEngine.submitTopology(SimpleEngine.java:33)
>     at com.yahoo.labs.samoa.LocalDoTask.main(LocalDoTask.java:87)
>
>
> We were able to track down the problem to the first instance that causes
> it; the instance is on the 76426th line in kddcup_full.arff. The instance
> is as follows:
>
>
> 1,tcp,smtp,SF,2252,331,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,7,0,0,0,0,1,0,1,5,216,1,0,0.2,0.01,0,0,0,0,normal
>
> We haven’t noticed any differences between the problematic instance and
> the other instances. Could you point us to the root of the problem and
> help us overcome it?
>
> As a workaround, we’ve made the following addition to
> ModelAggregatorProcessor.java:
>
> if (leafNode == null)
>         return false;
>
> after the line
>
> ActiveLearningNode leafNode = (ActiveLearningNode) foundNode.getNode();
>
> Now the VerticalHoeffdingTree classifier also works fine on
> kddcup_full.arff. Is this solution acceptable for the problem? What do you
> think?
>
>
> We were also wondering how we could fetch the contents of the model, for
> example by visiting the nodes and reading their content.
>
> Thanks for your help,
>
>
> Respectfully,
>
> Ercan Ozturk, Davut Deniz Yavuz, Gozde Boztepe, Sezin Gurkan
>
>
