samoa-dev mailing list archives

From Gianmarco De Francisci Morales <g...@apache.org>
Subject Re: HoeffdingTree and VerticalHoeffdingTree Classifiers run on KDD Cup 99 Data Set
Date Fri, 11 Sep 2015 06:57:14 GMT
Sure, the ticket is SAMOA-44
<https://issues.apache.org/jira/browse/SAMOA-44>.

Arinto had started the work on model dumping, but I don't know what the
status is there.
But it should be straightforward to implement a recursive method.
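
A minimal sketch of such a recursive dump, assuming a simplified node type. SketchNode, its label field and the dump() helper are hypothetical stand-ins for illustration, not SAMOA's actual Node or SplitNode API, and the attribute names in main() are only example labels:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in types; the real SAMOA tree classes differ.
public class TreeDumpSketch {

    /** A node with a text label; a node without children acts as a leaf. */
    static final class SketchNode {
        final String label;                              // split test or leaf summary
        final List<SketchNode> children = new ArrayList<>();

        SketchNode(String label) {
            this.label = label;
        }

        /** Adds a child (null marks a branch with no node yet) and returns this. */
        SketchNode add(SketchNode child) {
            children.add(child);
            return this;
        }
    }

    /** Recursively appends one line per node, indented by two spaces per level. */
    static void describeSubtree(SketchNode node, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        if (node == null) {
            out.append("(empty branch)\n");
            return;
        }
        out.append(node.label).append('\n');
        for (SketchNode child : node.children) {
            describeSubtree(child, depth + 1, out);
        }
    }

    /** Walks the tree from the root and returns the text representation. */
    static String dump(SketchNode root) {
        StringBuilder out = new StringBuilder();
        describeSubtree(root, 0, out);
        return out.toString();
    }

    public static void main(String[] args) {
        SketchNode root = new SketchNode("split on service")
            .add(new SketchNode("leaf: normal"))
            .add(new SketchNode("split on src_bytes")
                .add(new SketchNode("leaf: normal"))
                .add(null));
        System.out.print(dump(root));
    }
}
```

In the real code the walk would start from the root node held by the model aggregator, and each node type would print its own split test or class distribution.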

If you could post the dataset somewhere where it is possible to download
it, it would be great.
If you want to take a stab at debugging what's going on and provide a
patch, it would be even better.
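
As a possible starting point for debugging, the null check from the workaround could record which instances end up without a leaf instead of dropping them silently. The sketch below only illustrates that idea: FoundNodeSketch, its fields and the process() signature are hypothetical, not the actual ModelAggregatorProcessor code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; names and fields are assumptions, not SAMOA's API.
public class NullLeafGuardSketch {

    /** Stand-in for the result of routing an instance down the tree. */
    static final class FoundNodeSketch {
        final Object node;       // null when the followed branch has no node
        final int parentBranch;  // index of the branch that was followed

        FoundNodeSketch(Object node, int parentBranch) {
            this.node = node;
            this.parentBranch = parentBranch;
        }
    }

    /** Descriptions of the instances that did not reach a leaf. */
    final List<String> skipped = new ArrayList<>();

    /**
     * Returns true if the instance reached a node; otherwise records the
     * instance index and branch, then returns false as the workaround does.
     */
    boolean process(FoundNodeSketch found, long instanceIndex) {
        if (found.node == null) {
            skipped.add("instance " + instanceIndex
                + " reached empty branch " + found.parentBranch);
            return false;
        }
        return true;
    }
}
```

Comparing the recorded instances and branches should show whether the null leaf always comes from the same split, which would narrow down where the tree is left incomplete.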

Cheers,

--
Gianmarco

On 10 September 2015 at 08:49, Ercan Öztürk <e.ozturk111@gmail.com> wrote:

> Hi,
>
> Thank you very much for your quick response.
>
> We were using an older version of SAMOA. I've updated the code now (The
> last commit is currently "SAMOA-29: Excluding the samoa-storm.properties at
> compile time and including at test") and after building the code with "mvn
> package" the new command we use to run SAMOA is
>
> local target/SAMOA-Local-0.4.0-incubating-SNAPSHOT.jar
> "PrequentialEvaluation -i -1 -f 41920 -l
> (org.apache.samoa.learners.classifiers.trees.VerticalHoeffdingTree -p 4) -s
> (org.apache.samoa.moa.streams.ArffFileStream -f kddcup_full.arff)"
>
> The console output when the command is run:
>
> bin/samoa
> Deploying to LOCAL
> Command line string =  PrequentialEvaluation -i -1 -f 41920 -l
> (org.apache.samoa.learners.classifiers.trees.VerticalHoeffdingTree -p 4) -s
> (org.apache.samoa.moa.streams.ArffFileStream -f kddcup_full.arff)
> 2015-09-09 15:56:30,036 [main] INFO  org.apache.samoa.LocalDoTask
> (LocalDoTask.java:80) - Successfully instantiating
> org.apache.samoa.tasks.PrequentialEvaluation
> 2015-09-09 15:56:31,221 [main] INFO
>  org.apache.samoa.evaluation.EvaluatorProcessor
> (EvaluatorProcessor.java:83) - 1 seconds for 41920 instances
> 2015-09-09 15:56:31,227 [main] INFO
>  org.apache.samoa.evaluation.EvaluatorProcessor
> (EvaluatorProcessor.java:169) - evaluation instances = 41,920
> classified instances = 41,920
> classifications correct (percent) = 99.988
> Kappa Statistic (percent) = -0.002
> Kappa Temporal Statistic (percent) = 28.571
> Exception in thread "main" java.lang.NullPointerException
> at org.apache.samoa.learners.classifiers.trees.ModelAggregatorProcessor.process(ModelAggregatorProcessor.java:142)
> at org.apache.samoa.topology.impl.SimpleProcessingItem.processEvent(SimpleProcessingItem.java:84)
> at org.apache.samoa.topology.impl.SimpleStream.put(SimpleStream.java:72)
> at org.apache.samoa.topology.impl.SimpleStream.put(SimpleStream.java:61)
> at org.apache.samoa.learners.classifiers.trees.FilterProcessor.process(FilterProcessor.java:93)
> at org.apache.samoa.topology.impl.SimpleProcessingItem.processEvent(SimpleProcessingItem.java:84)
> at org.apache.samoa.topology.impl.SimpleStream.put(SimpleStream.java:72)
> at org.apache.samoa.topology.impl.SimpleStream.put(SimpleStream.java:61)
> at org.apache.samoa.topology.LocalEntranceProcessingItem.injectNextEvent(LocalEntranceProcessingItem.java:45)
> at org.apache.samoa.topology.LocalEntranceProcessingItem.startSendingEvents(LocalEntranceProcessingItem.java:63)
> at org.apache.samoa.topology.impl.SimpleTopology.run(SimpleTopology.java:44)
> at org.apache.samoa.topology.impl.SimpleEngine.submitTopology(SimpleEngine.java:33)
> at org.apache.samoa.LocalDoTask.main(LocalDoTask.java:88)
>
>
> We would very much appreciate it if you could send us the link to the
> ticket so we can follow the updates on the issue.
>
> Yes, we would like to dump the model so that we can see the rules of the
> model and have a better understanding of it.
>
> The method body of describeSubtree() in Node.java is currently empty. Is
> there any work done on it that we can use as a starting point?
>
> If you need the data set to investigate the issue, I can send it via any
> suitable channel, please let me know.
>
> Respectfully,
> Ercan Ozturk
>
> 2015-09-09 15:11 GMT+03:00 Gianmarco De Francisci Morales <gdfm@apache.org
> >:
>
>> Hi,
>>
>> Thanks for reporting the bug.
>> I'm not sure what is causing the issue.
>> Are you using the master version of SAMOA?
>> My line 145 of ModelAggregator is:
>>               this.sendToAttributeStream(abce[i]);
>>
>> From what you say it seems that the problem is a bit above, and leafNode
>> is null.
>> However, by construction there should always be a leaf node.
>>
>> As a workaround your solution is fine, but I guess there is some other
>> underlying problem with the code, which might cause some loss in accuracy.
>> We should investigate this issue further, I'll open a ticket.
>>
>> Regarding fetching the content of the model, we had some prototype model
>> dumper code (Arinto had started it), but I guess it's not working anymore.
>> See the describeSubtree() method in Node.java.
>> So unfortunately you need to do it yourself. However, the good thing is
>> that the tree model is in a single place in ModelAggregator, so it should
>> be relatively easy to walk the tree, starting from the root node.
>> Do you want to dump the model to a text representation for human
>> inspection?
>>
>> Cheers,
>>
>>
>> --
>> Gianmarco
>>
>> On 7 September 2015 at 18:23, Gianmarco De Francisci Morales <
>> gdfm@apache.org> wrote:
>>
>>> Forwarding to the @dev list.
>>> --
>>> Gianmarco
>>>
>>> ---------- Forwarded message ----------
>>> From: Ercan Öztürk <e.ozturk111@gmail.com>
>>> Date: 7 September 2015 at 16:57
>>> Subject: HoeffdingTree and VerticalHoeffdingTree Classifiers run on KDD
>>> Cup 99 Data Set
>>> To: gdfm@apache.org
>>>
>>>
>>> Hi Mr. Morales and Mr. Bifet,
>>>
>>> We are a couple of undergrad students from TOBB University. As a data
>>> mining class project, we decided to run the HoeffdingTree classifier in
>>> MOA and the VerticalHoeffdingTree classifier in SAMOA on the KDD Cup 99
>>> data set (we couldn't attach the data set to this mail due to the size
>>> limitations of the Apache mail server) and present the results in our
>>> project report.
>>>
>>> We were able to run the HoeffdingTree algorithm on the KDD Cup 99 data
>>> set (on both kddcup_full.arff and kddcup_10_percent.arff).
>>> The VerticalHoeffdingTree classifier also works fine on
>>> kddcup_10_percent.arff. However, when we try to run the
>>> VerticalHoeffdingTree classifier on kddcup_full.arff, we get the
>>> following error:
>>>
>>> The command we use to run SAMOA Local:
>>>
>>> bin/samoa local target/SAMOA-Local-0.3.0-SNAPSHOT.jar
>>> "PrequentialEvaluation -i -1 -f 41920 -l
>>> (com.yahoo.labs.samoa.learners.classifiers.trees.VerticalHoeffdingTree -p
>>> 4) -s (com.yahoo.labs.samoa.moa.streams.ArffFileStream -f kddcup_full.arff)"
>>>
>>> The console output of samoa:
>>>
>>> bin/samoa
>>> Deploying to LOCAL
>>> Command line string =  PrequentialEvaluation -i -1 -f 41920 -l
>>> (com.yahoo.labs.samoa.learners.classifiers.trees.VerticalHoeffdingTree -p
>>> 4) -s (com.yahoo.labs.samoa.moa.streams.ArffFileStream -f kddcup_full.arff)
>>> 2015-09-01 22:22:16,160 [main] INFO  com.yahoo.labs.samoa.LocalDoTask
>>> (LocalDoTask.java:79) - Successfully instantiating
>>> com.yahoo.labs.samoa.tasks.PrequentialEvaluation
>>> 2015-09-01 22:22:17,741 [main] INFO
>>>  com.yahoo.labs.samoa.evaluation.EvaluatorProcessor
>>> (EvaluatorProcessor.java:86) - 1 seconds for 41920 instances
>>> 2015-09-01 22:22:17,760 [main] INFO
>>>  com.yahoo.labs.samoa.evaluation.EvaluatorProcessor
>>> (EvaluatorProcessor.java:172) - evaluation instances = 41,920
>>> classified instances = 41,920
>>> classifications correct (percent) = 99.988
>>> Kappa Statistic (percent) = -0.002
>>> Kappa Temporal Statistic (percent) = 28.571
>>> Exception in thread "main" java.lang.NullPointerException
>>> at com.yahoo.labs.samoa.learners.classifiers.trees.ModelAggregatorProcessor.process(ModelAggregatorProcessor.java:145)
>>> at com.yahoo.labs.samoa.topology.impl.SimpleProcessingItem.processEvent(SimpleProcessingItem.java:84)
>>> at com.yahoo.labs.samoa.topology.impl.SimpleStream.put(SimpleStream.java:71)
>>> at com.yahoo.labs.samoa.topology.impl.SimpleStream.put(SimpleStream.java:60)
>>> at com.yahoo.labs.samoa.learners.classifiers.trees.FilterProcessor.process(FilterProcessor.java:95)
>>> at com.yahoo.labs.samoa.topology.impl.SimpleProcessingItem.processEvent(SimpleProcessingItem.java:84)
>>> at com.yahoo.labs.samoa.topology.impl.SimpleStream.put(SimpleStream.java:71)
>>> at com.yahoo.labs.samoa.topology.impl.SimpleStream.put(SimpleStream.java:60)
>>> at com.yahoo.labs.samoa.topology.LocalEntranceProcessingItem.injectNextEvent(LocalEntranceProcessingItem.java:46)
>>> at com.yahoo.labs.samoa.topology.LocalEntranceProcessingItem.startSendingEvents(LocalEntranceProcessingItem.java:66)
>>> at com.yahoo.labs.samoa.topology.impl.SimpleTopology.run(SimpleTopology.java:42)
>>> at com.yahoo.labs.samoa.topology.impl.SimpleEngine.submitTopology(SimpleEngine.java:33)
>>> at com.yahoo.labs.samoa.LocalDoTask.main(LocalDoTask.java:87)
>>>
>>>
>>> We were able to track down the problem to the first instance that causes
>>> it; the instance is on the 76426th line in kddcup_full.arff. The
>>> instance is as follows:
>>>
>>>
>>> 1,tcp,smtp,SF,2252,331,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,7,0,0,0,0,1,0,1,5,216,1,0,0.2,0.01,0,0,0,0,normal
>>>
>>> We haven’t noticed any differences between the problematic instance and
>>> the other instances. Could you point us to the root of the problem and
>>> help us work out how to overcome it?
>>>
>>> As a workaround we’ve made the following addition to
>>> ModelAggregatorProcessor.java
>>>
>>> if (leafNode == null)
>>>
>>>         return false;
>>>
>>> after the line
>>>
>>> ActiveLearningNode leafNode = (ActiveLearningNode) foundNode.getNode();
>>>
>>> Now the VerticalHoeffdingTree classifier also works fine on
>>> kddcup_full.arff. Do you think this is an acceptable solution to the
>>> problem?
>>>
>>>
>>> Besides, we were wondering how we could inspect the contents of the
>>> model, for example by visiting its nodes and reading what each node holds.
>>>
>>> Thanks for your help,
>>>
>>>
>>> Respectfully,
>>>
>>> Ercan Ozturk, Davut Deniz Yavuz, Gozde Boztepe, Sezin Gurkan
>>>
>>>
>>
>
