mahout-user mailing list archives

From Chris Schilling <chris.schill...@gmail.com>
Subject Re: sgd.TrainNewsGroups error
Date Wed, 22 Dec 2010 20:17:25 GMT
Ivek,

This is somewhat off-topic.  Have you tried running the ModelDissector to inspect the
highest-weighted features in your model trained on the 20 Newsgroups data?  I am getting
results that do not make sense out of the box, so it would be interesting to compare with
someone else working on the same problem.


On Dec 22, 2010, at 12:02 PM, ivek gimmick wrote:

> Ted,
> 
>   Is there a sample program to test the model that we generate using
> TrainNewsGroups.java?
> 
> 
> On Fri, Dec 10, 2010 at 11:50 AM, ivek gimmick <gimmickivek@gmail.com> wrote:
> 
>> Oops, sorry for not posting the stack trace.  And yes, I know the results
>> will be nonsense; I just wanted to get the hang of what is happening with the
>> print statements :)
>> 
>> and here you go!
>> 
>> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>> at java.util.LinkedList.addBefore(LinkedList.java:778)
>> at java.util.LinkedList.add(LinkedList.java:198)
>> at com.google.gson.JsonArray.add(JsonArray.java:51)
>> at org.apache.mahout.classifier.sgd.ModelSerializer$MatrixTypeAdapter.serialize(ModelSerializer.java:223)
>> at org.apache.mahout.classifier.sgd.ModelSerializer$MatrixTypeAdapter.serialize(ModelSerializer.java:212)
>> at com.google.gson.JsonSerializationVisitor.visitFieldUsingCustomHandler(JsonSerializationVisitor.java:148)
>> at com.google.gson.ObjectNavigator.navigateClassFields(ObjectNavigator.java:141)
>> at com.google.gson.ObjectNavigator.accept(ObjectNavigator.java:122)
>> at com.google.gson.JsonSerializationContextDefault.serialize(JsonSerializationContextDefault.java:47)
>> at com.google.gson.DefaultTypeAdapters$CollectionTypeAdapter.serialize(DefaultTypeAdapters.java:445)
>> at com.google.gson.DefaultTypeAdapters$CollectionTypeAdapter.serialize(DefaultTypeAdapters.java:431)
>> at com.google.gson.JsonSerializationVisitor.visitFieldUsingCustomHandler(JsonSerializationVisitor.java:148)
>> at com.google.gson.ObjectNavigator.navigateClassFields(ObjectNavigator.java:141)
>> at com.google.gson.ObjectNavigator.accept(ObjectNavigator.java:122)
>> at com.google.gson.JsonSerializationVisitor.getJsonElementForChild(JsonSerializationVisitor.java:117)
>> at com.google.gson.JsonSerializationVisitor.addAsChildOfObject(JsonSerializationVisitor.java:95)
>> at com.google.gson.JsonSerializationVisitor.visitObjectField(JsonSerializationVisitor.java:90)
>> at com.google.gson.ObjectNavigator.navigateClassFields(ObjectNavigator.java:147)
>> at com.google.gson.ObjectNavigator.accept(ObjectNavigator.java:122)
>> at com.google.gson.JsonSerializationContextDefault.serialize(JsonSerializationContextDefault.java:47)
>> at com.google.gson.JsonSerializationContextDefault.serialize(JsonSerializationContextDefault.java:40)
>> at org.apache.mahout.classifier.sgd.ModelSerializer$StateTypeAdapter.serialize(ModelSerializer.java:335)
>> at org.apache.mahout.classifier.sgd.ModelSerializer$StateTypeAdapter.serialize(ModelSerializer.java:289)
>> at com.google.gson.JsonSerializationVisitor.visitUsingCustomHandler(JsonSerializationVisitor.java:128)
>> at com.google.gson.ObjectNavigator.accept(ObjectNavigator.java:96)
>> at com.google.gson.JsonSerializationContextDefault.serialize(JsonSerializationContextDefault.java:47)
>> at org.apache.mahout.classifier.sgd.ModelSerializer$EvolutionaryProcessTypeAdapter.serialize(ModelSerializer.java:377)
>> at org.apache.mahout.classifier.sgd.ModelSerializer$EvolutionaryProcessTypeAdapter.serialize(ModelSerializer.java:341)
>> at com.google.gson.JsonSerializationVisitor.visitUsingCustomHandler(JsonSerializationVisitor.java:128)
>> at com.google.gson.ObjectNavigator.accept(ObjectNavigator.java:96)
>> at com.google.gson.JsonSerializationContextDefault.serialize(JsonSerializationContextDefault.java:47)
>> at org.apache.mahout.classifier.sgd.ModelSerializer$AdaptiveLogisticRegressionTypeAdapter.serialize(ModelSerializer.java:191)
>> 
>> 
>> On Fri, Dec 10, 2010 at 11:33 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
>> 
>>> Running with only two files (aka two documents) is likely to lead to
>>> nonsense, but shouldn't lead to a crash.
>>> 
>>> On Fri, Dec 10, 2010 at 8:18 AM, ivek gimmick <gimmickivek@gmail.com> wrote:
>>> 
>>>> I am trying to understand the flow of TrainNewsGroups.java.  To do this, I
>>>> just used 2 files from TwentyNewsGroups as input files.
>>>>
>>>> The code runs and prints "exiting main", after which it takes a long
>>>> time and then fails with a Java heap space error.
>>>> 
>>> 
>>> The problem here is twofold:
>>>
>>> - First, without seeing these errors I am shooting in the dark.  If you
>>> were to include them, I could say more.
>>>
>>> - Second, I used GSON to serialize the model.  Big mistake.  I have since
>>> implemented a bunch of changes to allow SGD models and all related classes
>>> to be treated as Writables.  I also extended ModelSerializer to handle that
>>> case.  I need to check whether I have committed those changes.  That said,
>>> you shouldn't have seen errors or excessive heap space requirements when
>>> writing the model, only when reading it back in.
>>>
>>> It is also possible that, since you haven't filled the high-level buffer in
>>> the AdaptiveLogisticRegression, the lower-level learners are having
>>> problems producing a model because they haven't seen any data yet.
>>> 
>>>> Is there a bug somewhere?
>>>>
>>> 
>>> Well, I consider my use of GSON for a large data structure to be a
>>> mistake. :-)
>>> 
>> 
>> 
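
The serialization point raised in the thread can be illustrated with a small sketch. The stack trace shows the failure inside JsonArray.add, which suggests the matrix is first turned into an in-memory JSON tree, one node per element, before anything is written. A streaming binary format avoids that intermediate tree by writing each value directly to the output. The class below is a hypothetical illustration using only java.io; the name MatrixStreamSketch and its methods are assumptions made for this example and are not Mahout's ModelSerializer or its Writable implementation. It follows the same general shape as Hadoop's Writable contract (write to a DataOutput, read back from a DataInput), but only as an analogy.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical illustration only; this is NOT Mahout's ModelSerializer.
public class MatrixStreamSketch {

    // Writes a rectangular dense matrix as: row count, column count,
    // then every cell as a raw 8-byte double. No per-cell objects are created.
    public static byte[] write(double[][] matrix) {
        int rows = matrix.length;
        int cols = rows == 0 ? 0 : matrix[0].length; // assumes all rows have equal length
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(rows);
            out.writeInt(cols);
            for (double[] row : matrix) {
                for (double cell : row) {
                    out.writeDouble(cell);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads back a matrix produced by write(), filling a primitive array directly.
    public static double[][] read(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int rows = in.readInt();
            int cols = in.readInt();
            double[][] matrix = new double[rows][cols];
            for (int r = 0; r < rows; r++) {
                for (int c = 0; c < cols; c++) {
                    matrix[r][c] = in.readDouble();
                }
            }
            return matrix;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        double[][] weights = {{0.5, -1.25}, {2.0, 0.0}};
        byte[] encoded = write(weights);
        // Two 4-byte ints plus four 8-byte doubles.
        System.out.println(encoded.length + " bytes"); // prints "40 bytes"
    }
}
```

In this sketch the extra memory used while writing is bounded by the output buffer rather than by a tree of boxed JSON elements, which is the general reason a binary, stream-oriented format tends to suit large weight matrices better than a JSON object tree.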

