mahout-user mailing list archives

From ivek gimmick <gimmicki...@gmail.com>
Subject Re: sgd.TrainNewsGroups error
Date Fri, 10 Dec 2010 16:50:54 GMT
Oops, sorry for not posting the stack trace.  And yeah, I know the results
will be nonsense; I just wanted to get the hang of what is happening with the
print statements :)

and here you go!

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.LinkedList.addBefore(LinkedList.java:778)
    at java.util.LinkedList.add(LinkedList.java:198)
    at com.google.gson.JsonArray.add(JsonArray.java:51)
    at org.apache.mahout.classifier.sgd.ModelSerializer$MatrixTypeAdapter.serialize(ModelSerializer.java:223)
    at org.apache.mahout.classifier.sgd.ModelSerializer$MatrixTypeAdapter.serialize(ModelSerializer.java:212)
    at com.google.gson.JsonSerializationVisitor.visitFieldUsingCustomHandler(JsonSerializationVisitor.java:148)
    at com.google.gson.ObjectNavigator.navigateClassFields(ObjectNavigator.java:141)
    at com.google.gson.ObjectNavigator.accept(ObjectNavigator.java:122)
    at com.google.gson.JsonSerializationContextDefault.serialize(JsonSerializationContextDefault.java:47)
    at com.google.gson.DefaultTypeAdapters$CollectionTypeAdapter.serialize(DefaultTypeAdapters.java:445)
    at com.google.gson.DefaultTypeAdapters$CollectionTypeAdapter.serialize(DefaultTypeAdapters.java:431)
    at com.google.gson.JsonSerializationVisitor.visitFieldUsingCustomHandler(JsonSerializationVisitor.java:148)
    at com.google.gson.ObjectNavigator.navigateClassFields(ObjectNavigator.java:141)
    at com.google.gson.ObjectNavigator.accept(ObjectNavigator.java:122)
    at com.google.gson.JsonSerializationVisitor.getJsonElementForChild(JsonSerializationVisitor.java:117)
    at com.google.gson.JsonSerializationVisitor.addAsChildOfObject(JsonSerializationVisitor.java:95)
    at com.google.gson.JsonSerializationVisitor.visitObjectField(JsonSerializationVisitor.java:90)
    at com.google.gson.ObjectNavigator.navigateClassFields(ObjectNavigator.java:147)
    at com.google.gson.ObjectNavigator.accept(ObjectNavigator.java:122)
    at com.google.gson.JsonSerializationContextDefault.serialize(JsonSerializationContextDefault.java:47)
    at com.google.gson.JsonSerializationContextDefault.serialize(JsonSerializationContextDefault.java:40)
    at org.apache.mahout.classifier.sgd.ModelSerializer$StateTypeAdapter.serialize(ModelSerializer.java:335)
    at org.apache.mahout.classifier.sgd.ModelSerializer$StateTypeAdapter.serialize(ModelSerializer.java:289)
    at com.google.gson.JsonSerializationVisitor.visitUsingCustomHandler(JsonSerializationVisitor.java:128)
    at com.google.gson.ObjectNavigator.accept(ObjectNavigator.java:96)
    at com.google.gson.JsonSerializationContextDefault.serialize(JsonSerializationContextDefault.java:47)
    at org.apache.mahout.classifier.sgd.ModelSerializer$EvolutionaryProcessTypeAdapter.serialize(ModelSerializer.java:377)
    at org.apache.mahout.classifier.sgd.ModelSerializer$EvolutionaryProcessTypeAdapter.serialize(ModelSerializer.java:341)
    at com.google.gson.JsonSerializationVisitor.visitUsingCustomHandler(JsonSerializationVisitor.java:128)
    at com.google.gson.ObjectNavigator.accept(ObjectNavigator.java:96)
    at com.google.gson.JsonSerializationContextDefault.serialize(JsonSerializationContextDefault.java:47)
    at org.apache.mahout.classifier.sgd.ModelSerializer$AdaptiveLogisticRegressionTypeAdapter.serialize(ModelSerializer.java:191)


On Fri, Dec 10, 2010 at 11:33 AM, Ted Dunning <ted.dunning@gmail.com> wrote:

> Running with only two files (aka two documents) is likely to lead to
> nonsense, but shouldn't lead to a crash.
>
> On Fri, Dec 10, 2010 at 8:18 AM, ivek gimmick <gimmickivek@gmail.com>
> wrote:
>
> > I am trying to understand the flow of TrainNewsGroups.java.  To do this, I
> > just used 2 files from TwentyNewsGroups as input files.
> >
> > The code runs and prints "exiting main", after which it takes a loooot of
> > time and errors out saying java heap space error.
> >
>
> The problem here is twofold:
>
> - first, without seeing these errors I am shooting in the dark.  If you were
> to include them, I could say more.
>
> - second, I used GSON to serialize the model.  Big mistake.  I have since
> implemented a bunch of changes to allow SGD models
> and all related classes to be considered writables.  I also extended
> ModelSerializer to handle that case.  I need to check to see
> if I have committed those changes.  That said, you shouldn't have seen
> errors or excessive heap space requirements writing the model, just reading
> it back in.
>
> It is also possible that since you haven't filled the high level buffer in
> the AdaptiveLogisticRegression, the lower level learners may be having some
> problems producing a model since they haven't seen any data yet.
>
> > Is there a bug somewhere?
>
>
> Well, I consider my use of GSON for a large data structure to be a mistake.
>  :-)
>
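
For anyone wondering why the trace ends in JsonArray.add: the matrix adapter seems to build a JSON tree with one element object per matrix entry before anything gets written out. A writable-style format can instead stream the raw values. Here is a minimal sketch of that idea using only java.io. DenseMatrixIO and its methods are hypothetical names made up for illustration, not Mahout's actual classes, and the code assumes a rectangular double[][]:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of Mahout: streams a dense matrix as raw binary.
class DenseMatrixIO {

  // Writes the dimensions, then every value, directly to the output.
  // No per-element wrapper objects are created. Assumes all rows have equal length.
  static void write(double[][] m, DataOutput out) throws IOException {
    int rows = m.length;
    int cols = rows == 0 ? 0 : m[0].length;
    out.writeInt(rows);
    out.writeInt(cols);
    for (double[] row : m) {
      for (double v : row) {
        out.writeDouble(v);
      }
    }
  }

  // Reads back a matrix in the layout produced by write().
  static double[][] read(DataInput in) throws IOException {
    int rows = in.readInt();
    int cols = in.readInt();
    double[][] m = new double[rows][cols];
    for (int i = 0; i < rows; i++) {
      for (int j = 0; j < cols; j++) {
        m[i][j] = in.readDouble();
      }
    }
    return m;
  }

  // Convenience wrapper: serialize to an in-memory byte array.
  static byte[] toBytes(double[][] m) {
    try {
      ByteArrayOutputStream buf = new ByteArrayOutputStream();
      write(m, new DataOutputStream(buf));
      return buf.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Convenience wrapper: deserialize from an in-memory byte array.
  static double[][] fromBytes(byte[] bytes) {
    try {
      return read(new DataInputStream(new ByteArrayInputStream(bytes)));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    double[][] m = {{1.0, 2.5}, {-3.0, 0.0}};
    byte[] bytes = toBytes(m);
    double[][] back = fromBytes(bytes);
    // Two 4-byte ints plus four 8-byte doubles.
    System.out.println(bytes.length + " bytes, back[0][1] = " + back[0][1]);
  }
}
```

The write/read pair takes DataOutput and DataInput, which is the same shape as the write(DataOutput) and readFields(DataInput) methods of Hadoop's Writable interface, so memory use while saving stays flat no matter how big the matrix is.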
