incubator-s4-user mailing list archives

From Matthieu Morel <mmo...@apache.org>
Subject Re: checkpoint problem
Date Tue, 26 Mar 2013 08:16:48 GMT
Thanks for the feedback.

When you write "cannot pass", what do you mean? Is the exception that you reported logged
while the program continues, or does something else happen?

Besides, the standard tests that we run for the release pass and show that checkpointing works.
The problem might be related to the relative speed of checkpointing and of sending events. Note
that it might not be necessary to checkpoint for every single event: checkpointing every
n events (n relatively small) and losing at worst n-1 events per PE in case of failure might
be ok.
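
To make the trade-off concrete, here is a minimal sketch of checkpointing every n events. It does not use the S4 API: CountingPE, onEvent and eventsLostOnCrash are hypothetical names chosen for illustration, and the "checkpoint" is just a copy of the counter.

```java
// Hypothetical stand-in for a PE that counts events; not an S4 class.
class CountingPE {
    private final int checkpointEvery; // n: events between two checkpoints
    private long count;                // live state, lost if the node fails
    private long checkpointedCount;    // state as of the last checkpoint
    private int sinceCheckpoint;       // events seen since the last checkpoint

    CountingPE(int checkpointEvery) {
        if (checkpointEvery < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        this.checkpointEvery = checkpointEvery;
    }

    // Process one event and checkpoint only on every n-th event.
    void onEvent() {
        count++;
        sinceCheckpoint++;
        if (sinceCheckpoint == checkpointEvery) {
            checkpointedCount = count; // stand-in for saving the state
            sinceCheckpoint = 0;
        }
    }

    // Number of events that would have to be replayed or would be lost
    // if the PE failed right now and was restored from the last checkpoint.
    long eventsLostOnCrash() {
        return count - checkpointedCount;
    }

    public static void main(String[] args) {
        CountingPE pe = new CountingPE(5);
        long worst = 0;
        for (int i = 0; i < 100; i++) {
            pe.onEvent();
            worst = Math.max(worst, pe.eventsLostOnCrash());
        }
        System.out.println("worst-case loss: " + worst);
    }
}
```

With n = 5 the state is saved on one event out of five, and the loss observed right after any event stays at or below n-1 = 4.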

It would be good to know under which conditions exactly you encounter the issue, i.e. the frequency
of checkpointing and the frequency of events sent/received. Filing a bug in our JIRA system
would be the best way to follow up on that conversation.

Thanks and regards,

Matthieu 




On Mar 26, 2013, at 08:56 , Dingyu Yang wrote:

> Hi Matthieu,
> I debugged the program and still have this problem.
> While debugging, I found that the problem occurs at: SaveStateTask.run ----- futureSerializedState.get(1000, TimeUnit.MILLISECONDS).
> It cannot pass here. I don't know what the problem is, even though I have just one PE instance. Is the problem in my program or in S4?
> Are you able to checkpoint?
> 
> Waiting for your answer!
> 
> 
> 2013/3/26 Matthieu Morel <mmorel@apache.org>
> This looks like a bug, from a race condition in the serializer.
> 
> Can you file a bug? Also, are you able to reproduce it systematically?
> 
> Thanks,
> 
> Matthieu
> 
> On Mar 23, 2013, at 07:33 , Dingyu Yang wrote:
> 
> > Hi all,
> > I ran a checkpointing example and ran into some problems.
> > The version is S4 0.6 RC3.
> > ./s4 deploy -a=example.wordcountApp -c=testCluster1 -appName=wordApp -p=s4.checkpointing.filesystem.storageRootPath=/home/tmp/s4checkpoint -emc=org.apache.s4.core.ft.FileSystemBackendCheckpointingModule
> >
> > Then I get this error:
> > 14:21:50.251 [Checkpointing-storage-0] WARN  org.apache.s4.core.ft.SaveStateTask - Cannot save checkpoint : [PROTO_ID];[KEY] --> [example.WordSumPE];[./s4]
> > java.util.concurrent.ExecutionException: com.esotericsoftware.kryo.KryoException: java.util.ConcurrentModificationException
> > Serialization trace:
> > classes (sun.misc.Launcher$AppClassLoader)
> > contextClassLoader (java.lang.Thread)
> > thread (java.util.concurrent.ThreadPoolExecutor$Worker)
> > workers (java.util.concurrent.ThreadPoolExecutor)
> > fetchingThreadPool (org.apache.s4.core.ft.SafeKeeper)
> > checkpointingFramework (example.wordcountApp)
> > app (org.apache.s4.core.Stream)
> > downStream (example.WordSumPE)
> >     at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:232) ~[na:1.6.0_22]
> >     at java.util.concurrent.FutureTask.get(FutureTask.java:91) ~[na:1.6.0_22]
> >     at org.apache.s4.core.ft.SaveStateTask.run(SaveStateTask.java:66) ~[bin/:na]
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) [na:1.6.0_22]
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) [na:1.6.0_22]
> >     at java.lang.Thread.run(Thread.java:662) [na:1.6.0_22]
> > Caused by: com.esotericsoftware.kryo.KryoException: java.util.ConcurrentModificationException
> > Serialization trace:
> > classes (sun.misc.Launcher$AppClassLoader)
> > contextClassLoader (java.lang.Thread)
> > thread (java.util.concurrent.ThreadPoolExecutor$Worker)
> > workers (java.util.concurrent.ThreadPoolExecutor)
> > fetchingThreadPool (org.apache.s4.core.ft.SafeKeeper)
> > checkpointingFramework (example.wordcountApp)
> > app (org.apache.s4.core.Stream)
> > downStream (example.WordSumPE)
> >     at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:585) ~[kryo-2.20.jar:na]
> >     at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213) ~[kryo-2.20.jar:na]
> >     at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:504) ~[kryo-2.20.jar:na]
> >     at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:564) ~[kryo-2.20.jar:na]
> >     at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213) ~[kryo-2.20.jar:na]
> >     at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:504) ~[kryo-2.20.jar:na]
> >     at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:564) ~[kryo-2.20.jar:na]
> >     at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213) ~[kryo-2.20.jar:na]
> >     at com.esotericsoftware.kryo.Kryo.writeObjectOrNull(Kryo.java:552) ~[kryo-2.20.jar:na]
> >     at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:68) ~[kryo-2.20.jar:na]
> >     at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:18) ~[kryo-2.20.jar:na]
> >     at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:504) ~[kryo-2.20.jar:na]
> >     at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:564) ~[kryo-2.20.jar:na]
> >     at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213) ~[kryo-2.20.jar:na]
> >     at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:504) ~[kryo-2.20.jar:na]
> >     at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:564) ~[kryo-2.20.jar:na]
> >     at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213) ~[kryo-2.20.jar:na]
> >     at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:504) ~[kryo-2.20.jar:na]
> >     at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:564) ~[kryo-2.20.jar:na]
> >     at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213) ~[kryo-2.20.jar:na]
> >     at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:504) ~[kryo-2.20.jar:na]
> >     at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:564) ~[kryo-2.20.jar:na]
> >     at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213) ~[kryo-2.20.jar:na]
> >     at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:504) ~[kryo-2.20.jar:na]
> >     at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:564) ~[kryo-2.20.jar:na]
> >     at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213) ~[kryo-2.20.jar:na]
> >     at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:571) ~[kryo-2.20.jar:na]
> >     at org.apache.s4.comm.serialize.KryoSerDeser.serialize(KryoSerDeser.java:91) ~[bin/:na]
> >     at org.apache.s4.core.ProcessingElement.serializeState(ProcessingElement.java:802) ~[bin/:na]
> >     at org.apache.s4.core.ft.SerializeTask.call(SerializeTask.java:42) ~[bin/:na]
> >     at org.apache.s4.core.ft.SerializeTask.call(SerializeTask.java:1) ~[bin/:na]
> >     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) ~[na:1.6.0_22]
> >     at java.util.concurrent.FutureTask.run(FutureTask.java:138) ~[na:1.6.0_22]
> >     ... 3 common frames omitted
> > Caused by: java.util.ConcurrentModificationException: null
> >     at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372) ~[na:1.6.0_22]
> >     at java.util.AbstractList$Itr.next(AbstractList.java:343) ~[na:1.6.0_22]
> >     at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:74) ~[kryo-2.20.jar:na]
> >     at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:18) ~[kryo-2.20.jar:na]
> >     at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:504) ~[kryo-2.20.jar:na]
> >     at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:564) ~[kryo-2.20.jar:na]
> >     ... 35 common frames omitted
> >
> >
> 
> 
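
The shape of the trace above (an ExecutionException thrown by Future.get, wrapping the ConcurrentModificationException raised inside the serialization task) can be reproduced with plain java.util.concurrent classes. The sketch below is not the S4 code: SaveStateSketch and trySave are hypothetical names, and the collection is modified from within its own iteration loop only to trigger the exception deterministically, whereas in the report the modification presumably comes from another thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch; it only mirrors the structure visible in the stack trace.
class SaveStateSketch {
    static String trySave() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // Stand-in for the serialization task: it iterates over a list
            // that is structurally modified during the iteration.
            Callable<Integer> serialize = () -> {
                List<Integer> items = new ArrayList<>(Arrays.asList(1, 2, 3));
                int written = 0;
                for (Integer item : items) {
                    written += item;
                    items.add(item); // the next call to next() throws
                }
                return written;
            };
            Future<Integer> future = pool.submit(serialize);
            try {
                // Same call pattern as the one reported in SaveStateTask.run.
                future.get(1000, TimeUnit.MILLISECONDS);
                return "saved";
            } catch (ExecutionException e) {
                // The task's exception arrives here as the cause.
                return "failed: " + e.getCause().getClass().getSimpleName();
            } catch (TimeoutException e) {
                future.cancel(true);
                return "timed out";
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return "interrupted";
            }
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(trySave()); // prints "failed: ConcurrentModificationException"
    }
}
```

In this sketch the caller catches the failure, reports it and carries on, which is one possible reading of the WARN line in the log; whether the real SaveStateTask behaves the same way is exactly what the question about "cannot pass" is trying to establish.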

