incubator-s4-user mailing list archives

From Dingyu Yang <yangdin...@gmail.com>
Subject Re: checkpoint problem
Date Tue, 26 Mar 2013 08:43:45 GMT
OK, Thanks.
I am very glad to contribute to S4.

Dingyu

2013/3/26 Matthieu Morel <mmorel@apache.org>

> Issues are reported through the jira bug tracking system here:
> https://issues.apache.org/jira/browse/S4
>
> You have to create an account (no special permissions needed, if I
> remember correctly), then file a ticket for the S4 project.
>
> That's a great way to start contributing to the project!
>
> Thanks,
>
> Matthieu
>
> On Mar 26, 2013, at 09:28 , Dingyu Yang wrote:
>
> Yes, when execution reaches futureSerializedState.get(1000,
> TimeUnit.MILLISECONDS), I get the error mentioned previously.
> My program sets the checkpointing frequency as follows:
>     wordSumPE.setCheckpointingConfig(
>         new CheckpointingConfig.Builder(CheckpointingMode.TIME)
>             .frequency(20).timeUnit(TimeUnit.SECONDS).build());
>
> The adapter sends very little: just a few test words (10 words).
> So I think the problem lies with futureSerializedState.
> I am not familiar with the jira system. Also, can I contribute to S4?
> Thank you!
> dingyu
>
>
> 2013/3/26 Matthieu Morel <mmorel@apache.org>
>
>> Thanks for the feedback.
>>
>> When you write "cannot pass", what do you mean? Is the exception that you
>> reported logged while the program continues, or is it something else?
>>
>> Besides, the standard tests that we run for the release pass and show
>> that checkpointing works. The problem might be related to the speed of
>> checkpointing and of sending events. Note that it might not be necessary to
>> checkpoint for every single event; checkpointing every n events (n
>> relatively small) and losing at worst n-1 events per PE in case of failure
>> might be ok.
>>
>> It would be good to know under which conditions exactly you encounter the
>> issue, i.e. frequency of checkpointing and frequency of events
>> sent/received. A bug report on our jira system would be the best place
>> to follow that conversation.
>>
>> Thanks and regards,
>>
>> Matthieu
>>
>>
>>
>>
>> On Mar 26, 2013, at 08:56 , Dingyu Yang wrote:
>>
>> Hi Matthieu,
>> I debugged the program and still have this problem.
>> While debugging, I found that the problem occurs at:
>> SaveStateTask.run -> futureSerializedState.get(1000,
>> TimeUnit.MILLISECONDS).
>> Execution cannot get past this call. I don't know what the problem is,
>> even though I have just one PE instance. Is the problem in my program or
>> in S4?
>> Are you able to checkpoint?
>>
>> Waiting for your answer!
>>
>>
>> 2013/3/26 Matthieu Morel <mmorel@apache.org>
>>
>>> This looks like a bug caused by a race condition in the serializer.
>>>
>>> Can you file a bug? Also, are you able to reproduce it systematically?
>>>
>>> Thanks,
>>>
>>> Matthieu
>>>
>>> On Mar 23, 2013, at 07:33 , Dingyu Yang wrote:
>>>
>>> > Hi all,
>>> > I ran a checkpoint example and ran into a problem.
>>> > The version is S4 0.6 RC3.
>>> > ./s4 deploy -a=example.wordcountApp -c=testCluster1 -appName=wordApp
>>> -p=s4.checkpointing.filesystem.storageRootPath=/home/tmp/s4checkpoint
>>> -emc=org.apache.s4.core.ft.FileSystemBackendCheckpointingModule
>>> >
>>> > Then I get this error:
>>> > 14:21:50.251 [Checkpointing-storage-0] WARN
>>>  org.apache.s4.core.ft.SaveStateTask - Cannot save checkpoint :
>>> [PROTO_ID];[KEY] --> [example.WordSumPE];[./s4]
>>> > java.util.concurrent.ExecutionException:
>>> com.esotericsoftware.kryo.KryoException:
>>> java.util.ConcurrentModificationException
>>> > Serialization trace:
>>> > classes (sun.misc.Launcher$AppClassLoader)
>>> > contextClassLoader (java.lang.Thread)
>>> > thread (java.util.concurrent.ThreadPoolExecutor$Worker)
>>> > workers (java.util.concurrent.ThreadPoolExecutor)
>>> > fetchingThreadPool (org.apache.s4.core.ft.SafeKeeper)
>>> > checkpointingFramework (example.wordcountApp)
>>> > app (org.apache.s4.core.Stream)
>>> > downStream (example.WordSumPE)
>>> >     at
>>> java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:232)
>>> ~[na:1.6.0_22]
>>> >     at java.util.concurrent.FutureTask.get(FutureTask.java:91)
>>> ~[na:1.6.0_22]
>>> >     at org.apache.s4.core.ft.SaveStateTask.run(SaveStateTask.java:66)
>>> ~[bin/:na]
>>> >     at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>> [na:1.6.0_22]
>>> >     at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>> [na:1.6.0_22]
>>> >     at java.lang.Thread.run(Thread.java:662) [na:1.6.0_22]
>>> > Caused by: com.esotericsoftware.kryo.KryoException:
>>> java.util.ConcurrentModificationException
>>> > Serialization trace:
>>> > classes (sun.misc.Launcher$AppClassLoader)
>>> > contextClassLoader (java.lang.Thread)
>>> > thread (java.util.concurrent.ThreadPoolExecutor$Worker)
>>> > workers (java.util.concurrent.ThreadPoolExecutor)
>>> > fetchingThreadPool (org.apache.s4.core.ft.SafeKeeper)
>>> > checkpointingFramework (example.wordcountApp)
>>> > app (org.apache.s4.core.Stream)
>>> > downStream (example.WordSumPE)
>>> >     at
>>> com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:585)
>>> ~[kryo-2.20.jar:na]
>>> >     at
>>> com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213)
>>> ~[kryo-2.20.jar:na]
>>> >     at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:504)
>>> ~[kryo-2.20.jar:na]
>>> >     at
>>> com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:564)
>>> ~[kryo-2.20.jar:na]
>>> >     at
>>> com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213)
>>> ~[kryo-2.20.jar:na]
>>> >     at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:504)
>>> ~[kryo-2.20.jar:na]
>>> >     at
>>> com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:564)
>>> ~[kryo-2.20.jar:na]
>>> >     at
>>> com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213)
>>> ~[kryo-2.20.jar:na]
>>> >     at com.esotericsoftware.kryo.Kryo.writeObjectOrNull(Kryo.java:552)
>>> ~[kryo-2.20.jar:na]
>>> >     at
>>> com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:68)
>>> ~[kryo-2.20.jar:na]
>>> >     at
>>> com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:18)
>>> ~[kryo-2.20.jar:na]
>>> >     at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:504)
>>> ~[kryo-2.20.jar:na]
>>> >     at
>>> com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:564)
>>> ~[kryo-2.20.jar:na]
>>> >     at
>>> com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213)
>>> ~[kryo-2.20.jar:na]
>>> >     at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:504)
>>> ~[kryo-2.20.jar:na]
>>> >     at
>>> com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:564)
>>> ~[kryo-2.20.jar:na]
>>> >     at
>>> com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213)
>>> ~[kryo-2.20.jar:na]
>>> >     at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:504)
>>> ~[kryo-2.20.jar:na]
>>> >     at
>>> com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:564)
>>> ~[kryo-2.20.jar:na]
>>> >     at
>>> com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213)
>>> ~[kryo-2.20.jar:na]
>>> >     at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:504)
>>> ~[kryo-2.20.jar:na]
>>> >     at
>>> com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:564)
>>> ~[kryo-2.20.jar:na]
>>> >     at
>>> com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213)
>>> ~[kryo-2.20.jar:na]
>>> >     at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:504)
>>> ~[kryo-2.20.jar:na]
>>> >     at
>>> com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:564)
>>> ~[kryo-2.20.jar:na]
>>> >     at
>>> com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213)
>>> ~[kryo-2.20.jar:na]
>>> >     at
>>> com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:571)
>>> ~[kryo-2.20.jar:na]
>>> >     at
>>> org.apache.s4.comm.serialize.KryoSerDeser.serialize(KryoSerDeser.java:91)
>>> ~[bin/:na]
>>> >     at
>>> org.apache.s4.core.ProcessingElement.serializeState(ProcessingElement.java:802)
>>> ~[bin/:na]
>>> >     at org.apache.s4.core.ft.SerializeTask.call(SerializeTask.java:42)
>>> ~[bin/:na]
>>> >     at org.apache.s4.core.ft.SerializeTask.call(SerializeTask.java:1)
>>> ~[bin/:na]
>>> >     at
>>> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>> ~[na:1.6.0_22]
>>> >     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>> ~[na:1.6.0_22]
>>> >     ... 3 common frames omitted
>>> > Caused by: java.util.ConcurrentModificationException: null
>>> >     at
>>> java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
>>> ~[na:1.6.0_22]
>>> >     at java.util.AbstractList$Itr.next(AbstractList.java:343)
>>> ~[na:1.6.0_22]
>>> >     at
>>> com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:74)
>>> ~[kryo-2.20.jar:na]
>>> >     at
>>> com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:18)
>>> ~[kryo-2.20.jar:na]
>>> >     at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:504)
>>> ~[kryo-2.20.jar:na]
>>> >     at
>>> com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:564)
>>> ~[kryo-2.20.jar:na]
>>> >     ... 35 common frames omitted
>>> >
>>> >
>>>
>>>
>>
>>
>
>
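
The innermost cause in the quoted trace is a java.util.ConcurrentModificationException thrown from AbstractList$Itr.checkForComodification while Kryo's CollectionSerializer was walking a list. The sketch below reproduces only that failure mode with plain java.util collections, in a single thread for determinism. It is not S4 or Kryo code: the class name ComodificationSketch, its methods, and the list contents are hypothetical, and the in-loop add() merely stands in for whatever changed the real collection during serialization.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

// Hypothetical illustration; none of these names come from S4 or Kryo.
class ComodificationSketch {

    /** Walks a list element by element while the list is changed underneath the iterator. */
    static boolean failsWhenModifiedDuringIteration() {
        List<String> classes = new ArrayList<>();
        classes.add("A");
        classes.add("B");
        classes.add("C");
        try {
            for (String name : classes) {
                // Stand-in for a concurrent writer appending during the walk.
                classes.add(name + "-loaded");
            }
            return false;
        } catch (ConcurrentModificationException e) {
            // The iterator detects the structural change on its next step.
            return true;
        }
    }

    /** Walks a private copy instead, so later changes to the original do not matter. */
    static int sizeSeenWhenIteratingSnapshot() {
        List<String> classes = new ArrayList<>();
        classes.add("A");
        classes.add("B");
        classes.add("C");
        List<String> snapshot = new ArrayList<>(classes);
        int seen = 0;
        for (String name : snapshot) {
            classes.add(name + "-loaded"); // modifies the original, not the snapshot
            seen++;
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(failsWhenModifiedDuringIteration()); // prints true
        System.out.println(sizeSeenWhenIteratingSnapshot());    // prints 3
    }
}
```

In the report, the list being walked is reached through the serialization trace (downStream, app, checkpointingFramework, fetchingThreadPool, and so on), so the sketch says nothing about where a fix belongs; it only shows why an iterator-based walk fails once the underlying list changes.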
