asterixdb-dev mailing list archives

From Jianfeng Jia <jianfeng....@gmail.com>
Subject Re: Possible Race condition in the new UTF8String implementation
Date Wed, 11 Nov 2015 20:27:12 GMT
Then these are two different issues.
I just want to make sure that you’ve rebuilt Hyracks (not only AsterixDB) before testing
your code, because those changes are in Hyracks.
Also, could you send the logical plan and the Hyracks job so that we can see which Hyracks
operators are involved?
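The failure mode hypothesized in the quoted messages below (a SerDe that reuses one byte[] buffer and is reached by several operators through the same shared RecordDescriptor) can be replayed deterministically. What follows is a minimal Java sketch under stated assumptions: `ReusingStringSerDe`, its two-step `readInto`/`materialize` split, and the length-prefixed encoding are all hypothetical and are not the actual Hyracks UTF8StringSerializerDeserializer code. The two-step split stands in for a thread being preempted in the middle of deserialization; the thread itself does not settle whether this is the real cause of the duplicate-key errors.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;

public class SharedSerDeRaceSketch {

    /** Hypothetical stateful SerDe that reuses one byte[] across calls (not the real Hyracks class). */
    static final class ReusingStringSerDe {
        private byte[] buffer = new byte[16]; // scratch buffer shared by every caller
        private int length;

        /** Step 1: copy the next string's UTF-8 bytes into the shared buffer. */
        void readInto(DataInput in) throws IOException {
            length = in.readUnsignedShort();
            if (buffer.length < length) {
                buffer = new byte[length];
            }
            in.readFully(buffer, 0, length);
        }

        /** Step 2: build a String from whatever the shared buffer holds right now. */
        String materialize() {
            return new String(buffer, 0, length, StandardCharsets.UTF_8);
        }
    }

    /** Encodes a key as a 2-byte length followed by its UTF-8 bytes (hypothetical format). */
    static DataInput record(String key) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        byte[] utf8 = key.getBytes(StandardCharsets.UTF_8);
        out.writeShort(utf8.length);
        out.write(utf8);
        return new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
    }

    /**
     * Replays one unlucky interleaving on a single thread: operator A is "preempted" between
     * its two steps while operator B uses the same SerDe instance. Returns the keys that
     * operators A and B end up inserting, in that order.
     */
    static String[] interleavedKeys(String keyA, String keyB) {
        try {
            // Stands in for a SerDe fetched from a shared recordDesc.getFields()[i].
            ReusingStringSerDe shared = new ReusingStringSerDe();
            shared.readInto(record(keyA));         // operator A, step 1
            shared.readInto(record(keyB));         // operator B, step 1: overwrites A's bytes
            String seenByB = shared.materialize(); // operator B, step 2
            String seenByA = shared.materialize(); // operator A, step 2: reads B's bytes
            return new String[] {seenByA, seenByB};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Set<String> primaryIndex = new HashSet<>();
        for (String key : interleavedKeys("user-1", "user-2")) {
            if (!primaryIndex.add(key)) {
                System.out.println("duplicate key: " + key);
            }
        }
    }
}
```

Running `main` prints `duplicate key: user-2`: operator A inserts operator B's key even though the two input records are distinct, which matches the symptom of duplicate-key exceptions on frames that contain no duplicates.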

> On Nov 11, 2015, at 12:10 PM, abdullah alamoudi <bamousaa@gmail.com> wrote:
> 
> That was my first thought, as I said, but I am 100% sure the issue is not in
> the SerDe. To confirm this, I removed the reader and writer from the SerDe
> and created a new reader/writer instance in every call to serialize or
> deserialize, just to see whether the problem went away.
> 
> The problem didn't go away and I still had the same issue. That is why I
> know for sure it is not the SerDe.
> 
> Don't waste any more time in that direction.
> ~Abdullah.
> 
> Amoudi, Abdullah.
> 
> On Wed, Nov 11, 2015 at 10:54 PM, Jianfeng Jia <jianfeng.jia@gmail.com>
> wrote:
> 
>> Here are my findings and thoughts.
>> I think I’ve checked all the direct uses of the UTF8 SerDe. However, I
>> missed some indirect static/shared uses of it.
>> 
>> One big suspect is the RecordDescriptor, which holds the
>> ISerializerDeserializers and is always passed into the factory
>> method and shared by the thread object (usually a NodePushable).
>> E.g., in the ResultWriterOperatorDescriptor, the outRecordDesc is passed
>> to the createPushRuntime() factory method to create the “resultSerializer”,
>> and it is shared by the thread object
>> AbstractUnaryInputSinkOperatorNodePushable. This pushable object
>> gets the deserializer directly from the shared
>> recordDescriptor.getFields()[i]. That explains ASTERIXDB-1164.
>> 
>> I suspect that in your case some deserializers also come from a shared
>> RecordDescriptor. That would lead to a race condition whenever a
>> UTF8StringSerializerDeserializer is involved.
>> 
>> Given that the SerDes are stored in the shared RecordDescriptor, I think
>> the original design was for all SerDes to be thread-safe. Other data
>> structures may also store SerDes and pass/use them in the same way. So
>> I’d propose rolling the UTF8 SerDe back to the stateless version (at the
>> expense of allocating an intermediate buffer array per record).
>> 
>> Any opinions?
>> 
>> 
>>> On Nov 11, 2015, at 10:54 AM, abdullah alamoudi <bamousaa@gmail.com> wrote:
>>> 
>>> That was my first thought, so I changed it. The issue is still there.
>>> I am also using the UTF8StringSerializerDeserializer to deserialize the
>>> strings, and they always come out correctly.
>>> 
>>> I am thinking it may be related to the UTF8StringPointable, but I am not
>>> sure how that could be.
>>> I am looking at this as well,
>>> Abdullah.
>>> 
>>> Amoudi, Abdullah.
>>> 
>>> On Wed, Nov 11, 2015 at 8:05 PM, Jianfeng Jia <jianfeng.jia@gmail.com>
>>> wrote:
>>> 
>>>> The possible race condition could be that the
>>>> UTF8StringSerializerDeserializer is no longer a singleton. It is now
>>>> implemented to reuse the byte[] used to serialize/deserialize the
>>>> string object. Let me look into this issue.
>>>> 
>>>>> On Nov 11, 2015, at 8:37 AM, abdullah alamoudi <bamousaa@gmail.com> wrote:
>>>>> 
>>>>> Highly probable.
>>>>> Please, let's fix this soon.
>>>>> 
>>>>> Amoudi, Abdullah.
>>>>> 
>>>>> On Wed, Nov 11, 2015 at 7:32 PM, Till Westmann <tillw@apache.org> wrote:
>>>>> 
>>>>>> https://issues.apache.org/jira/browse/ASTERIXDB-1164
>>>>>> might be related.
>>>>>> 
>>>>>> Cheers,
>>>>>> Till
>>>>>> 
>>>>>> On 11 Nov 2015, at 8:25, abdullah alamoudi wrote:
>>>>>> 
>>>>>>> Hi all,
>>>>>>> I am having a hard time figuring this out. Here are the symptoms I am
>>>>>>> seeing, in case anyone has an idea what this could be.
>>>>>>> 
>>>>>>> I have a feed running that ingests data into a dataset. Sporadically, I get
>>>>>>> duplicate key exceptions (the key is of string type), and I am 100%
>>>>>>> sure that I don't have duplicate records.
>>>>>>> 
>>>>>>> Moreover, I am printing the content of the frames about to be inserted into
>>>>>>> the primary index, and there are no duplicate records.
>>>>>>> 
>>>>>>> There are three reasons why I suspect the string implementation:
>>>>>>> 1. It is a fairly recent change.
>>>>>>> 2. When I run on a single node, or run one thread at a time, I never get
>>>>>>> this exception.
>>>>>>> 3. The key is a string.
>>>>>>> 
>>>>>>> I have looked at the change trying to figure out where a race condition
>>>>>>> might take place, but it is well hidden (if it exists at all).
>>>>>>> 
>>>>>>> Let me know if you have seen something similar.
>>>>>>> 
>>>>>>> Cheers,
>>>>>>> Abdullah.
>>>>>> 
>>>> 
>>>> 
>>>> 
>>>> Best,
>>>> 
>>>> Jianfeng Jia
>>>> PhD Candidate of Computer Science
>>>> University of California, Irvine
>>>> 
>>>> 
>> 
>> 
>> 
>> Best,
>> 
>> Jianfeng Jia
>> PhD Candidate of Computer Science
>> University of California, Irvine
>> 
>> 



Best,

Jianfeng Jia
PhD Candidate of Computer Science
University of California, Irvine
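Jianfeng's proposed fix in the thread above (rolling back to a stateless SerDe that allocates its intermediate array per record) and Abdullah's experiment (a new reader/writer on every call) both remove shared mutable state from the SerDe. The minimal sketch below, again using a hypothetical `StatelessStringSerDe` and length-prefixed encoding rather than the real Hyracks class, checks that one such instance can be shared by several threads without corrupting any keys.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class StatelessSerDeSketch {

    /** Hypothetical stateless SerDe: no fields, so one instance can be shared across threads. */
    static final class StatelessStringSerDe {
        void serialize(String value, DataOutput out) throws IOException {
            byte[] utf8 = value.getBytes(StandardCharsets.UTF_8); // fresh array per call
            out.writeShort(utf8.length);
            out.write(utf8);
        }

        String deserialize(DataInput in) throws IOException {
            byte[] utf8 = new byte[in.readUnsignedShort()]; // fresh array per record
            in.readFully(utf8);
            return new String(utf8, StandardCharsets.UTF_8);
        }
    }

    static String roundTrip(StatelessStringSerDe serde, String value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            serde.serialize(value, new DataOutputStream(bytes));
            return serde.deserialize(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Round-trips distinct keys through ONE shared instance from several threads; returns the mismatch count. */
    static int countCorruptedRoundTrips(int threads, int recordsPerThread) {
        StatelessStringSerDe shared = new StatelessStringSerDe(); // like a SerDe held in a shared RecordDescriptor
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int threadId = t;
                results.add(pool.submit(() -> {
                    int bad = 0;
                    for (int i = 0; i < recordsPerThread; i++) {
                        String key = "key-" + threadId + "-" + i;
                        if (!key.equals(roundTrip(shared, key))) {
                            bad++;
                        }
                    }
                    return bad;
                }));
            }
            int total = 0;
            for (Future<Integer> f : results) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("corrupted round trips: " + countCorruptedRoundTrips(8, 10_000));
    }
}
```

`main` prints `corrupted round trips: 0`. The cost is one allocation per record; a per-thread instance (for example via `ThreadLocal.withInitial(...)`) would keep buffer reuse without sharing, at the price of one buffer per thread. Note that Abdullah reports the duplicate keys persisted after his per-call experiment, so this change alone may not resolve his issue.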

