uima-dev mailing list archives

From Marshall Schor <...@schor.com>
Subject Re: How to use the new binary CAS (de)serialization?
Date Mon, 08 Jul 2013 21:49:37 GMT

On 7/8/2013 5:00 PM, Richard Eckart de Castilho wrote:
> Thanks for fixing the issue :)
Thank you for finding the bug :-)
>
> Now I'm trying another basic operation: serializing a CAS with a type system and deserializing
> into a CAS with zero types. I bump into two problems:
>
>
> First one:
>
> Following what to me would appear as the path of least surprise, I assumed that
>
> 1) Serialization.deserializeCAS(cas, bais, null, null);
>
> should behave the same as
>
> 2) Serialization.deserializeCAS(cas, bais);
>
> It apparently doesn't. 1) declares a ResourceInitializationException and only reads format
> 6 CASes, while 2) appears to accept form 0 (is that the correct name?), 4, and 6, and does
> not throw a ResourceInitializationException.
I like the principle of least surprise :-)...

In this edge case, where deserializeCAS is called with 4 args but the last 2
are null, I agree that it should behave just like the 2-arg form.  I'll add
that.
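
A minimal sketch of that delegation, to make the intended behavior concrete.
The class and method names here are hypothetical stand-ins, not the real
Serialization code, and the "generic" / "form6" labels only record which path
was taken:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for illustration only; not the UIMA Serialization class. */
public class NullArgDelegationSketch {

    /** Records which code path handled each call so the behavior can be inspected. */
    public static final List<String> calls = new ArrayList<>();

    /** Two-argument form: in this sketch it stands for the path that accepts any format. */
    public static void deserialize(Object cas, byte[] data) {
        calls.add("generic");
    }

    /** Four-argument form: falls back to the two-argument form when both extras are null. */
    public static void deserialize(Object cas, byte[] data, Object tgtTypeSystem, Object reuseInfo) {
        if (tgtTypeSystem == null && reuseInfo == null) {
            deserialize(cas, data);
            return;
        }
        calls.add("form6");
    }

    public static void main(String[] args) {
        deserialize(new Object(), new byte[0], null, null);          // delegates
        deserialize(new Object(), new byte[0], new Object(), null);  // form 6 path
        System.out.println(calls); // prints [generic, form6]
    }
}
```

With this shape, a caller passing two nulls gets exactly what the 2-arg call
would have done.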

>
>
> Second one:
>
> The documentation says:
>
>> Deserialize with type filtering:
>>
>> The reuseInfo should be null unless deserializing a delta CAS, in which case, it
>> must be the reuse info captured when the original CAS was serialized out. If the target type
>> system is identical to the one in the CAS, you may pass null for it. If a delta cas is not
>> being received, you must pass null for the reuseInfo.
>>
>> Serialization.deserializeCAS(cas, bais, tgtTypeSystem, reuseInfo);
> So I assume that when I deserialize my persisted CAS into a fresh one which doesn't contain
> any types, the only thing that should arrive is the SofA. But, no matter what serialization
> format I use (0, 4, or 6), I always get an ArrayIndexOutOfBoundsException.
>
> I create the target CAS like this:
>
>         CAS cas = CasCreationUtils.createCas((TypeSystemDescription) null, null, null);
>
>
> Format 0: 
>
> java.lang.ArrayIndexOutOfBoundsException: 37
> 	at org.apache.uima.cas.impl.FSIndexRepositoryImpl.incrementIllegalIndexUpdateDetector(FSIndexRepositoryImpl.java:1543)
> 	at org.apache.uima.cas.impl.FSIndexRepositoryImpl.ll_addFS(FSIndexRepositoryImpl.java:1625)
> 	at org.apache.uima.cas.impl.FSIndexRepositoryImpl.addFS(FSIndexRepositoryImpl.java:1059)
> 	at org.apache.uima.cas.impl.CASImpl.reinitIndexedFSs(CASImpl.java:1480)
> 	at org.apache.uima.cas.impl.CASImpl.reinit(CASImpl.java:1282)
> 	at org.apache.uima.cas.impl.CASImpl.reinit(CASImpl.java:1168)
> 	at org.apache.uima.cas.impl.Serialization.deserializeCAS(Serialization.java:171)
>         …
>
> Format 4:
>
> java.lang.ArrayIndexOutOfBoundsException: 37
> 	at org.apache.uima.cas.impl.BinaryCasSerDes4.getTypeInfo(BinaryCasSerDes4.java:2497)
> 	at org.apache.uima.cas.impl.BinaryCasSerDes4.access$1(BinaryCasSerDes4.java:2496)
> 	at org.apache.uima.cas.impl.BinaryCasSerDes4$Deserializer.deserialize(BinaryCasSerDes4.java:1621)
> 	at org.apache.uima.cas.impl.BinaryCasSerDes4$Deserializer.access$18(BinaryCasSerDes4.java:1567)
> 	at org.apache.uima.cas.impl.BinaryCasSerDes4.deserialize(BinaryCasSerDes4.java:360)
> 	at org.apache.uima.cas.impl.CASImpl.reinit(CASImpl.java:1197)
> 	at org.apache.uima.cas.impl.CASImpl.reinit(CASImpl.java:1168)
> 	at org.apache.uima.cas.impl.Serialization.deserializeCAS(Serialization.java:171)
>         …
>
> Format 6:
>
> java.lang.ArrayIndexOutOfBoundsException: 37
> 	at org.apache.uima.cas.impl.TypeSystemImpl.getTypeInfo(TypeSystemImpl.java:1566)
> 	at org.apache.uima.cas.impl.BinaryCasSerDes6.deserializeAfterVersion(BinaryCasSerDes6.java:1701)
> 	at org.apache.uima.cas.impl.CASImpl.reinit(CASImpl.java:1203)
> 	at org.apache.uima.cas.impl.CASImpl.reinit(CASImpl.java:1168)
> 	at org.apache.uima.cas.impl.Serialization.deserializeCAS(Serialization.java:171)
>         …
>
> Am I misunderstanding how the (de)serialization is supposed to work?
Forms 0 and 4 support binary serialization / deserialization only when the
source and target type systems are identical.  If they are not, you'll get
errors like the ones you saw.

Form 6 supports having different type systems.  When using this, it expects the
"other" type system to be passed in, as a type system impl object.  If "null" is
passed in, then it assumes the "other" type system is identical to the one in
the CAS.  This is what the Javadocs mean when they say:

    If the target type system is identical to the one in the CAS, you may pass null for it.


So, to make form 6 work for you, you have to do something like:

  a) Create an instance of a type system impl for the types in your serialized form.
For instance, if you created a CAS with some types in it, and serialized it,
then before you get rid of that CAS, save its type system in a variable:

    TypeSystem tsThatWasSerialized = theCASthatWasSerialized.getTypeSystem();

  b) Use this type system as the argument (not "null") when calling the form 6
style deserialize:

    Serialization.deserializeCAS(cas, bais, tsThatWasSerialized, null);
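
To illustrate why the type system that was serialized has to be supplied, here
is a toy model in plain Java.  It does not reproduce the actual binary formats:
it assumes the stream carries integer type codes that only make sense relative
to the type table they were written with, and the class, method, and type names
(including "my.Token") are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model for illustration only; it does not reproduce UIMA's binary formats. */
public class TypeCodeSketch {

    /** Decode type codes by indexing directly into one type table (no mapping). */
    public static List<String> decodeAssumingSameTypes(int[] codes, String[] typeTable) {
        List<String> out = new ArrayList<>();
        for (int code : codes) {
            out.add(typeTable[code]); // throws if the code is unknown to this table
        }
        return out;
    }

    /** Decode with the source table, then keep only types the target also defines. */
    public static List<String> decodeWithFiltering(int[] codes, String[] srcTable, String[] tgtTable) {
        List<String> tgtNames = Arrays.asList(tgtTable);
        List<String> out = new ArrayList<>();
        for (int code : codes) {
            String name = srcTable[code];
            if (tgtNames.contains(name)) {
                out.add(name); // type exists on both sides, so it survives
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] src = {"TOP", "Sofa", "Annotation", "my.Token"}; // hypothetical names
        String[] tgt = {"TOP", "Sofa", "Annotation"};             // target lacks my.Token
        int[] codes = {1, 3, 3};                                  // one Sofa, two my.Token

        System.out.println(decodeWithFiltering(codes, src, tgt)); // prints [Sofa]

        try {
            decodeAssumingSameTypes(codes, tgt); // no source table: code 3 is out of range
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("unmapped type code");
        }
    }
}
```

In this model, decoding against only the target table fails in the same way as
the traces above, while decoding against the source table and filtering leaves
just the Sofa.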

Is that something like what you did? 

-Marshall

