uima-dev mailing list archives

From "Marshall Schor (JIRA)" <...@uima.apache.org>
Subject [jira] [Commented] (UIMA-2493) add compression to binary CAS serialization
Date Tue, 06 Nov 2012 18:34:12 GMT

    [ https://issues.apache.org/jira/browse/UIMA-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491685#comment-13491685 ]

Marshall Schor commented on UIMA-2493:
--------------------------------------

The class SerDesTest has test cases for the new compressed serialization.

In one project, we measured CASes averaging approximately 50 MB in size (plain binary
serialization). The new compressed serialization reduced them to about 1.5% of their original
size (e.g., 50 MB -> 750 KB).  Of course, this depends on the data.  For these same CASes,
zipping the XMI serialization compressed it to about 15% of the original size.  An additional
advantage of the compressed binary representation is that it is about 10 times faster to
deserialize than the zipped XMI.  (Note: these measurements were done on my laptop, with one
particular group of CASes I had easy access to, so your mileage may vary.)

To use this on a CAS, call ((CASImpl)cas).serializeWithCompression(out), where out can be
an OutputStream, a DataOutputStream, or a File.  To deserialize the result, obtain a CAS
with an identical type system (a requirement of binary serialization) and then call
((CASImpl)cas).reinit(in), where in is an InputStream.
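
The write-to-a-stream, read-back-from-a-stream pattern described above can be illustrated
without UIMA on the classpath. The sketch below is only an analogy: it is not UIMA's
compressed binary CAS format or API, it uses the JDK's java.util.zip on a plain byte array,
and the class name CompressedRoundTrip and its two methods are hypothetical names chosen
for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical illustration only: NOT UIMA's compressed binary CAS format or API.
public class CompressedRoundTrip {

    // Compresses data with DEFLATE and writes it to out (out is closed when done).
    public static void writeCompressed(byte[] data, OutputStream out) throws IOException {
        try (DeflaterOutputStream deflaterOut = new DeflaterOutputStream(out)) {
            deflaterOut.write(data);
        }
    }

    // Reads DEFLATE-compressed bytes from in and returns the decompressed data.
    public static byte[] readCompressed(InputStream in) throws IOException {
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        try (InflaterInputStream inflaterIn = new InflaterInputStream(in)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = inflaterIn.read(buffer)) != -1) {
                result.write(buffer, 0, n);
            }
        }
        return result.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Highly repetitive input compresses well; real ratios depend on the data.
        byte[] original = new byte[1_000_000];
        for (int i = 0; i < original.length; i++) {
            original[i] = (byte) (i % 16);
        }

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeCompressed(original, out);
        byte[] compressed = out.toByteArray();

        byte[] restored = readCompressed(new ByteArrayInputStream(compressed));

        System.out.println("original bytes:   " + original.length);
        System.out.println("compressed bytes: " + compressed.length);
        System.out.println("round trip ok:    " + Arrays.equals(original, restored));
    }
}
```

As with the CAS measurements, the ratio this prints says nothing general: the input here is
deliberately repetitive, so it shrinks far more than typical data would.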

See the test case for more examples of other ways of invoking this, including "delta"
CAS serialization and deserialization.  (Delta means that at some point you create a "mark"
in the CAS and then continue modifying the CAS; serialization then writes out only the things
that were changed after that mark.)
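
The idea of a mark can be shown with a toy model. This is a simplified sketch, not UIMA
code: the class DeltaSketch and all of its methods are hypothetical, the "CAS" is just a
list of strings, and the delta covers only items added after the mark, whereas the
description above speaks of anything changed after it.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of mark-based delta serialization; hypothetical, not the UIMA API.
public class DeltaSketch {
    private final List<String> items = new ArrayList<>();

    public void add(String item) {
        items.add(item);
    }

    // The mark records how many items existed at the moment it was created.
    public int createMark() {
        return items.size();
    }

    // Full serialization: a copy of every item.
    public List<String> serializeAll() {
        return new ArrayList<>(items);
    }

    // Delta serialization: only the items added after the mark.
    public List<String> serializeDelta(int mark) {
        return new ArrayList<>(items.subList(mark, items.size()));
    }

    public static void main(String[] args) {
        DeltaSketch store = new DeltaSketch();
        store.add("a");
        store.add("b");
        int mark = store.createMark();   // everything up to here is "already sent"
        store.add("c");
        store.add("d");

        System.out.println(store.serializeAll());       // prints [a, b, c, d]
        System.out.println(store.serializeDelta(mark)); // prints [c, d]
    }
}
```

A receiver that already holds the pre-mark content needs only the delta to catch up, which
is why the delta output can be much smaller than a full serialization.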
                
> add compression to binary CAS serialization
> -------------------------------------------
>
>                 Key: UIMA-2493
>                 URL: https://issues.apache.org/jira/browse/UIMA-2493
>             Project: UIMA
>          Issue Type: Improvement
>          Components: Core Java Framework
>    Affects Versions: 2.4.0SDK
>            Reporter: Marshall Schor
>            Assignee: Marshall Schor
>            Priority: Minor
>             Fix For: 2.4.1SDK
>
>
> Add a mode to binary CAS serialization that compresses the serialized form, reducing its
> size and potentially making deserialization faster (because of the smaller size).  Support
> delta CAS modes.

