uima-dev mailing list archives

From Marshall Schor <...@schor.com>
Subject Re: CAS serialization performance: XMI vs. Java serialization
Date Fri, 17 Aug 2012 22:02:32 GMT
One other thing I've noticed is important: because of Java's JIT compiler, you
need to "warm up" the code before taking measurements.  Most commonly, people
run the thing being measured multiple times in a loop and watch it speed up,
until there is no further speedup.
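
A minimal sketch of such a warm-up loop, using only the JDK. The payload here is a
hypothetical stand-in (a list of strings), not a CAS, and the class and method names
are made up for illustration. It encodes into a ByteArrayOutputStream, as in the
experiment quoted below, so only encoding time is measured.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.List;

public class WarmupBenchmark {

    // Hypothetical stand-in payload; a real test would serialize a CAS instead.
    public static List<String> makePayload(int n) {
        List<String> items = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            items.add("token-" + i);
        }
        return items;
    }

    // Serializes the payload into memory (no file I/O, no GZip)
    // and returns the number of bytes produced.
    public static int encode(List<String> payload) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream(512 * 1024);
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(payload);
        }
        return buffer.size();
    }

    // Runs encode() the given number of times and returns the elapsed
    // nanoseconds of each round, so the speedup over rounds is visible.
    public static long[] timeRounds(List<String> payload, int rounds) throws IOException {
        long[] nanos = new long[rounds];
        for (int r = 0; r < rounds; r++) {
            long start = System.nanoTime();
            encode(payload);
            nanos[r] = System.nanoTime() - start;
        }
        return nanos;
    }

    public static void main(String[] args) throws IOException {
        List<String> payload = makePayload(50_000);
        long[] nanos = timeRounds(payload, 20);
        for (int r = 0; r < nanos.length; r++) {
            System.out.printf("round %2d: %.3f ms%n", r + 1, nanos[r] / 1e6);
        }
    }
}
```

The early rounds are typically slower; once the per-round times stop dropping, the
later rounds are the ones worth reporting.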

-Marshall

On 8/17/2012 5:40 PM, Richard Eckart de Castilho wrote:
> Small update in case anybody is interested. I ran the experiment again, this time writing
> to a ByteArrayOutputStream (initialized with a 512kb buffer). So it's measuring encoding time
> now, no I/O, no GZip.
>
> bin: 0:04:17.699 	11.266.341.029 byte
> xmi: 0:24:40.485 	23.961.447.013 byte
>
> That's more the expected difference. Still no results for reading though.
>
> Cheers,
>
> -- Richard
>
>>> I am looking for a way to improve loading times in an application, so I did a
>>> little experiment with binary CAS serialization to see if it was superior to XMI serialization.
>>> For serialization I used the CASCompleteSerializer to serialize the type-system and heaps
>>> into the same file using Java object serialization - at least that is what I understood it
>>> should do. To read in these files, I would deserialize the CASCompleteSerializer and initialize
>>> a CAS from it using CASImpl.reinit().
>>>
>>> 96.400 files
>>>
>>> plain text (uncompressed)      :                 581.865.593 Byte
>>> binary (serialized java, gzip) : 0:47:02.835   3.555.449.597 Byte 
>>> xmi (gzip)                     : 1:20:31.535   4.712.633.769 Byte
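
The figures quoted above can be compared with a little arithmetic. The sketch below
(class and method names are hypothetical) parses the elapsed times and the
dot-separated byte counts exactly as they are written in the thread and prints the
XMI-to-binary ratios. For the in-memory run, XMI comes out roughly 5.7 times slower
and a bit more than twice as large.

```java
import java.time.Duration;

public class SerializationComparison {

    // Parses an elapsed time written as "h:mm:ss.SSS" (three millisecond digits).
    public static Duration parseElapsed(String text) {
        String[] hms = text.split(":");
        String[] secondsAndMillis = hms[2].split("\\.");
        return Duration.ofHours(Long.parseLong(hms[0]))
                .plusMinutes(Long.parseLong(hms[1]))
                .plusSeconds(Long.parseLong(secondsAndMillis[0]))
                .plusMillis(Long.parseLong(secondsAndMillis[1]));
    }

    // Parses a byte count that uses "." as the thousands separator.
    public static long parseBytes(String text) {
        return Long.parseLong(text.replace(".", ""));
    }

    // Ratio of two elapsed times, slower / faster.
    public static double timeRatio(String slower, String faster) {
        return (double) parseElapsed(slower).toMillis() / parseElapsed(faster).toMillis();
    }

    // Ratio of two byte counts, larger / smaller.
    public static double sizeRatio(String larger, String smaller) {
        return (double) parseBytes(larger) / parseBytes(smaller);
    }

    public static void main(String[] args) {
        // In-memory run (ByteArrayOutputStream, no I/O, no GZip).
        System.out.printf("in-memory: time x%.2f, size x%.2f%n",
                timeRatio("0:24:40.485", "0:04:17.699"),
                sizeRatio("23.961.447.013", "11.266.341.029"));
        // Original run (files, GZip).
        System.out.printf("gzip/file: time x%.2f, size x%.2f%n",
                timeRatio("1:20:31.535", "0:47:02.835"),
                sizeRatio("4.712.633.769", "3.555.449.597"));
    }
}
```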

