uima-dev mailing list archives

From Richard Eckart de Castilho <eck...@ukp.informatik.tu-darmstadt.de>
Subject Re: CAS serialization performance: XMI vs. Java serialization
Date Fri, 17 Aug 2012 22:33:02 GMT
Thanks for the pointer, Marshall. Given, though, that the whole process ran for about
30 minutes and the setup was comparatively simple, the JIT effect should be hardly
noticeable. Would you agree?

In any case, the measurement is not meant to be exact, but rather to give a better idea of the
performance improvement of binary serialization over XMI. At least I am now pretty
convinced that I should switch from XMI to binary persistence in some scenarios.
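
For reference, the warm-up approach discussed in this thread (run the measured code repeatedly until the timings stop improving, then measure) could be sketched roughly as below. This is only an illustration under assumptions: the class name WarmupTimer, the 5% tolerance, the "three rounds without improvement" stopping rule, and the stand-in workload are all hypothetical and were not part of the experiment.

```java
public class WarmupTimer {

    /** Times a single run of the task in nanoseconds. */
    public static long timeNanos(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    /**
     * Runs the task repeatedly until three consecutive rounds fail to beat
     * the best time by more than the given tolerance, or until maxRounds
     * is reached. Returns the number of warm-up rounds executed.
     * (The stopping rule is an assumption chosen for illustration.)
     */
    public static int warmUp(Runnable task, int maxRounds, double tolerance) {
        long best = Long.MAX_VALUE;
        int roundsWithoutSpeedup = 0;
        int rounds = 0;
        while (rounds < maxRounds && roundsWithoutSpeedup < 3) {
            long elapsed = timeNanos(task);
            rounds++;
            if (elapsed < best * (1.0 - tolerance)) {
                best = elapsed;            // still getting noticeably faster
                roundsWithoutSpeedup = 0;
            } else {
                roundsWithoutSpeedup++;    // no significant speedup this round
            }
        }
        return rounds;
    }

    /** Average time per run over the given number of measured runs. */
    public static long averageNanos(Runnable task, int runs) {
        long total = 0;
        for (int i = 0; i < runs; i++) {
            total += timeNanos(task);
        }
        return total / runs;
    }

    public static void main(String[] args) {
        // Stand-in workload; in the experiment this would be the
        // serialization of one CAS.
        Runnable workload = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100_000; i++) {
                sb.append(i);
            }
            if (sb.length() == 0) {
                throw new IllegalStateException("unreachable");
            }
        };
        int rounds = warmUp(workload, 200, 0.05);
        long average = averageNanos(workload, 20);
        System.out.println("warm-up rounds: " + rounds);
        System.out.println("average ns per run after warm-up: " + average);
    }
}
```

For a run of about 30 minutes, the warm-up rounds would be a small fraction of the total, which is the point made above.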

-- Richard 

On 18.08.2012 at 00:02, Marshall Schor wrote:

> One other thing I've noticed is important - because of Java's JIT, you need to
> "warm up" things before doing measurements.  Most commonly, people run the
> thing-being-measured multiple times, in a loop, and see a speedup - until
> there's no more speedup.
> -Marshall
> On 8/17/2012 5:40 PM, Richard Eckart de Castilho wrote:
>> Small update in case anybody is interested. I ran the experiment again, this time
>> writing to a ByteArrayOutputStream (initialized with a 512 KB buffer). So it's measuring encoding
>> time now, no I/O, no GZip.
>> bin: 0:04:17.699 	11.266.341.029 byte
>> xmi: 0:24:40.485 	23.961.447.013 byte
>> That's more the expected difference. Still no results for reading though.
>>>> I am looking for a way to improve loading times in an application, so I did
>>>> a little experiment with binary CAS serialization to see if it was superior to XMI serialization.
>>>> For serialization I used the CASCompleteSerializer to serialize the type-system and heaps
>>>> into the same file using Java object serialization - at least that is what I understood it
>>>> should do. To read in these files, I would deserialize the CASCompleteSerializer and initialize
>>>> a CAS from it using CASImpl.reinit().
>>>> 96.400 files
>>>> plain text (uncompressed)      :                 581.865.593 Byte
>>>> binary (serialized java, gzip) : 0:47:02.835   3.555.449.597 Byte 
>>>> xmi (gzip)                     : 1:20:31.535   4.712.633.769 Byte
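
The measurement setup described in the quoted messages (Java object serialization into a ByteArrayOutputStream with a 512 KB initial buffer, optionally wrapped in GZip) could be sketched as follows. This is a minimal sketch, not the code used for the numbers above: the class name EncodingBenchmark and its methods are hypothetical, and a plain ArrayList of strings stands in for the CASCompleteSerializer object so that the example runs without UIMA on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class EncodingBenchmark {

    /** Serializes the payload in memory, so only encoding time is measured. */
    public static byte[] encode(Serializable payload, boolean gzip) throws IOException {
        // 512 KB initial buffer, as mentioned in the thread
        ByteArrayOutputStream buffer = new ByteArrayOutputStream(512 * 1024);
        OutputStream sink = gzip ? new GZIPOutputStream(buffer) : buffer;
        try (ObjectOutputStream out = new ObjectOutputStream(sink)) {
            out.writeObject(payload);
        } // closing the stream also finishes the GZip trailer
        return buffer.toByteArray();
    }

    /** Reads the payload back from bytes produced by encode(). */
    public static Object decode(byte[] data, boolean gzip)
            throws IOException, ClassNotFoundException {
        InputStream source = new ByteArrayInputStream(data);
        if (gzip) {
            source = new GZIPInputStream(source);
        }
        try (ObjectInputStream in = new ObjectInputStream(source)) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in payload (assumption); the experiment serialized a
        // CASCompleteSerializer per document instead.
        ArrayList<String> payload = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            payload.add("token " + (i % 100));
        }
        for (boolean gzip : new boolean[] {false, true}) {
            long start = System.nanoTime();
            byte[] bytes = encode(payload, gzip);
            long encodeNanos = System.nanoTime() - start;

            start = System.nanoTime();
            decode(bytes, gzip);
            long decodeNanos = System.nanoTime() - start;

            System.out.println((gzip ? "gzip " : "plain") + ": " + bytes.length
                    + " bytes, encode " + encodeNanos / 1_000_000
                    + " ms, decode " + decodeNanos / 1_000_000 + " ms");
        }
    }
}
```

The decode method is included because read times are the open question in the thread; the sketch makes no claim about what those times would be for real CAS data.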

Richard Eckart de Castilho
Technical Lead
Ubiquitous Knowledge Processing Lab (UKP-TUD) 
FB 20 Computer Science Department      
Technische Universität Darmstadt
Hochschulstr. 10, D-64289 Darmstadt, Germany 
phone [+49] (0)6151 16-7477, fax -5455, room S2/02/B117
Web Research at TU Darmstadt (WeRC) www.werc.tu-darmstadt.de
