uima-dev mailing list archives

From Richard Eckart de Castilho <eck...@ukp.informatik.tu-darmstadt.de>
Subject Re: CAS serialization performance: XMI vs. Java serialization
Date Fri, 17 Aug 2012 21:40:04 GMT
Small update in case anybody is interested. I ran the experiment again, this time writing to
a ByteArrayOutputStream (initialized with a 512 KB buffer), so it now measures only encoding
time: no I/O, no GZip.
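A timing harness for this kind of in-memory measurement might look like the sketch below. It assumes the data being encoded is any `Serializable` object and uses plain Java object serialization. The class and method names are hypothetical, and an `int[]` stands in for the CAS data because the message does not show the actual UIMA calls. The `ByteArrayOutputStream` is created with a 512 KB initial capacity as described, and it grows on its own if a document needs more.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class EncodingTimer {
    // Initial buffer size used in the experiment (512 KB); the stream grows if needed.
    static final int INITIAL_BUFFER = 512 * 1024;

    // Serializes obj with Java object serialization into an in-memory buffer
    // (no file I/O, no GZip) and returns the number of bytes produced.
    static int encodedSize(Serializable obj) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream(INITIAL_BUFFER);
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        }
        return buffer.size();
    }

    // Times the encoding of every document; returns {elapsedNanos, totalBytes}.
    static long[] timeEncoding(Iterable<? extends Serializable> docs) throws IOException {
        long totalBytes = 0;
        long start = System.nanoTime();
        for (Serializable doc : docs) {
            totalBytes += encodedSize(doc);
        }
        long elapsed = System.nanoTime() - start;
        return new long[] {elapsed, totalBytes};
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for CAS data: one int[] "heap" per document.
        List<int[]> docs = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            docs.add(new int[10_000]);
        }
        long[] result = timeEncoding(docs);
        System.out.printf("%d ms, %d bytes%n", result[0] / 1_000_000, result[1]);
    }
}
```

A fresh buffer per document keeps the byte count per document exact. Reusing one buffer and calling `reset()` between documents would avoid repeated allocation if that overhead matters.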

bin: 0:04:17.699   11,266,341,029 bytes
xmi: 0:24:40.485   23,961,447,013 bytes

That is closer to the expected difference. I still have no results for reading, though.
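To put the two runs side by side, the sketch below converts the reported `h:mm:ss.SSS` timings to seconds and divides the XMI figures by the binary ones. The class and method names are illustrative only.

```java
public class RatioCheck {
    // Parses a duration like "0:04:17.699" (hours:minutes:seconds.millis) into seconds.
    static double toSeconds(String hms) {
        String[] parts = hms.split(":");
        return Integer.parseInt(parts[0]) * 3600
                + Integer.parseInt(parts[1]) * 60
                + Double.parseDouble(parts[2]);
    }

    public static void main(String[] args) {
        // Figures from the in-memory (no I/O, no GZip) run.
        double binSeconds = toSeconds("0:04:17.699");
        double xmiSeconds = toSeconds("0:24:40.485");
        long binBytes = 11_266_341_029L;
        long xmiBytes = 23_961_447_013L;
        System.out.printf("XMI/binary time ratio: %.2f%n", xmiSeconds / binSeconds);
        System.out.printf("XMI/binary size ratio: %.2f%n", (double) xmiBytes / binBytes);
    }
}
```

With these numbers, XMI encoding takes roughly 5.7 times as long as binary encoding and produces roughly 2.1 times as many bytes.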


-- Richard

>> I am looking for a way to improve loading times in an application, so I did a little
>> experiment with binary CAS serialization to see if it was superior to XMI serialization. For
>> serialization I used the CASCompleteSerializer to serialize the type-system and heaps into
>> the same file using Java object serialization - at least that is what I understood it should
>> do. To read in these files, I would deserialize the CASCompleteSerializer and initialize a
>> CAS from it using CASImpl.reinit().
>> 96,400 files
>> plain text (uncompressed)      :                 581,865,593 bytes
>> binary (serialized Java, gzip) : 0:47:02.835   3,555,449,597 bytes
>> xmi (gzip)                     : 1:20:31.535   4,712,633,769 bytes
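The write/read cycle from the quoted message (one object holding the type system and heaps, Java-serialized and GZip-compressed into a single file, then deserialized again) can be sketched without UIMA as below. This is an assumption-laden stand-in: the hypothetical `Snapshot` class plays the role of CASCompleteSerializer, the byte array plays the role of the file, and the real reading side would additionally hand the deserialized object to CASImpl.reinit(), which this sketch does not call.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class SnapshotRoundTrip {
    // Hypothetical stand-in for CASCompleteSerializer: type system and heap in one object.
    static final class Snapshot implements Serializable {
        private static final long serialVersionUID = 1L;
        final String typeSystem;
        final int[] heap;

        Snapshot(String typeSystem, int[] heap) {
            this.typeSystem = typeSystem;
            this.heap = heap;
        }
    }

    // Java object serialization wrapped in GZip, written into a single byte stream.
    // Closing the ObjectOutputStream also finishes the GZip trailer.
    static byte[] write(Snapshot snapshot) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(new GZIPOutputStream(bytes))) {
            out.writeObject(snapshot);
        }
        return bytes.toByteArray();
    }

    // Reverses write(): decompress, then deserialize the stand-in object.
    static Snapshot read(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new GZIPInputStream(new ByteArrayInputStream(data)))) {
            return (Snapshot) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Snapshot original = new Snapshot("TypeSystem:Token,Sentence", new int[] {0, 1, 2, 3});
        Snapshot copy = read(write(original));
        // Prints: TypeSystem:Token,Sentence [0, 1, 2, 3]
        System.out.println(copy.typeSystem + " " + Arrays.toString(copy.heap));
    }
}
```

Writing to a `FileOutputStream` instead of a `ByteArrayOutputStream` gives the on-disk variant; separating the GZip layer from the serialization layer this way also makes it easy to time encoding alone, as in the later in-memory run.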

Richard Eckart de Castilho
Technical Lead
Ubiquitous Knowledge Processing Lab (UKP-TUD) 
FB 20 Computer Science Department      
Technische Universität Darmstadt 
Hochschulstr. 10, D-64289 Darmstadt, Germany 
phone [+49] (0)6151 16-7477, fax -5455, room S2/02/B117
Web Research at TU Darmstadt (WeRC) www.werc.tu-darmstadt.de
