uima-dev mailing list archives

From Richard Eckart de Castilho <eck...@ukp.informatik.tu-darmstadt.de>
Subject Re: CAS serialization performance: XMI vs. Java serialization
Date Wed, 15 Aug 2012 09:09:22 GMT
On 15.08.2012 at 11:00, Thilo Goetz wrote:

> However, as I recall, there was a way you could serialize the CAS
> without the type system if you were sure you didn't need it.  Isn't that
> the difference between the CasCompleteSerializer and the
> NotSoCompleteSerializer (making that up here)?  On the way back, you can
> deserialize into an existing CAS that has the right type system.

I tried the CasCompleteSerializer (in contrast to the CasSerializer) because
I am not sure what "the right type system" means. Afaik, when the type system
is configured, types are internally assigned numeric IDs which are then used
in the heap. I wasn't sure whether these IDs could change between runs, even
though the type system is technically the same.

> Your times above, do they include time needed to do the compression?
> I'm surprised binary serialization is not even twice as fast.  Or is
> this gated by the disk I/O?

It currently includes gzip compression and is limited by disk I/O, since
that's the scenario I am faced with. Out of curiosity, I was planning to run
the same test writing to a ByteArrayOutputStream to see how much time the
actual encoding takes. I was also surprised that it wasn't faster and, in
particular, that the file size wasn't much smaller.
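
A minimal sketch of such an in-memory measurement, assuming plain Java
serialization of an int[] as a stand-in for the CAS heap (it does not use any
UIMA classes). The names EncodingTimer, Result and measure are hypothetical
and only serve the illustration. Writing into a ByteArrayOutputStream takes
disk I/O out of the picture, so the elapsed time covers only encoding and,
optionally, gzip compression.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: times serialization into memory instead of to disk.
final class EncodingTimer {

    /** Elapsed time and encoded size of one in-memory run. */
    static final class Result {
        public final long nanos;
        public final int bytes;

        Result(long nanos, int bytes) {
            this.nanos = nanos;
            this.bytes = bytes;
        }
    }

    /** Serializes the payload into a byte buffer, optionally through gzip. */
    static Result measure(Serializable payload, boolean gzip) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        long start = System.nanoTime();
        OutputStream sink = gzip ? new GZIPOutputStream(buffer) : buffer;
        // Closing the object stream also finishes the gzip stream underneath.
        try (ObjectOutputStream out = new ObjectOutputStream(sink)) {
            out.writeObject(payload);
        }
        long elapsed = System.nanoTime() - start;
        return new Result(elapsed, buffer.size());
    }

    public static void main(String[] args) throws IOException {
        // Assumption: an int[] stands in for the serialized CAS heap.
        int[] heap = new int[1_000_000];
        for (int i = 0; i < heap.length; i++) {
            heap[i] = i % 100;
        }
        Result raw = measure(heap, false);
        Result zipped = measure(heap, true);
        System.out.printf("raw:  %d bytes in %d ms%n", raw.bytes, raw.nanos / 1_000_000);
        System.out.printf("gzip: %d bytes in %d ms%n", zipped.bytes, zipped.nanos / 1_000_000);
    }
}
```

Comparing the two runs against the same test writing to a file would show how
much of the total time goes to encoding and compression rather than to the
disk. A single run like this is only a rough indication, since JIT warm-up
can dominate short measurements.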

-- Richard

Richard Eckart de Castilho
Technical Lead
Ubiquitous Knowledge Processing Lab (UKP-TUD) 
FB 20 Computer Science Department      
Technische Universität Darmstadt
Hochschulstr. 10, D-64289 Darmstadt, Germany 
phone [+49] (0)6151 16-7477, fax -5455, room S2/02/B117
Web Research at TU Darmstadt (WeRC) www.werc.tu-darmstadt.de
