uima-dev mailing list archives

From Marshall Schor <...@schor.com>
Subject Re: CasIOUtils class - some meta-questions
Date Thu, 04 Aug 2016 15:16:46 GMT
If no one currently needs COMPRESSED_FILTERED_TS, I'm +1 for removing it,
for the following reasons:

  - the current impl combines a very uncompressed format for the Type System
and Index definition with a highly compressed CAS representation; not a good
format.  (I've seen some applications that have close to 1000 types defined.)

  - the two-header problem could be solved by pushing the implementation down
into the base form 6 serialization, but that would take a bit of time and
thinking :-), and I'd rather get this release out and add that later.

-Marshall

On 8/4/2016 5:24 AM, Peter Klügl wrote:
> So, what should we do?
>
>
> deactivate COMPRESSED_FILTERED_TS completely, or even remove the
> SerialFormat?
>
>
> Best,
>
>
> Peter
>
>
> Am 04.08.2016 um 11:22 schrieb Richard Eckart de Castilho:
>> I'd personally prefer only one header, but looks like that would require
>> more refactoring, e.g. extracting the reading of the header out
>> of org.apache.uima.cas.impl.CASImpl.reinit(InputStream)...
>>
>> Cheers,
>>
>> -- Richard
>>
>>> On 04.08.2016, at 09:07, Peter Klügl <peter.kluegl@averbis.com> wrote:
>>>
>>> Yes, form6+ gets two headers. The first one for identifying the format
>>> and typesystem inclusion for the utils class, the second one for the
>>> actual serialization code. I didn't see any better solution for this.
>>>
>>>
>>> Am 03.08.2016 um 18:28 schrieb Richard Eckart de Castilho:
>>>> It is a bit hard to see... do we have cases now where two headers are written to the file?
>>>> E.g. in a form6 + TS, one before the type system and another one before the actual CAS data?
>>>>
>>>> -- Richard
>

