uima-dev mailing list archives

From Marshall Schor <...@schor.com>
Subject versioning cas serializations
Date Wed, 13 Jan 2016 20:28:30 GMT

I'm working on UIMA-4743 - fixing some binary cas serialization problems, which
will unfortunately make the binary serialization for "delta" formats not
backward compatible (the fix may have extra bytes in it).

We currently have a partially architected scheme for serialization forms, which
looks like:
  - 1 word encoding U + I + M + A and also serving to identify byte order
  - 1 word for bit-encoding some categorizations:
     -- a bit for delta / non delta
     -- a bit for compressed / non compressed
  - 0 or 1 additional word holding a version number, incremented in some
fashion, for a particular serialization category (called the "2nd version word"
below)

This 2nd version word is currently only used with compressed serialization formats.

I'm thinking of assigning another bit in the first word to indicate there's a
2nd version word present.
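
To make the proposed layout concrete, here is a minimal sketch in Java. It is
not the actual UIMA code: the class name, method names, and the three bit
positions are hypothetical, and it assumes the new "version word present" bit
lives in the bit-encoding word alongside the delta and compressed bits. The
only thing taken as given is that the first word spells U, I, M, A in ASCII, so
a reader that sees it byte-reversed knows the writer used the other byte order.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.OptionalInt;

// Sketch only: names and bit positions are assumptions, not UIMA's real ones.
final class CasHeaderSketch {
    // ASCII 'U','I','M','A' packed into one 32-bit word (0x55494D41).
    static final int MAGIC = ('U' << 24) | ('I' << 16) | ('M' << 8) | 'A';

    // Hypothetical bit assignments in the bit-encoding word.
    static final int FLAG_DELTA = 1 << 0;
    static final int FLAG_COMPRESSED = 1 << 1;
    static final int FLAG_HAS_VERSION_WORD = 1 << 2; // the proposed new bit

    // The magic word, read as big-endian, reveals the writer's byte order.
    static ByteOrder detectOrder(int firstWordReadBigEndian) {
        if (firstWordReadBigEndian == MAGIC) {
            return ByteOrder.BIG_ENDIAN;
        }
        if (Integer.reverseBytes(firstWordReadBigEndian) == MAGIC) {
            return ByteOrder.LITTLE_ENDIAN;
        }
        throw new IllegalArgumentException("not a UIMA serialization header");
    }

    // Writes 2 words, plus the 2nd version word only when its flag is set.
    static byte[] writeHeader(ByteOrder order, int flags, int versionWord) {
        boolean hasVersion = (flags & FLAG_HAS_VERSION_WORD) != 0;
        ByteBuffer buf = ByteBuffer.allocate(hasVersion ? 12 : 8).order(order);
        buf.putInt(MAGIC).putInt(flags);
        if (hasVersion) {
            buf.putInt(versionWord);
        }
        return buf.array();
    }

    // Returns the 2nd version word if the header says one is present.
    static OptionalInt readVersionWord(byte[] header) {
        ByteBuffer buf = ByteBuffer.wrap(header); // big-endian by default
        buf.order(detectOrder(buf.getInt()));
        int flags = buf.getInt();
        return (flags & FLAG_HAS_VERSION_WORD) != 0
                ? OptionalInt.of(buf.getInt())
                : OptionalInt.empty();
    }
}
```

With a presence bit, old readers that do not know the bit can at least detect
an unknown flag and reject the stream, rather than misreading the extra word as
payload.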

I would turn this on for the repaired binary delta format, and supply a version

Our current compressed formats use "1" as the incrementing version number.

Thinking ahead, perhaps the serialization formats should have a multi-part 2nd
version word, along the lines of some standard.
The "semantic versioning" standard has sparked some push-back (see
https://gist.github.com/jashkenas/cbd2b088e20279ae2c8e ),
basically saying the "mechanical" approach of semantic versioning isn't rich
enough for the grey areas of real-world use, and ends up obscuring the purpose
of indicating how "far" one version is from another.

I'm leaning toward something simple, such as using the Major/Minor/Patch format,
each value 1 byte, in the 3 lower bytes of the 2nd version word, giving 256
possibilities for each (more than I've ever seen used).
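
A rough sketch of that packing, in Java, might look like the following. The
class and method names are made up for illustration, and it assumes Major sits
in the highest of the three lower bytes with the top byte left as zero
(reserved); the proposal above does not pin down that ordering.

```java
// Sketch only: names and byte ordering are assumptions for illustration.
final class VersionWordSketch {
    // Packs Major/Minor/Patch into the 3 lower bytes; the top byte stays 0.
    static int pack(int major, int minor, int patch) {
        checkByte("major", major);
        checkByte("minor", minor);
        checkByte("patch", patch);
        return (major << 16) | (minor << 8) | patch;
    }

    static int major(int word) {
        return (word >>> 16) & 0xFF;
    }

    static int minor(int word) {
        return (word >>> 8) & 0xFF;
    }

    static int patch(int word) {
        return word & 0xFF;
    }

    // Renders the word in the usual dotted form, e.g. "1.2.3".
    static String format(int word) {
        return major(word) + "." + minor(word) + "." + patch(word);
    }

    // Each part must fit in one unsigned byte (0..255).
    private static void checkByte(String name, int value) {
        if (value < 0 || value > 255) {
            throw new IllegalArgumentException(name + " out of range: " + value);
        }
    }
}
```

One consequence of this particular ordering: the value 1 used by the current
compressed formats would decode as 0.0.1, so existing streams would still parse
as a valid (if low) version.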

Other ideas?

