uima-dev mailing list archives

From Marshall Schor <...@schor.com>
Subject Re: CasIOUtils class - some meta-questions
Date Thu, 04 Aug 2016 22:08:07 GMT
javadocs now updated. -M


On 8/4/2016 11:08 AM, Marshall Schor wrote:
> I'm taking a try at the general documentation for this class; here's what I have
> (written from the point of view of being useful to new users of this class).
>
> CasIOUtils is a collection of static methods aimed at making it easy to
>   - save and load CASes, and to
>   - optionally include their Type Systems and index definitions based on those
> type systems (abbreviated TSI). 
>
> There are several serialization formats supported; these are listed in the Java
> enum SerialFormat, together with their preferred file extension name. 
>
> The APIs for loading attempt to automatically use the appropriate deserializers,
> based on the input data format.  To select the right deserializer, first, the
> file extension name (if available) is used:
>   - xmi: XMI format
>   - xcas: XCAS format
>   - xml: XCAS format
>
> If none of these apply, then the first few bytes of the input are examined to
> determine the format.
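
The extension lookup described above could be sketched roughly as follows.
This is only an illustration, not the CasIOUtils implementation: the class
ExtensionLookupSketch, its Format enum, and the fromName helper are all
hypothetical names, and treating extensions case-insensitively is an
assumption. UNDETERMINED stands for "fall through to examining the first
few bytes".

```java
import java.util.Locale;

// Hypothetical sketch; none of these names come from UIMA itself.
class ExtensionLookupSketch {

  // UNDETERMINED means: no known extension, so inspect the input bytes next.
  enum Format { XMI, XCAS, UNDETERMINED }

  // Map a file name (or the path part of a URL) to a format by its extension.
  static Format fromName(String name) {
    int dot = name.lastIndexOf('.');
    if (dot < 0 || dot == name.length() - 1) {
      return Format.UNDETERMINED; // no extension at all
    }
    // Assumption: extensions are compared case-insensitively.
    String ext = name.substring(dot + 1).toLowerCase(Locale.ROOT);
    switch (ext) {
      case "xmi":
        return Format.XMI;
      case "xcas":
      case "xml": // per the list above, .xml is treated as XCAS
        return Format.XCAS;
      default:
        return Format.UNDETERMINED;
    }
  }

  public static void main(String[] args) {
    System.out.println(fromName("doc.xmi"));
    System.out.println(fromName("doc.xml"));
    System.out.println(fromName("doc.bin"));
  }
}
```
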
>
> For loading, the inputs may be supplied as URLs or as InputStreams.  You can use
> Files or Paths by converting these to URLs:
>    URL url = a_path.toUri().toURL();
>    URL url = a_file.toURI().toURL();
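
For reference, a small self-contained version of the two conversions, using
only the JDK. Note that java.nio.file.Path spells the method toUri() while
java.io.File spells it toURI(). The class name UrlConversionSketch and the
file name example.xmi are made up for the illustration.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class, only to show the two JDK conversions side by side.
class UrlConversionSketch {

  // Path -> URI -> URL (Path uses the spelling toUri()).
  static URL fromPath(Path path) throws MalformedURLException {
    return path.toUri().toURL();
  }

  // File -> URI -> URL (File uses the spelling toURI()).
  static URL fromFile(File file) throws MalformedURLException {
    return file.toURI().toURL();
  }

  public static void main(String[] args) throws MalformedURLException {
    // Relative names are resolved against the current working directory.
    System.out.println(fromPath(Paths.get("example.xmi")));
    System.out.println(fromFile(new File("example.xmi")));
  }
}
```
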
>
> When loading, an optional lenient boolean flag may be specified; if true, then
> types and/or features being deserialized which don't exist in the receiving CAS
> are silently ignored.
>
> When TSI is saved, it is either saved in the same destination (e.g. file or
> stream), or in a separate one. 
>   - Two serialization formats support saving the TSI in the same destination: 
>     -- SERIALIZED_TS and
>     -- COMPRESSED_FILTERED_TS.
> Other formats require the TSI to be saved to a separate OutputStream.
>
> Summary of the APIs for saving:
>
>   save(CAS, OutputStream, SerialFormat)
>   save(CAS, OutputStream, OutputStream, SerialFormat)  - extra OutputStream for
> saving the TSI
>
> Summary of APIs for loading:
>  load(URL        , CAS)
>  load(InputStream, CAS)  
>
>  load(URL        , URL        , CAS, lenient_flag)   - the second URL is for
> loading a separately-stored TSI
>  load(InputStream, InputStream, CAS, lenient_flag)
>
> You may specify the lenient_flag without the TSI input by setting the 2nd
> argument to null.
> ===============================================================================
>
> To make this documentation correct, the impl needs some slight adjustments:
>
> The method that reads the first few bytes of input to determine the format
> should look for XCAS format explicitly (e.g., load the first 10,000 bytes and
> search for <CAS> as the first XML element?) and maybe handle it.
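
One possible shape for that XCAS check, as a rough sketch only. It is not the
current impl: the class XcasSniffSketch and the method looksLikeXcas are
hypothetical, decoding the bytes as UTF-8 is an assumption, and "first XML
element" is approximated by skipping the XML declaration, comments, and any
DOCTYPE before comparing the element name with CAS.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of sniffing for XCAS; not part of UIMA.
class XcasSniffSketch {

  static final int MAX_BYTES = 10_000; // the figure floated in the proposal

  // Returns true if the first XML element within the first MAX_BYTES is <CAS ...>.
  static boolean looksLikeXcas(InputStream in) throws IOException {
    byte[] buf = new byte[MAX_BYTES];
    int n = 0;
    while (n < buf.length) {
      int r = in.read(buf, n, buf.length - n);
      if (r < 0) break; // end of stream
      n += r;
    }
    // Assumption: the head of the input decodes acceptably as UTF-8.
    String head = new String(buf, 0, n, StandardCharsets.UTF_8);

    int i = 0;
    while ((i = head.indexOf('<', i)) >= 0) {
      if (head.startsWith("<?", i)) {          // XML declaration or PI
        int end = head.indexOf("?>", i);
        if (end < 0) return false;
        i = end + 2;
      } else if (head.startsWith("<!--", i)) { // comment
        int end = head.indexOf("-->", i);
        if (end < 0) return false;
        i = end + 3;
      } else if (head.startsWith("<!", i)) {   // DOCTYPE or similar
        int end = head.indexOf('>', i);
        if (end < 0) return false;
        i = end + 1;
      } else {                                 // first real element
        if (!head.startsWith("<CAS", i) || i + 4 >= head.length()) return false;
        char c = head.charAt(i + 4);           // the name must end right here
        return c == '>' || c == '/' || Character.isWhitespace(c);
      }
    }
    return false;
  }

  public static void main(String[] args) throws IOException {
    byte[] xcas = "<?xml version=\"1.0\"?><CAS version=\"2\"></CAS>"
        .getBytes(StandardCharsets.UTF_8);
    System.out.println(looksLikeXcas(new ByteArrayInputStream(xcas)));
  }
}
```

This sketch consumes the bytes it reads, so a real loader would need to wrap
the stream (for example in a BufferedInputStream with mark/reset) to hand the
same bytes to the deserializer afterwards.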
>
> Make the load with non-null TSI input work for all formats (currently the TSI
> input is silently ignored for xmi and xcas).
>
> WDYT?
>
> -Marshall
>
>
>
>

