commons-dev mailing list archives

From Stefan Bodewig
Subject Re: [compress] [PATCH] Refactoring of zip encoding support.
Date Tue, 03 Mar 2009 04:40:15 GMT
On 2009-03-02, Wolfgang Glas wrote:

> Stefan Bodewig wrote:
>> On 2009-03-01, Wolfgang Glas <> wrote:

>>> 1) Unicode extra fields are written for all ZIP entries, not only
>>> for entries that cannot be encoded with the encoding set on
>>> ZipArchiveOutputStream.

>> Maybe room for yet another flag?  Or an enum-like option

>> setCreateUnicodeExtraFields(NEVER | ALWAYS | NOT_ENCODABLE)

Consider the WinZIP case: WinZIP wouldn't recognize the EFS.  If you
set the encoding to UTF-8, use your code and only add extra fields
for non-encodable paths, no extra field is ever written (every path
is encodable in UTF-8), so WinZIP will never see the correct path.
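
To illustrate why a NOT_ENCODABLE rule never fires with UTF-8, here is
a minimal sketch using only java.nio.charset.  The class and method
names (EncodableCheck, isEncodable) are made up for this example and
are not part of commons-compress.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not commons-compress code.
class EncodableCheck {

    /** Returns true if every character of name can be encoded in charset. */
    static boolean isEncodable(String name, Charset charset) {
        return charset.newEncoder().canEncode(name);
    }

    public static void main(String[] args) {
        String name = "\u00e4\u00f6\u00fc.txt"; // a.txt with umlauts

        // US-ASCII cannot represent the umlauts, so an extra field
        // would be written under a NOT_ENCODABLE rule.
        System.out.println(isEncodable(name, StandardCharsets.US_ASCII)); // false

        // UTF-8 can represent any well-formed string, so the rule never
        // triggers and no extra field is written.
        System.out.println(isEncodable(name, StandardCharsets.UTF_8));    // true
    }
}
```

So with UTF-8 as the archive encoding, a reader that ignores the EFS
only gets the correct name if extra fields are written regardless of
encodability.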

> I like the idea of a unicode policy flag ;-)

That may be a better approach, agreed, but only if we manage to cover
all corner cases.

> My suggestion is

> setUnicodePolicy(
>   SURROGATES   | /* no extra fields, no utf-8 fallback, only %Uxxxx surrogates */
>   EXTRA_FIELDS | /* extra fields for unencodable entries, no utf-8 fallback    */
>   EXTRA_FIELDS_ALWAYS | /* extra fields for all entries, no utf-8 fallback     */
>   UTF8_FALLBACK | /* fall back to utf-8 plus EFS flag for unencodable entries  */
>   UTF8_FALLBACK_EXTRA_FIELDS | /* fall back to utf-8 plus EFS flag plus extra
>                                   fields for unencodable entries */
>   UTF8_FALLBACK_EXTRA_FIELDS_ALWAYS /* fall back to utf-8 plus EFS flag for
>                                        unencodable entries, extra fields for all
>                                        entries */
> )

> We might drop the last two options and we might choose better
> wording, but IMHO the direction should be as outlined above...

This covers all permutations, agreed.

Names, names, I'm really bad at them.

UTF8_FALLBACK => FALL_BACK_TO_UTF8

but looking at the names we may be better off with two independent
options.  Hmm, yes, right now I prefer two flags because they seem to
be orthogonal.
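
A rough sketch of what two independent options could look like.  All
names here (UnicodeOptions, ExtraFieldPolicy, writeExtraField,
useUtf8WithEfs) are hypothetical and only meant to show the idea; the
extra-field values are the NEVER | ALWAYS | NOT_ENCODABLE choices
quoted above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; none of these names exist in commons-compress.
class UnicodeOptions {

    /** When to write a Unicode extra field for an entry. */
    enum ExtraFieldPolicy { NEVER, ALWAYS, NOT_ENCODABLE }

    final ExtraFieldPolicy extraFields;
    /** Whether unencodable names fall back to UTF-8 plus the EFS flag. */
    final boolean fallBackToUtf8;

    UnicodeOptions(ExtraFieldPolicy extraFields, boolean fallBackToUtf8) {
        this.extraFields = extraFields;
        this.fallBackToUtf8 = fallBackToUtf8;
    }

    private static boolean encodable(String name, Charset encoding) {
        return encoding.newEncoder().canEncode(name);
    }

    /** Decides whether this entry gets a Unicode extra field. */
    boolean writeExtraField(String name, Charset encoding) {
        switch (extraFields) {
            case ALWAYS:
                return true;
            case NOT_ENCODABLE:
                return !encodable(name, encoding);
            default: // NEVER
                return false;
        }
    }

    /** Decides whether this entry is written as UTF-8 with the EFS flag set. */
    boolean useUtf8WithEfs(String name, Charset encoding) {
        return fallBackToUtf8 && !encodable(name, encoding);
    }

    public static void main(String[] args) {
        UnicodeOptions opts =
            new UnicodeOptions(ExtraFieldPolicy.NOT_ENCODABLE, true);
        String name = "\u00e4.txt";
        System.out.println(opts.writeExtraField(name, StandardCharsets.US_ASCII));
        System.out.println(opts.useUtf8WithEfs(name, StandardCharsets.US_ASCII));
    }
}
```

Three extra-field settings times two fallback settings give the same
six combinations as the single policy list, e.g. (NEVER, false) would
correspond to SURROGATES and (ALWAYS, true) to
UTF8_FALLBACK_EXTRA_FIELDS_ALWAYS.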

