commons-dev mailing list archives

From: Stefan Bodewig
Subject: Re: [compress] [PATCH] Refactoring of zip encoding support.
Date: Tue, 03 Mar 2009 16:11:24 GMT
On 2009-03-03, Wolfgang Glas wrote:

> Stefan Bodewig schrieb:
>> On 2009-03-02, Wolfgang Glas wrote:

>>> Stefan Bodewig schrieb:
>>>> On 2009-03-01, Wolfgang Glas wrote:

>>>>> 1) Unicode extra fields are written for all ZIP entries, not only
>>>>> for entries whose names cannot be encoded with the encoding set on
>>>>> ZipArchiveOutputStream.

>>>> Maybe room for yet another flag?  Or an enum-like option

>>>> setCreateUnicodeExtraFields(NEVER | ALWAYS | NOT_ENCODABLE)

>> Consider the WinZIP case: WinZIP wouldn't recognize the EFS.  If you
>> set the encoding to UTF-8, use your code, and only add extra fields
>> for non-encodable paths, WinZIP will never see the correct path.

> According to my tests WinZip recognizes the EFS flag upon
> reading.

Then my documentation is wrong 8-)

> Secondly, if you set the encoding to UTF-8, there's no need for
> unicode extra fields anyway.

Except when your client doesn't recognize the EFS flag and thinks
you're using CP437 - but happily accepts the Unicode extra fields.  I
thought this would be the case for WinZIP.
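
To illustrate the failure mode I had in mind, here is a minimal sketch
of what a reader that ignores the EFS flag would show.  The class and
method names are made up for this mail, and the fallback to ISO-8859-1
is only there in case the JVM doesn't ship the IBM437 charset; none of
this is code from the patch.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of commons-compress.
class EfsMisread {

    // CP437 is what a legacy ZIP reader assumes; fall back to
    // ISO-8859-1 if this JVM lacks the IBM437 charset (assumption).
    static Charset legacyCharset() {
        return Charset.isSupported("IBM437")
            ? Charset.forName("IBM437")
            : StandardCharsets.ISO_8859_1;
    }

    // The writer stored the name as UTF-8 and set the EFS flag; the
    // reader ignores the flag and decodes with the legacy charset.
    static String decodeIgnoringEfs(String name) {
        byte[] raw = name.getBytes(StandardCharsets.UTF_8);
        return new String(raw, legacyCharset());
    }

    public static void main(String[] args) {
        // Pure ASCII names survive, anything else is garbled.
        System.out.println(decodeIgnoringEfs("a.txt"));
        System.out.println(decodeIgnoringEfs("\u00e4.txt"));
    }
}
```

For such a client the Unicode extra field is the only place where it
can find the correct name, even though the archive encoding is UTF-8.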

>> but looking at the names we may be better off with two independent
>> options.  Hmm, yes, right now I prefer two flags because they seem to
>> be orthogonal.

> I think you should choose whichever approach better fits your needs in
> ant ;-) At least you have to write an XML parser for these settings

You vastly overestimate the effort it takes to write an Ant task.

is all I had to do for the two existing options.

> and the documentation, so you might choose the approach which may be
> explained in brief words.

> I can live very well with two options ;-)

If you throw in "fallbacks" we are actually facing three concepts.

OK, this is what I feel makes most sense:

createUnicodeExtraFields = NEVER (default) | ALWAYS | NOT_ENCODABLE
useLanguageEncodingFlag = true (default) | false
fallbackToUtf8 = true | false

I'm not sure about the default for the latter, probably
default fallbackToUtf8 = (createUnicodeExtraFields == NEVER)
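
A rough sketch of how the three options and the derived default could
fit together, assuming a plain holder class.  ZipNameOptions, the enum
name and all method names are hypothetical and not an existing
commons-compress API; only the option names and values are the ones
listed above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical options holder, not an existing commons-compress class.
class ZipNameOptions {

    // Mirrors the proposed createUnicodeExtraFields values.
    public enum UnicodeExtraFieldPolicy { NEVER, ALWAYS, NOT_ENCODABLE }

    private UnicodeExtraFieldPolicy createUnicodeExtraFields =
        UnicodeExtraFieldPolicy.NEVER;
    private boolean useLanguageEncodingFlag = true;
    // null means "not set explicitly, derive the default".
    private Boolean fallbackToUtf8 = null;

    public void setCreateUnicodeExtraFields(UnicodeExtraFieldPolicy p) {
        createUnicodeExtraFields = p;
    }

    public void setUseLanguageEncodingFlag(boolean b) {
        useLanguageEncodingFlag = b;
    }

    public boolean isUseLanguageEncodingFlag() {
        return useLanguageEncodingFlag;
    }

    public void setFallbackToUtf8(boolean b) {
        fallbackToUtf8 = b;
    }

    // default fallbackToUtf8 = (createUnicodeExtraFields == NEVER)
    public boolean isFallbackToUtf8() {
        return fallbackToUtf8 != null
            ? fallbackToUtf8
            : createUnicodeExtraFields == UnicodeExtraFieldPolicy.NEVER;
    }

    // Decides whether an entry name would get a Unicode extra field.
    public boolean needsUnicodeExtraField(String name, Charset encoding) {
        switch (createUnicodeExtraFields) {
            case ALWAYS:
                return true;
            case NOT_ENCODABLE:
                return !encoding.newEncoder().canEncode(name);
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        ZipNameOptions o = new ZipNameOptions();
        o.setCreateUnicodeExtraFields(UnicodeExtraFieldPolicy.NOT_ENCODABLE);
        System.out.println(
            o.needsUnicodeExtraField("\u00e4.txt", StandardCharsets.US_ASCII));
        System.out.println(o.isFallbackToUtf8());
    }
}
```

Keeping fallbackToUtf8 unset until somebody calls the setter is what
lets its default follow createUnicodeExtraFields.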

Unfortunately I don't really see how we can merge all permutations
into meaningful names otherwise.  But suggestions are welcome.

