subversion-users mailing list archives

From Branko Čibej <br...@wandisco.com>
Subject Re: Check-out fails with LANG=C
Date Wed, 24 Jul 2013 03:57:41 GMT
On 19.07.2013 15:22, Vincent Lefevre wrote:
> On 2013-07-09 20:21:33 +0200, Branko Čibej wrote:
>> Unlike on Windows and Mac OS (the latter at least with HFS+), there is no
>> notion of native filesystem encoding on other Unix-like platforms. The
>> best we can do is look at the locale settings, specifically, LC_CTYPE.
> No, the best you can do is to let the user choose. LC_CTYPE typically
> specifies the encoding used by the *terminal*, and this encoding may
> change when the user connects by SSH from a terminal with a different
> encoding.
>
>> I posit that if the "native encoding" is supposed to be UTF-8, then it
>> is an error to use LANG=C at all. Instead, one should use LANG=C.UTF-8.
> LANG=C.UTF-8 is completely non-portable for scripts. For instance:
>
> xvii:~> LANG=C.UTF-8 cp
> cp: opérande de fichier manquant
> Saisissez « cp --help » pour plus d'informations.
>
> xvii:~> LANG=C cp
> cp: missing file operand
> Try 'cp --help' for more information.
>
> A script that needs to work in some well-defined way, in particular
> with English messages (if they need to be parsed), must use the C
> (or POSIX) locale. With most tools, this is fine as they don't need
> to know how filenames are encoded.

Frankly, I'm not interested in portable scripts. All you're showing above
is that on your particular system, setting LANG=C.UTF-8 doesn't do
anything. So perhaps you'll have to use LC_CTYPE=UTF-8,
LANG=en_US.UTF-8, or whatever happens to work on your particular flavour
of Unix-like OS.
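
As a rough illustration of "whatever happens to work", the sketch below probes a few locale names and reports the first one the C library accepts for LC_CTYPE with a UTF-8 codeset. The candidate list and the helper name first_utf8_ctype are assumptions made for this example; which names exist differs from system to system.

```python
import locale

# Hypothetical candidate list; the names that actually exist vary by system.
CANDIDATES = ["C.UTF-8", "en_US.UTF-8", "UTF-8"]


def first_utf8_ctype(candidates=CANDIDATES):
    """Return the first name accepted for LC_CTYPE whose codeset is UTF-8, else None."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        for name in candidates:
            try:
                locale.setlocale(locale.LC_CTYPE, name)
            except locale.Error:
                continue  # this name is not installed on this system
            codeset = locale.nl_langinfo(locale.CODESET)
            if codeset.upper().replace("-", "") == "UTF8":
                return name
        return None
    finally:
        # Restore whatever LC_CTYPE was in effect before probing.
        locale.setlocale(locale.LC_CTYPE, saved)


if __name__ == "__main__":
    print(first_utf8_ctype())
```

The result could then be exported as LC_CTYPE for the client alone, which leaves LANG=C (and therefore the message language) untouched, since LC_CTYPE takes precedence over LANG for that category.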

All this is beside the point. The point is that it is not up to
Subversion to invent a new way of dealing with file-name encodings. We
use setlocale(LC_ALL, ""); this is the API that POSIX gives us and there
is no other that I'm aware of. And we're certainly not going to break
every working copy in existence by changing the way we transcode file
names on Unix (except Mac OS, which is always UTF-8 anyway).
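
To make the mechanism concrete, here is a minimal sketch of the same idea in Python rather than C: initialise the locale from the environment, then ask the C library which character encoding that locale implies. The function name codeset_from_environment is made up for this example, and this is not Subversion's actual code.

```python
import locale


def codeset_from_environment():
    """Return the codeset implied by LC_ALL / LC_CTYPE / LANG."""
    try:
        # Equivalent of the C call setlocale(LC_ALL, "").
        locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        # The environment names a locale that is not installed;
        # the previously active locale stays in effect.
        pass
    return locale.nl_langinfo(locale.CODESET)


if __name__ == "__main__":
    print(codeset_from_environment())
```

Running it with different LANG or LC_CTYPE values shows how the reported encoding follows the environment. Note that recent Python versions may coerce the C locale to a UTF-8 one at startup (PEP 538), so the value printed under LANG=C can differ from what a C program would see.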

I'll also point out that if you /need/ consistent, parseable output in
scripts, the command-line client already provides an --xml flag.
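
A script can then parse the XML instead of matching localised messages. The sketch below reads output shaped like that of "svn status --xml"; the SAMPLE document is a hand-written illustration rather than captured output, and parse_status is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the general shape of `svn status --xml` output.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<status>
<target path=".">
<entry path="caf\u00e9.txt">
<wc-status item="modified" props="none" revision="5"/>
</entry>
<entry path="new.txt">
<wc-status item="unversioned" props="none"/>
</entry>
</target>
</status>
"""


def parse_status(xml_bytes):
    """Return a list of (path, item-status) pairs from status XML."""
    root = ET.fromstring(xml_bytes)
    result = []
    for entry in root.iter("entry"):
        wc = entry.find("wc-status")
        item = wc.get("item") if wc is not None else None
        result.append((entry.get("path"), item))
    return result


if __name__ == "__main__":
    for path, item in parse_status(SAMPLE.encode("utf-8")):
        print(item, path)
```

In real use the bytes would come from the client's standard output, for example via subprocess.run(["svn", "status", "--xml"], capture_output=True).stdout. Because the XML declares its own encoding, the parsing step does not depend on the message language of the locale the script runs in.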

Sure, it would be nice if POSIX defined a portable way to consistently
determine file-name encoding, or even if there were reliable,
non-portable, OS-specific ways that we could use. But I'm not aware of any.

-- Brane

-- 
Branko Čibej | Director of Subversion
WANdisco // Non-Stop Data
e. brane@wandisco.com
