james-server-dev mailing list archives

From "Steve Brewin" <sbre...@synsys.com>
Subject RE: i am not getting subject content in utf-8 format
Date Fri, 04 May 2007 21:12:23 GMT
danny.angus@gmail.com wrote:

> On 5/4/07, ketanbparekh <tabho_2000@yahoo.com> wrote:
> > I am trying to filter mails by passing persian words in subject.
> > But persian words appear as ? marks in subject and jsieve is not
> > able to filter it.
> This is a complex area, and it really depends on where the
> corruption is occurring.
> You need to ensure that the software which creates the message, your
> sieve scripts, and the software you use to view it all are using
> UTF-8.
> Then if there is still a problem we need to make sure that the charset
> is properly recorded in the headers, and that the header is escaped
> correctly using base64 ("B") or quoted-printable-style ("Q") encoding.

All of the above is true.
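
To make the header-escaping point concrete: RFC 2047 lets a non-ASCII
subject travel as an "encoded-word" that names its charset and uses either
"B" (base64) or "Q" encoding. Below is a minimal sketch using only the JDK.
The class and method names are hypothetical (they are not James or jSieve
APIs), and it handles a single short "B" encoded-word only.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper for illustration; not part of James or jSieve.
final class EncodedWordSketch {

    // Builds one RFC 2047 "B" encoded-word carrying the text as UTF-8.
    // RFC 2047 caps an encoded-word at 75 characters; splitting longer
    // text into several encoded-words is not handled in this sketch.
    static String encodeB(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return "=?UTF-8?B?" + Base64.getEncoder().encodeToString(utf8) + "?=";
    }

    // Decodes one "B" encoded-word back to a String, using the charset
    // named inside the encoded-word itself.
    static String decodeB(String word) {
        // Expected pieces: "=", charset, encoding, payload, "="
        String[] parts = word.split("\\?");
        if (parts.length != 5 || !parts[0].equals("=") || !parts[4].equals("=")
                || !parts[2].equalsIgnoreCase("B")) {
            throw new IllegalArgumentException("not a B encoded-word: " + word);
        }
        byte[] raw = Base64.getDecoder().decode(parts[3]);
        return new String(raw, Charset.forName(parts[1]));
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file itself pure ASCII.
        String subject = "\u0633\u0644\u0627\u0645";
        String header = encodeB(subject);
        System.out.println("Subject: " + header);
        System.out.println(decodeB(header).equals(subject)); // true
    }
}
```

If the raw Subject header of a stored message does not look like this (or
like the equivalent "Q" form) and instead already contains literal ? marks,
the damage happened before the message reached the filter.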

I suspect that the problem is not within jSieve. You can check this by
removing jSieve from the chain and seeing whether, in the retrieved mails,
"persian words [still] appear as ? marks in subject".
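
For what it is worth, literal ? marks are the classic symptom of text being
pushed through a charset that cannot represent it somewhere along the way.
The sketch below (hypothetical class name, JDK only) shows the effect; it
illustrates the symptom and is not a diagnosis of this particular setup.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of James or jSieve.
final class QuestionMarkSketch {

    // Encodes the text to bytes in the given charset and decodes it again.
    // String.getBytes replaces characters the charset cannot map with the
    // charset's replacement bytes, so the conversion can be lossy.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // Four Persian letters, written as escapes so the source stays ASCII.
        String subject = "\u0633\u0644\u0627\u0645";

        // UTF-8 can represent every character, so nothing is lost.
        System.out.println(roundTrip(subject, StandardCharsets.UTF_8).equals(subject)); // true

        // ISO-8859-1 has no Persian letters; each one becomes '?'.
        System.out.println(roundTrip(subject, StandardCharsets.ISO_8859_1)); // ????
    }
}
```

Once a character has been replaced this way the original is gone, so no
later stage, jSieve included, can recover it.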

For information, here is how RFC 3028 says jSieve should work with regard to
character sets...


2.1.     Form of the Language

   The language consists of a set of commands.  Each command consists of
   a set of tokens delimited by whitespace.  The command identifier is
   the first token and it is followed by zero or more argument tokens.
   Arguments may be literal data, tags, blocks of commands, or test
   commands.

   The language is represented in UTF-8, as specified in [UTF-8].

   Tokens in the ASCII range are considered case-insensitive.

2.7.2.   Comparisons Across Character Sets

   All Sieve scripts are represented in UTF-8, but messages may involve
   a number of character sets.  In order for comparisons to work across
   character sets, implementations SHOULD implement the following
   behavior:

      Implementations decode header charsets to UTF-8.  Two strings are
      considered equal if their UTF-8 representations are identical.
      Implementations should decode charsets represented in the forms
      specified by [MIME] for both message headers and bodies.
      Implementations must be capable of decoding US-ASCII, ISO-8859-1,
      the ASCII subset of ISO-8859-* character sets, and UTF-8.

   If implementations fail to support the above behavior, they MUST
   conform to the following:

      No two strings can be considered equal if one contains octets
      greater than 127.


...if it doesn't, let us know!
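
As a rough illustration of the two rules quoted from section 2.7.2, here is
a sketch in plain Java. It is not jSieve's actual code: the class and method
names are made up for the example, and it assumes the header value has
already been unescaped to raw bytes with a known charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration of RFC 3028 section 2.7.2; not jSieve code.
final class CharsetComparisonSketch {

    // Preferred behavior: decode each value from its declared charset and
    // compare the UTF-8 representations octet by octet.
    static boolean equalAsUtf8(byte[] a, Charset charsetA, byte[] b, Charset charsetB) {
        byte[] utf8A = new String(a, charsetA).getBytes(StandardCharsets.UTF_8);
        byte[] utf8B = new String(b, charsetB).getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(utf8A, utf8B);
    }

    // Fallback rule: no two strings are equal if one contains octets
    // greater than 127.
    static boolean equalFallback(byte[] a, byte[] b) {
        return !hasHighOctet(a) && !hasHighOctet(b) && Arrays.equals(a, b);
    }

    // True if any octet, read as an unsigned value, is greater than 127.
    static boolean hasHighOctet(byte[] data) {
        for (byte octet : data) {
            if ((octet & 0xFF) > 127) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        byte[] latin1 = {(byte) 0xE9};            // e-acute in ISO-8859-1
        byte[] utf8 = {(byte) 0xC3, (byte) 0xA9}; // e-acute in UTF-8

        System.out.println(equalAsUtf8(latin1, StandardCharsets.ISO_8859_1,
                utf8, StandardCharsets.UTF_8)); // true
        System.out.println(equalFallback(utf8, utf8)); // false
    }
}
```

Under the fallback rule a non-ASCII subject never matches anything, even
itself, so a script that tests for a Persian word can only work with an
implementation that follows the preferred behavior.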


-- Steve
