james-server-dev mailing list archives

From "Jochen Wiedmann" <jochen.wiedm...@gmail.com>
Subject Reuse mime4j in commons-fileupload
Date Mon, 06 Aug 2007 02:08:27 GMT
Hi,

I am the current maintainer of Commons FileUpload and would like to
reuse Mime4J as the multipart parser. Thanks to the acceptance of my
pull parser patch (MIME4J-19), this is now possible. In an ongoing
thread, others have expressed interest in taking this step. See

    http://www.nabble.com/RfC%3A-commons-fileupload-2%2C-based-on-mime4j-tf4220932.html

Over the last few days, I have developed a first implementation of what
I'd like to see as Commons FileUpload 2.0, which you can find at

    http://people.apache.org/~jochen/commons-fileupload

It is based on a patched version of Mime4J 0.4-SNAPSHOT, which you
can find at the same location.

All in all, I found a few minor flaws in Mime4J, which I'd like to see
fixed. I'd like to post them here for general discussion. If there is
general agreement, I would then split them into patches and submit
them to Jira.

One reason for this procedure is to ask for a kind of "fast track": If
I submit several patches and each one waits a long time before it is
accepted, the whole effort may take too long. It would help if some
developer could agree to move this forward together with me, or if I
could be given committer privileges.

Here are my points, ordered by priority:

Required

  1.) Parsing a message without headers

       Currently, Mime4j can only parse messages with headers. That's
       not suitable for parsing an HTTP message, because the typical
       caller is a servlet, whose input stream doesn't contain the
       headers.

       I'd propose a new method

            public parse(InputStream, BodyDescriptor)

      This method would be specified as emitting a sequence

          T_START_MULTIPART ... T_END_MULTIPART

      as opposed to

          T_START_MESSAGE T_START_HEADER ... T_END_HEADER
          T_START_MULTIPART ... T_END_MULTIPART T_END_MESSAGE

      The required patch is relatively minor and should not complicate
      the parser.
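
       To illustrate the shorter sequence, here is a rough sketch. It
       is not Mime4j code: the class, the method, and the line-based
       scanning are made up for this mail, and T_START_BODYPART and
       T_END_BODYPART are assumed token names for what the "..." above
       stands for. The boundary is passed in by the caller, the way a
       servlet would take it from the request's Content-Type.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Mime4j code. Only T_START_MULTIPART and
// T_END_MULTIPART are taken from the proposal; the rest is assumed.
class HeaderlessParseSketch {
    enum Token { T_START_MULTIPART, T_START_BODYPART, T_END_BODYPART, T_END_MULTIPART }

    // Scans a multipart body that starts directly at the first boundary.
    // The boundary comes from the caller, not from parsed message headers.
    static List<Token> tokens(String body, String boundary) {
        List<Token> out = new ArrayList<Token>();
        String delimiter = "--" + boundary;
        boolean inPart = false;
        out.add(Token.T_START_MULTIPART);
        for (String line : body.split("\r\n", -1)) {
            if (line.equals(delimiter + "--")) {      // closing delimiter
                if (inPart) out.add(Token.T_END_BODYPART);
                inPart = false;
                break;
            } else if (line.equals(delimiter)) {      // a new part begins
                if (inPart) out.add(Token.T_END_BODYPART);
                out.add(Token.T_START_BODYPART);
                inPart = true;
            }
        }
        if (inPart) out.add(Token.T_END_BODYPART);    // input ended without closing delimiter
        out.add(Token.T_END_MULTIPART);
        return out;
    }
}
```

       Note that no T_START_MESSAGE or header tokens appear: the
       stream begins and ends with the multipart tokens.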

Recommended:

  2.) Let BodyDescriptor provide full blown access to the headers

       Currently, BodyDescriptor offers access to the content-type and
       content-transfer-encoding headers only.

      As a consequence, the mime4j user is forced to listen for T_FIELD events
      and build their own header map. This is duplicated work, in particular
      because all mime4j users will likely do the same.

      I'd propose to:

        - Replace BodyDescriptor with an interface. (This assumes that
          such a change is still possible. I am guessing so from the
          version number 0.4, but I may be wrong.)
        - Make the BodyDescriptor implementation pluggable by adding a method

            protected void newBodyDescriptor()

          to the Mime4JTokenStream.
        - Provide a default implementation that maintains a map of headers and
          values.
        - Open up the method

              private void getHeaderParams(String)

          by making it static and moving it to a utility class, or by
          providing an accessor that takes a header name as an argument
          and invokes the method with that header's value as input.
        - Rename getParameters() to getContentTypeParameters(), because
          the method name is definitely confusing. I clearly had the
          impression that this method would provide the header values.
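
       A minimal sketch of how these pieces could fit together. All
       names below (HeaderAwareBodyDescriptor, MapBodyDescriptor,
       HeaderParams) are invented for illustration and are not taken
       from the Mime4j sources. The parameter parser is deliberately
       simplified: it splits on ";" and therefore mishandles quoted
       values that contain a semicolon.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical interface standing in for an interface-based BodyDescriptor.
interface HeaderAwareBodyDescriptor {
    void addField(String name, String value);
    String getHeader(String name);
    Map<String, String> getContentTypeParameters();
}

// Hypothetical default implementation that keeps a map of headers and values.
class MapBodyDescriptor implements HeaderAwareBodyDescriptor {
    private final Map<String, String> headers = new LinkedHashMap<String, String>();

    @Override
    public void addField(String name, String value) {
        // Header names are case-insensitive, so the key is normalized.
        // A repeated header overwrites the earlier value in this sketch.
        headers.put(name.toLowerCase(Locale.ROOT), value.trim());
    }

    @Override
    public String getHeader(String name) {
        return headers.get(name.toLowerCase(Locale.ROOT));
    }

    @Override
    public Map<String, String> getContentTypeParameters() {
        String value = getHeader("Content-Type");
        return value == null
                ? new LinkedHashMap<String, String>()
                : HeaderParams.parse(value);
    }
}

// Stand-in for an opened-up getHeaderParams(String) in a utility class.
class HeaderParams {
    static Map<String, String> parse(String headerValue) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        String[] pieces = headerValue.split(";");
        for (int i = 1; i < pieces.length; i++) {     // pieces[0] is the main value
            int eq = pieces[i].indexOf('=');
            if (eq < 0) continue;                      // skip pieces without "="
            String key = pieces[i].substring(0, eq).trim().toLowerCase(Locale.ROOT);
            String val = pieces[i].substring(eq + 1).trim();
            if (val.length() >= 2 && val.startsWith("\"") && val.endsWith("\"")) {
                val = val.substring(1, val.length() - 1);   // strip surrounding quotes
            }
            params.put(key, val);
        }
        return params;
    }
}
```

       With something like this, a FileUpload-style caller could ask
       for getHeader("Content-Disposition") and pass the result to
       HeaderParams.parse() instead of collecting T_FIELD events itself.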

  3.) Drop lazy syntax checking or make it optional

       Mime4j has a lot of places where it detects syntax errors in the
       multipart stream. Currently, these are only reported through a
       warning message, which is logged.

       This behaviour is improper. Such situations should cause an
       exception, or at least the Mime4j user should be able to request
       that they do.
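
       One possible shape for such a switch, as a sketch only. Neither
       SyntaxMonitor nor MimeSyntaxException exists in Mime4j under
       these names, and an unchecked exception is used here just to
       keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical exception type for syntax errors in the multipart stream.
class MimeSyntaxException extends RuntimeException {
    MimeSyntaxException(String message) { super(message); }
}

// Hypothetical helper the parser would call wherever it only logs today.
class SyntaxMonitor {
    private final boolean strict;
    private final List<String> warnings = new ArrayList<String>();

    SyntaxMonitor(boolean strict) { this.strict = strict; }

    void problem(String description) {
        if (strict) {
            throw new MimeSyntaxException(description);   // strict mode: fail fast
        }
        warnings.add(description);                        // lenient mode: record and continue
    }

    List<String> getWarnings() { return warnings; }
}
```

       The lenient mode keeps today's behaviour, while a user such as
       FileUpload could construct the monitor with strict set to true.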

  4.) Provide utility classes for the Mime4j user

       I have implemented code in Commons FileUpload that would better
       sit in Mime4j, because it is likely to be shared by Mime4j users.
       In particular, a utility class for implementing header maps could
       be pushed down.

  5.) Provide methods

          public String getFieldName()
          public String getFieldValue()

       The user of

          public String getField()

       is forced to parse the returned value in order to obtain the
       field name and the value, although the Mime4jTokenStream has
       already done exactly that.
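
       For illustration, this is roughly the splitting every caller
       has to repeat today. RawField is a made-up class, and the
       sketch assumes that getField() returns the raw "Name: value"
       text of the field.

```java
// Hypothetical sketch of the work that getFieldName() and
// getFieldValue() would spare the caller.
class RawField {
    private final String name;
    private final String value;

    RawField(String raw) {
        int colon = raw.indexOf(':');       // the first colon ends the field name
        if (colon <= 0) {
            throw new IllegalArgumentException("Not a header field: " + raw);
        }
        this.name = raw.substring(0, colon).trim();
        this.value = raw.substring(colon + 1).trim();
    }

    String getFieldName() { return name; }
    String getFieldValue() { return value; }
}
```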

  6.) Drop

         public int read(byte[] b)

       from PartialInputStream and PositionInputStream. By default, these
       methods delegate to public int read(byte[], int, int), and the
       default implementation works fine. Overriding these methods only
       forces subclasses to implement them too.
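
       The delegation can be checked with a small stand-alone class.
       CountingStream is invented for this example and has nothing to
       do with the Mime4j streams. It overrides only read() and
       read(byte[], int, int), yet a call to read(byte[]) still ends
       up in the three-argument override, because
       java.io.InputStream.read(byte[]) calls read(b, 0, b.length).

```java
import java.io.IOException;
import java.io.InputStream;

// Made-up stream that counts calls to read(byte[], int, int).
class CountingStream extends InputStream {
    private final byte[] data;
    private int pos;
    int bulkReads;

    CountingStream(byte[] data) { this.data = data; }

    @Override
    public int read() {
        return pos < data.length ? (data[pos++] & 0xff) : -1;
    }

    @Override
    public int read(byte[] b, int off, int len) {
        bulkReads++;
        if (len == 0) return 0;
        if (pos >= data.length) return -1;
        int n = Math.min(len, data.length - pos);
        System.arraycopy(data, pos, b, off, n);
        pos += n;
        return n;
    }

    // Calls the inherited read(byte[]) once and reports how many
    // three-argument reads that caused.
    static int bulkReadsAfterArrayRead(byte[] input) {
        CountingStream in = new CountingStream(input);
        try {
            in.read(new byte[8]);           // inherited from InputStream
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return in.bulkReads;
    }
}
```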

  7.) Add support for limitations.

       In Commons FileUpload, it is possible to limit the overall request
       size and/or the size of an atomic entity. This is highly
       recommended for web applications, as a security measure against
       DoS attacks.

       This can be implemented by the Mime4j user. However, it is also
       likely to be reused, so it might better be pushed down.
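
       As a sketch of what such a limit could look like when wrapped
       around either the whole request or a single part:
       SizeLimitedInputStream is a hypothetical class written for this
       mail, not the code from Commons FileUpload. It counts bytes
       returned by the read methods only, so bytes passed over with
       skip() are not counted.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stream that fails once more than 'limit' bytes were read.
class SizeLimitedInputStream extends FilterInputStream {
    private final long limit;
    private long count;

    SizeLimitedInputStream(InputStream in, long limit) {
        super(in);
        this.limit = limit;
    }

    private void advance(long n) throws IOException {
        count += n;
        if (count > limit) {
            throw new IOException("Size limit of " + limit + " bytes exceeded");
        }
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) advance(1);
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) advance(n);
        return n;
    }
    // read(byte[]) is inherited; FilterInputStream routes it to the method above.

    // Example helper: drains the data and reports whether the limit was hit.
    static boolean exceeds(byte[] data, long limit) {
        InputStream in = new SizeLimitedInputStream(new ByteArrayInputStream(data), limit);
        byte[] buf = new byte[4];
        try {
            while (in.read(buf) != -1) {
                // discard the data, only the count matters here
            }
            return false;
        } catch (IOException e) {
            return true;
        }
    }
}
```

       A real implementation would probably want a dedicated exception
       type, so that the caller can tell a size violation from an
       ordinary I/O error.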

Thanks for reading so far. :-)

Jochen


-- 
"Besides, manipulating elections is under penalty of law, resulting in
a preventative effect against manipulating elections."

The German government justifying the use of electronic voting machines
and obviously believing that we don't need a police, because all
illegal actions are forbidden.

http://dip.bundestag.de/btd/16/051/1605194.pdf


