commons-dev mailing list archives

From Benedikt Ritter <brit...@apache.org>
Subject Re: [LANG] Clarification of method behavior in StringEscapeUtils
Date Mon, 03 Feb 2014 17:32:15 GMT
2014-02-03 Adam Hooper <adam@adamhooper.com>:

> On Sun, Feb 2, 2014 at 2:00 PM, Benedikt Ritter <britter@apache.org>
> wrote:
> >
> > 2014-02-01 Gary Gregory <garydgregory@gmail.com>:
> >
> >> On Sat, Feb 1, 2014 at 9:12 AM, Benedikt Ritter <britter@apache.org>
> >> wrote:
> >>
> >> >
> >> > These methods only escape the basic XML/HTML entities and may
> >> > therefore produce invalid XML/HTML. LANG-955 [1] proposes adding new
> >> > methods that only produce valid XML; they should throw an exception if
> >> > a character is encountered that cannot be represented in XML (not even
> >> > by escaping).
> >>
> >> How does that the problem mentioned earlier on the ML of needing valid
> >> XML no matter what the input?
> >>
> >
> > I don't understand that sentence, sorry :o)
>
> As the author of that patch, my two pence:
>
> It's impossible to encode some characters in XML -- especially XML
> 1.0. That's because XML is a text-only format, so it only allows text.
> (This inspired Microsoft, when it created its XML document formats, to
> invent a new encoding scheme ("xstring", I think) that uses valid XML
> characters to encode invalid ones. Luckily, that encoding scheme never
> caught on outside of Microsoft-land.)
>
> While there's nothing _wrong_ with escapeXml as it stands right now
> (i.e., the code agrees with the docs), I argue that it doesn't solve
> the actual problem people are using it for: people want to escape
> strings for inclusion in XML documents, and escapeXml does not do
> that.
>
> I think escapeXml should not output invalid XML ever.
>
> Presumably escapeXml() is being used today for lots of XML documents,
> and it already throws a brutal exception: a conforming XML parser will
> throw an exception when it reaches an invalid character. That speaks
> to the severity of the problem (it makes that data very hard to get
> at), and to the rarity of the problem (there haven't been many bug
> reports about this).
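
To make the "brutal exception" concrete, here is a minimal sketch that feeds a few small documents to the JDK's built-in SAX parser. The class name InvalidXmlCharDemo and the helper parses() are made up for illustration and are not part of Commons Lang. U+0001 stands in for any character outside the XML 1.0 character range.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of Commons Lang.
class InvalidXmlCharDemo {

    /** Returns true if the JDK's default SAX parser accepts the document. */
    static boolean parses(String xml) {
        try {
            SAXParserFactory.newInstance()
                    .newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // Well-formedness errors, including illegal characters, end up here.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String bad = String.valueOf((char) 0x1); // U+0001 is not an XML 1.0 character

        System.out.println(parses("<a>fish &amp; chips</a>"));          // ordinary escaped text
        System.out.println(parses("<a>" + bad + "</a>"));               // raw control character
        System.out.println(parses("<a><![CDATA[" + bad + "]]></a>"));   // same character inside CDATA
        System.out.println(parses("<a>&#1;</a>"));                      // numeric character reference
    }
}
```

With a conforming parser, only the first document should be accepted. Neither a CDATA section nor a numeric character reference makes U+0001 legal in XML 1.0, which is why escaping alone cannot rescue such input.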
>
> >> There are several tasks for the API(s):
> >>
> >> - Escaping (implied by the API name)
> >> - Dealing with non-XML chars:
> >>   o Strip, or
> >>   o Throw exception
> >>
> >> The simplest solution using today's style would be:
> >>
> >> escapeXml10(String text, boolean strip)
> >> escapeXml11(String text, boolean strip)
> >>
> >> strip true - strips
> >> strip false - throws exception
> >>
> >
> > A boolean flag that controls whether a method throws an exception or not?
> > An exceptional situation is nothing that is configurable, imho.
> >
> >> What I am not sure on is why you would want an exception or what you'd
> >> do with it.
> >>
> >> Are these 'bad chars' embeddable in a CDATA? If so, strip false makes
> >> sense because we really cannot handle the text. But what would the app
> >> then do with the exception?
>
> I originally thought an exception would be useful, but I changed my
> mind as I wrote the patch. Some reasons:
>
> * What kind of exception? It isn't really an IOException, and the API
> doesn't seem keen on adding other kinds.
>
> * What would the user want to do with it? Re-run the operation in its
> exception-free incarnation?
>
> An exception might be useful for some people, but I think it would be
> right to steer those people towards a different API -- maybe not a
> part of commons-lang.
>
> Enjoy life,
> Adam
>

Adam, thanks for sharing your thoughts.

This sounds like we're reaching a consensus here. I'd propose the following:

- deprecate escapeXml(String) (and no renaming to escapeXmlEntities or the
like)
- add escapeXml10 and escapeXml11, which escape xml entities and strip
invalid characters from the input.
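
A rough sketch of what "escape and strip" could look like for XML 1.0, assuming the Char production from the XML 1.0 specification. The class XmlEscapeSketch and its method bodies are illustrative only. They are not the patch attached to LANG-955 and not the implementation that would end up in StringEscapeUtils.

```java
// Hypothetical sketch, not the Commons Lang implementation.
final class XmlEscapeSketch {

    /** XML 1.0 Char production: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]. */
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Escapes the five basic XML entities and silently drops characters XML 1.0 cannot represent. */
    static String escapeXml10(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);      // an unpaired surrogate comes back as itself
            i += Character.charCount(cp);
            if (!isXml10Char(cp)) {
                continue;                      // strip instead of throwing
            }
            switch (cp) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.appendCodePoint(cp);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "a<b>" + (char) 0x1 + "&\"'";
        System.out.println(escapeXml10(input)); // prints a&lt;b&gt;&amp;&quot;&apos;
    }
}
```

Iterating by code point rather than by char keeps valid surrogate pairs intact while still dropping unpaired surrogates. An XML 1.1 variant would need a different character table, which this sketch does not attempt.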

What do we do with escapeHtml3 and escapeHtml4? Do we leave them unchanged?

Benedikt




-- 
http://people.apache.org/~britter/
http://www.systemoutprintln.de/
http://twitter.com/BenediktRitter
http://github.com/britter
