freemarker-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Daniel Dekany <>
Subject Re: Auto-Sanitization for FreeMarker
Date Wed, 16 Mar 2016 09:33:47 GMT
I would like to figure out if forking is indeed needed for this to
work. And if so, if that could be avoided in the case of FM 3.

The way it works in 2.3.24 is simply that anything that's not a
TemplateMarkupOutputModel (which might should be called
TemplateSanitizedContantModel) is untrusted (to use your terms), and
so will be escaped. So strings (TemplateScalarModel-s) are always
untrusted. When you concatenate trusted and untrusted values, the
untrusted part is escaped, so the result is trusted. So that's how it
works in 2.3.24.

What marks a value to be trusted or untrusted in your solution?
Especially, why do you track such state for non-string values? After
all, no hacker can put HTML tags and such into a number or boolean or
date. Is labelling all type of values is indeed unavoidable, or we
could just store more detailed information in
TemplateMarkupOutputModel and that could solve this problem?

How do you track the output browser's parser state and then where does
the final sanitization happen? My first idea is passing a Writer to
Template.process that parses what's written into it, and then ${} has
to call some special Writer methods to which it can pass the
sanitization information. The problem with such a solution is that if
you do output capturing (<#assign x>...</#assign>) or apply any kind
of filter directive, as they are working with "unlabelled" character
streams, there will be no browser's state. In FM3, I think we will use
some kind of FreeMarkerWriter (a term you have used too!) that treats
the output stream more than just a stream of characters, and the plain
Writer will be only the final stage. My initial motivation there was
better white-space handling, but the same thing can also be useful for
sanitization. That means that we dumb down the output to a mere stream
of characters as late as possible. Instead, you have a stream of
objects where some objects are TemplateMarkupOutputModel-s. Just an
idea anyway.

BTW, what's your fork is based on? If on 2.3.23, as opposed to on the
2.3-gae head from a no more than a few months ago, you will find that
there were a lot of changes due to code formatting clean up...

 Daniel Dekany

Tuesday, March 15, 2016, 11:45:47 PM, Matthew Van Gundy wrote:

> Hi Daniel,
> I took a quick look at the auto-escaping feature documentation and it
> looks like it's doing part of what we're doing.  Given the potential
> lead time that may be involved in getting our contribution request
> approved, I wouldn't put 2.3.24 on hold for our changes.  However, at
> first blush, I think that we'll be able to capitalize on the changes.
> In particular, the fewer explicit calls to "s?escape" the better.  Also,
> explicit use of "s?no_esc" will help us out as well.
> At a very high-level, in order to achieve context-sensitive
> auto-sanitization, you need to do the following:
> 1. Track
>     - Data origin integrity --- is the source trusted or untrusted?
>     - Sanitizations applied --- to each value on each program path
> 2. At every output of a trusted value: print(trusted)
>    Model the effect on the browser’s parser state (context)
> 3. At every output of an untrusted value: print(untrusted)
>    If the value has not been sufficiently sanitized, sanitize it
>    appropriately for the current context
> I'm guessing that you're currently doing 2 & 3-lite by checking whether
> you're about to output a TemplateMarkupOutputModel in FreeMarkerWriter.
> If it's a TemplateMarkupOutputModel, don't escape. Otherwise, escape.
> The approach is similar for context-sensitive auto-sanitization except
> that we also track the browser's parser state and parse the markup
> before sending it to the browser so that we know what the browser's next
> parser state will be.
> For tracking data flow, we label each value in the program with the
> sanitization that has been applied.  Then for each operation on that
> value, we define how it updates the label.  A straw man example would be
> determining the integrity (trustworthiness) of boolean values:
> class Boolean {
>     boolean value;
>     public static Boolean and(Boolean x,
>                               Boolean y) {
>         boolean result = x.value
>                       && y.value;
>         return new Boolean(result);
>     }
> }
> would become:
> class Boolean {
>     boolean value;
>     boolean untrusted;
>     public static Boolean and(Boolean x,
>                               Boolean y) {
>         boolean result = x.value
>                       && y.value;
>         boolean new_label = x.untrusted
>                          || y.untrusted);
>         return new Boolean(result,
>                            new_label);
>     }
> }
> Of course the labels for tracking sanitization are more complicated.
> But, in practice, we extended each Template*Model with a label field and
> updated all operations to compute an updated label value.
> The one sticking point was java.lang.String.  There are a number of
> places in the FreeMarker Engine where java.lang.String was used instead
> of freemarker.template.TemplateScalarModel.  Due to time constraints, we
> modified FreeMarker to promote all java.lang.String operations on
> potentially-untrusted data to the labeled
> freemarker.template.TemplateScalarModel.  With heavy refactoring, it may
> be possible to track string data flow external to the individual string
> values and avoid that conversion.
> In all our changes to src/ come out to:
>  114 files changed, 11555 insertions(+), 895 deletions(-)
> with 18 files changed, 6701 insertions(+) being entirely new parsing and
> sanitization code that we ported and extended from Closure Templates.
> Cheers,
> Matt
> On 3/15/16 2:12 PM, Daniel Dekany wrote:
>> Hi,
>> FreeMarker 2.3.24 (not yet released, though it meant to be ASAP) adds
>> a simple auto-escaping feature. It's not context sensitive. It's
>> pretty much just a better version of #escape. I mention it as it might
>> interferes on some ways with your work. This is the guide for it:
>> Note that we have TemplateMarkupOutputModel objects on runtime, which
>> prevents double-escaping. As you track data flow on runtime, maybe
>> that's something that you can utilize. If so, I can put 2.3.24 on hold
>> and see how it could be improved to serve that goal.
>> As of having different branches of 2.x, I haven't seen how different
>> your branch is, but as currently FM development resources are very
>> scarce, it's perhaps not a realistic to maintain an additional branch.
>> But, there's a plan for starting a FreeMarker 3 branch, whose purpose
>> is getting rid of the backward compatibility baggage piled up in the
>> last ~12 years, and fixing some known design issues, be ignoring
>> backward compatibility. (If it will happen, it will be in a separate
>> package, so it can be used next to FreeMarker 2.x, which will pretty
>> much switch to maintenance mode.) FreeMarker 3 is also a good chance
>> to put in architectural changes upon which more ambitious features can
>> be built, such as more intelligent white-space handling, or maybe
>> context sensitive escaping too. So I'm interested in what's needed for
>> context sensitive escaping to work.
>> Tuesday, March 15, 2016, 4:03:57 PM, Matthew Van Gundy wrote:
>>> Hi dev@freemarker,
>>> Cross-Site Scripting (XSS) vulnerabilities are embarrassingly common in
>>> web applications.  A few of us in the Cisco Advanced Security
>>> Initiatives Group have extended FreeMarker to support context-sensitive
>>> auto-sanitization in order to automatically detect and prevent
>>> Cross-Site Scripting vulnerabilities.  We would like to contribute our
>>> improvements back to the open-source community.  As part of that
>>> process, we'd like to solicit your feedback about whether and under what
>>> conditions FreeMarker might be willing to integrate our changes or if it
>>> would be better to maintain a fork that attempts to retain compatibility
>>> and feature parity with FreeMarker.
>>> Our modified FreeMarker:
>>> * Precisely tracks the flow of untrusted data through the FreeMarker
>>>   engine
>>> * Tracks the current parsing context of an HTML5-compliant browser
>>>   parser
>>> * Automatically sanitizes untrusted data correctly for the enclosing
>>>   HTML/JavaScript/CSS context where the data is to be embedded
>>> * Detects manually-sanitized data and avoids over-sanitization as long
>>>   as the manually-applied sanitization is safe for the enclosing web
>>>   page context
>>> For example, given the template:
>>> Hello ${email}!
>>> <button onclick="signin('${email}')">
>>> When rendered with email = "<script>pwn()</script>" the result will
>>> Hello &lt;script&rt;pwn()&lt;/script&rt;!
>>> <button onclick="signin('&lt;script&rt;pwn()&lt;/script&rt;')">
>>> When rendered with email = "'+ pwn() +'" the result will be:
>>> Hello '+ pwn() +'!
>>> <button onclick="signin('\'+ pwn() +\'')">
>>> In both cases, whenever untrusted data is embedded into a web page, it
>>> is sanitized safely and correctly for the enclosing context.  Our
>>> approach was inspired by Closure Templates [1] and Samuel, Saxena, and
>>> Song's CCS 2011 paper on Context-Sensitive Auto-Sanitization [2] with
>>> the caveat that our system does not perform static analysis, it is
>>> runtime only.
>>> As a runtime-only approach, there is a performance impact.  In our
>>> current benchmarks, data flow tracking alone incurs a 1.6x slowdown
>>> while full context-sensitive auto-sanitization (including data flow
>>> tracking) incurs a 2x slowdown.
>>> Precise data flow tracking requires all string operations in the
>>> FreeMarker engine to be instrumented.  It is not possible to allow data
>>> flow tracking to be enabled/disabled on demand without extensive
>>> refactoring.  As a result, the 1.6x slowdown due to data flow tracking
>>> is always present in the current implementation.  To mitigate this for
>>> applications that cannot incur a 1.6x slowdown in template rendering, we
>>> provide an API-compatible replacement for freemarker.jar which can be
>>> used during testing.  In this use case, our system provides a
>>> verification that manual sanitization has been applied correctly for all
>>> test cases seen during the normal dev-test cycle.
>>> We would like your opinions on whether and under what constraints,
>>> FreeMarker would be interested in adopting these changes.  Would
>>> FreeMarker be interested in offering a context-sensitive auto-sanitizing
>>> branch which maintains feature parity with the main release but provides
>>> additional safety for a performance cost?  If extensive refactoring
>>> could allow data flow tracking and auto-sanitization could be
>>> selectively enabled, would FreeMarker be willing to integrate those
>>> changes?  Or, would releasing a separate project be the preferred route?
>>> Thank you for all of your hard work creating an excellent tool.  We
>>> appreciate any feedback you may have.
>>> Matt
>>> [1]
>>> [2]

View raw message