Producing HTML using string templates has always been the wrong solution
There are broadly two approaches used to generate (X)HTML in web applications today:
- string templating systems;
- AST-based representation, transformation and serialization;
Historically, the majority of server-side web applications have tended to produce HTML output using string templating systems. String template approaches remain vastly more popular than AST-based approaches.
This is unfortunate, as I have held the view for some years now that the use of string templating systems to generate HTML is fundamentally the wrong approach.
Issues with use of string templating. The use of string templating systems to produce HTML is the reason why XSS is the issue it is today in web applications. The problem is that string templating systems don't have any comprehension of what they are templating or the semantics of the output being generated. Attempting to use a string templating system to generate a language with formally defined syntax is fundamentally hazardous as a string template language doesn't know how to represent a template variable in the target language, and, due to the frequent need for escaping, a naive representation may be wrong.
There have been attempts to overcome these shortcomings of string templating systems via adding features like autoescaping. However, these features have the shortcoming that often the kind of escaping needed is context-dependent. For example, the escaping required in an HTML attribute value is different from the escaping required in an HTML element body, which is in itself different from the escaping required for a query string parameter value inside an URL inside an HTML attribute value. Thus, all autoescaping systems are forced to choose between two possible approaches:
The first is to simply escape everything the same way; for example, " would be escaped as &quot; both in HTML attribute values and in element body text, even though it only needs to be escaped in the former case. However, this cannot escape correctly in some contexts, like URLs or JavaScript string constants. Thus, the developer has to remember to explicitly request a special type of escaping in this case. This defeats the point of autoescaping, which is after all supposed to be automatic.
The second is to try and determine the context of usage of the template variable with regard to the semantics of the “string” being templated. This requires the “string” templating system to try and interpret the format of the document being templated. For example, when one of these templating systems sees the string <script>, it may assume it is now in JavaScript and start escaping template variables accordingly.
The problem with this approach is that it's a rampant layering violation. Our string template system is now trying to interpret the text we're templating, but not only that, it's silently making decisions based on its personal interpretation of what it sees.
For example, suppose the system is designed to do URL query string parameter escaping automatically. In other words, if we write the following code in our template, the value parameter will be automatically URL encoded:
<a href="?a=foo&value={value}">...</a> <!-- safe -->
However, this only happens if the templating system knows that the href attribute on an a tag contains an URL. If we use an URL in some other context, the system doesn't know it's an URL and URL escaping is omitted:
<a data-foobar="?a=foo&value={value}">...</a> <!-- UNSAFE -->
Not only that, this omission of URL escaping is completely silent and happens without warning. Fundamentally, the conceit of context-dependent autoescaping systems is that they can have a full understanding of the semantics of the language they are templating. However, this is not possible because languages such as (X)HTML are user-extensible.
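To make the silent failure concrete, here is a minimal Python sketch using only the standard library. It assumes a hostile value and compares what the server ends up seeing when the value is merely HTML-escaped with what it sees when the value is URL-encoded first. The helper name as_seen_by_server is my own invention for illustration, and html.unescape is only a stand-in for the attribute decoding a browser performs.

```python
import html
from urllib.parse import parse_qs, quote


def as_seen_by_server(attribute_text: str) -> dict:
    """Decode the attribute roughly as a browser would, then parse the query."""
    url = html.unescape(attribute_text)
    return parse_qs(url.lstrip("?"))


value = "x&admin=1"  # hostile input

# HTML escaping only: what is emitted when the URL context is not recognised.
html_only = "?a=foo&amp;value=" + html.escape(value)

# URL-encode the parameter first, then HTML-escape it for the attribute.
url_then_html = "?a=foo&amp;value=" + html.escape(quote(value, safe=""))

print(as_seen_by_server(html_only))
# {'a': ['foo'], 'value': ['x'], 'admin': ['1']}  (an extra parameter was injected)
print(as_seen_by_server(url_then_html))
# {'a': ['foo'], 'value': ['x&admin=1']}
```

Both outputs are perfectly valid HTML, which is exactly why nothing warns you about the first one.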
Even if they were not, there is still the risk of different interpretations of standard constructs. For example, a web browser parsing HTML will recognise both <script> and <SCRIPT> as starting a JavaScript block, but will your template system? Does your template system recognise both <script type="text/javascript"> and <script type="application/javascript">? What about <script type="application/javascript; charset=utf-8">? What about <h:script xmlns:h="http://www.w3.org/1999/xhtml">? Hopefully the answer is “yes”, but it's very easy for a context-sensitive autoescaper's analysis to differ from that of a web browser. What makes this fatal is that when the analysis is incorrect, no warnings are given. You don't know that your web application isn't escaping input correctly until someone actually discovers the vulnerability (and either reports or exploits it).
Not only can string templating systems not guarantee the correct serialization of single template inputs (like strings), they also cannot guarantee that the output will be syntactically valid according to the rules of the target language. For example, a string templating language would have no way to prevent the following error:
<div>
Hello, <{name}>.
</span>
Whitespace. String templating systems also have issues with whitespace
management. Most string templating systems used with (X)HTML tend to result in
the generation of needless whitespace, which mostly is a waste of bandwidth,
but can also occasionally have material impact on a page. Many string
templating systems necessarily end up including special tags to suppress errant
whitespace that would otherwise be added, like ERB's -%>. Most AST-based
designs avoid these issues around the unwitting and pervasive introduction of
whitespace.
The AST approach
An alternative to the use of string templating systems for generating (X)HTML is to represent the desired template as a formally parsed structure represented as an AST. In this design, because the templating system parses and understands the target language, a syntactically invalid template will not parse, and conversely, because the output is serialized from an AST representation, the output of syntactically invalid code is impossible, as all admitted states of the AST representation correspond to a syntactically valid output:
<div>
Hello, <{name}>.
</span> <!-- template does not parse -->
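As a rough illustration of why an AST cannot serialize to mismatched tags or unescaped text, here is a minimal sketch in Python. The Element class and serialize function are hypothetical names of my own, not any particular library's API, and the sketch deliberately ignores details such as void elements and validation of tag and attribute names.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class Element:
    tag: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)  # Element or str items


def serialize(node) -> str:
    # Text nodes are escaped for element-body context.
    if isinstance(node, str):
        return escape(node, quote=False)
    # Attribute values are escaped for attribute context.
    attrs = "".join(
        f' {name}="{escape(value, quote=True)}"'
        for name, value in node.attrs.items()
    )
    inner = "".join(serialize(child) for child in node.children)
    # The closing tag is derived from the node, so it always matches.
    return f"<{node.tag}{attrs}>{inner}</{node.tag}>"


tree = Element("div", {"class": 'a"b'}, [
    "Hello, ",
    Element("span", {}, ["<script>"]),
    ".",
])
print(serialize(tree))
# <div class="a&quot;b">Hello, <span>&lt;script&gt;</span>.</div>
```

There is simply no way to spell the <div>...</span> error in this representation, and the caller never chooses an escaping function.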
Unfortunately, string templating systems remain in wide use for the generation of HTML, especially server-side HTML. The continued use of string templating could be considered surprising when compared with the culture of use of SQL. When knowledge of the issue of SQL injection became widespread, many developers adopted crude initial solutions involving manual string templating and escaping:
query("SELECT * FROM users WHERE name='" . sql_escape($name) . "'");
Note how this is exactly analogous to the use of string templating for HTML, and suffers from the same issues: manual escaping is an accident waiting to happen, and autoescaping requires language- and context-dependent analysis of the string being templated, which is liable to be unreliable.
However, modern practice in web development has actually improved in this area and prepared statements are now preferred:
query("SELECT * FROM users WHERE name=?", [$name]);
This approach is somewhat halfway between string templating and AST-based solutions; the SQL query is still represented as a string, so the developer could still write syntactically invalid SQL. However, the parameterisation of the query is expressed in a fully abstract manner. While an SQL database driver could generate a SQL statement behind the scenes by moving the arguments into the SQL query string using escaping, in practice well-designed SQL database protocols transport the arguments separately in a way that avoids the need for escaping. Thus, in this case, no escaping actually needs to be performed at any point, neither in the application nor in the SQL driver.
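The same idea can be tried out in Python with the standard library's sqlite3 module. This is only a small sketch: the in-memory database, the users table and the sample values are assumptions made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")

# The value is passed separately from the SQL text; no escaping is written anywhere.
conn.execute("INSERT INTO users (name) VALUES (?)", ("O'Brien",))

hostile = "x' OR '1'='1"
rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (hostile,)
).fetchall()
print(rows)  # [] because the hostile text is compared as data, not parsed as SQL
```

A name containing a quote, such as O'Brien, round-trips without any special handling by the application.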
A full adoption of the principles of AST serialization in the field of SQL in the same manner as advocated by this article for (X)HTML would require the serialization of the SQL query itself from an AST. This is actually not too uncommon with some kind of builder-style interface, which might look something like:
select('users').where(name='Somebody')
Other examples of this kind of approach include C#'s LINQ. Many ORM systems, though usually abstracting users from the need to write SQL, also have a lower layer used to generate SQL in this way.
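For a sense of what such a builder does internally, here is a deliberately tiny Python sketch. The Select class, its where and to_sql methods and the identifier rule are all hypothetical and invented for illustration; real builders and ORMs are far more complete.

```python
import re
from dataclasses import dataclass

_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _ident(name: str) -> str:
    # Identifiers are validated and quoted by the serializer, never by the caller.
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"not a valid identifier: {name!r}")
    return f'"{name}"'


@dataclass(frozen=True)
class Select:
    table: str
    conditions: tuple = ()

    def where(self, **equals):
        return Select(self.table, self.conditions + tuple(equals.items()))

    def to_sql(self):
        sql = f"SELECT * FROM {_ident(self.table)}"
        if self.conditions:
            sql += " WHERE " + " AND ".join(
                f"{_ident(column)} = ?" for column, _ in self.conditions
            )
        # Values travel as parameters alongside the generated SQL text.
        return sql, [value for _, value in self.conditions]


print(Select("users").where(name="Somebody").to_sql())
# ('SELECT * FROM "users" WHERE "name" = ?', ['Somebody'])
```

Because the query text is generated from a structure, the developer cannot produce an unbalanced quote or a misplaced keyword, and values never enter the query text at all.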
Recent improvements: JSX. On the other hand, the trend has actually been changing for the better in recent years. The popularity of frameworks like React has seen a resurgence of interest in the idea of writing XML directly inside a programming language, in a way integrated with the language's parser. For React this is called JSX, and is supported by TypeScript, but the idea is not actually a new one; there used to be a very similar extension to ECMAScript called E4X, which you can read about here.
JSX looks like this:
function render(title) {
  return <html>
    <head>
      <title>{title}</title>
    </head>
  </html>;
}
E4X looks like this:
function render(title) {
  return <html>
    <head>
      <title>{title}</title>
    </head>
  </html>;
}
Yes, that's right, they're identical. Though of course, the semantics of how these result in the construction of an AST differ.
In the JSX case, the XML is parsed and must be syntactically valid, well-formed XML. The XML AST is then translated to a series of nested JavaScript function calls. How these calls are implemented is up to you; they could trivially construct an AST, or do immediate rendering of some kind. However, the important distinction is that it is literally impossible to write JSX templates which are not well-formed XML.
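The "nested function calls" form is easy to picture without JSX at all. The following Python sketch assumes a hypothetical helper h(tag, attrs, *children), a name I have made up, which builds a tree using the standard library's xml.etree.ElementTree and leaves serialization and escaping to it.

```python
import xml.etree.ElementTree as ET


def h(tag, attrs=None, *children):
    """Build one element; string children become text, elements are appended."""
    element = ET.Element(tag, attrs or {})
    last_child = None
    for child in children:
        if isinstance(child, str):
            if last_child is None:
                element.text = (element.text or "") + child
            else:
                last_child.tail = (last_child.tail or "") + child
        else:
            element.append(child)
            last_child = child
    return element


def render(title):
    # Roughly what the JSX render(title) above would be rewritten into.
    return h("html", None, h("head", None, h("title", None, title)))


print(ET.tostring(render("Tom & Jerry <3"), encoding="unicode"))
# <html><head><title>Tom &amp; Jerry &lt;3</title></head></html>
```

Each call opens and closes exactly one element, so well-formedness follows from the ordinary nesting of function calls.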
Caveats around embedded languages. The only real limitation of the AST-based approach is that some strings in an AST may embed other languages. This is the same circumstance where the previously mentioned context-sensing autoescaping approaches can easily fall down.
The solution is to not use strings in this case but instead an AST-based representation of the embedded language. Though there is a risk of someone erroneously using a string, the same can be said of SQL prepared statements: someone could manually construct a SQL statement using string operations, this does not diminish the security advantages of the technique when correctly applied. Moreover, the fact that one has to resort to using string templating to undermine correctness in this case rather proves the point; AST-based approaches are preferable to use of string templating for serializing any given language.
The risks here can be mitigated effectively as follows:
A well designed system will provide ergonomic ways to specify embedded language ASTs, reducing the temptation for developers to try and use string templating. Once the use of these facilities becomes commonplace, it becomes a cultural norm and the use of string templating is likely to stand out like a sore thumb.
The AST serializer could also optionally be designed to refuse to accept strings as input in certain contexts, such as for an HTML attribute known to always accept an URL, for example, and instead require an URL AST object be passed, to discourage abuse.
Since this is based on teaching the AST serializer about specific semantics of the output schema (e.g. XHTML) it has the same shortcomings as context-sensing autoescaping in that it impairs the separation of concerns, and moreover cannot be relied upon. However, in the context-sensing autoescaping case, autoescaping either silently succeeds, or silently fails and outputs improperly escaped text. In the AST-based approach, where the need for an AST rather than string input is correctly identified by the serializer, an error is thrown, training the developer in proper security practices. A JSX example might look like:
var url = '?a=1&b=2';
return <a href={url}>Click here</a>;
// throws SemanticInputRequiredError("a @href must be an URL, not a string")

var url = URL.makeQuery({a: 1, b: 2}); // URL object, serializes to ?a=1&b=2
return <a href={url}>Click here</a>; // OK
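The same behaviour can be sketched in Python. Everything here is hypothetical: the Url class, its make_query constructor, the URL_ATTRS table and render_attr are names invented for illustration, and a real serializer would need a far more complete table of URL-valued attributes.

```python
from dataclasses import dataclass
from html import escape
from urllib.parse import urlencode


class SemanticInputRequiredError(TypeError):
    pass


@dataclass(frozen=True)
class Url:
    text: str

    @classmethod
    def make_query(cls, params):
        # urlencode performs the URL-level escaping of names and values.
        return cls("?" + urlencode(params))


# Attributes the serializer knows to be URLs (assumed, deliberately tiny).
URL_ATTRS = {("a", "href")}


def render_attr(tag, name, value):
    if (tag, name) in URL_ATTRS:
        if not isinstance(value, Url):
            raise SemanticInputRequiredError(
                f"{tag} @{name} must be a Url, not {type(value).__name__}"
            )
        value = value.text
    return f'{name}="{escape(value, quote=True)}"'


print(render_attr("a", "href", Url.make_query({"a": 1, "b": 2})))
# href="?a=1&amp;b=2"
```

Passing the plain string '?a=1&b=2' for href raises SemanticInputRequiredError instead of silently producing output, which is the loud failure mode argued for above.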
Recent improvements: URLs. Though to my recollection originally popularised
by Ruby on Rails's url_for helper function, there are now many frameworks
which encourage people to describe the URLs they want using some kind of
abstract representation rather than manually construct them using string
operations; in fact this can now be considered a standard feature of any decent
web framework.
Other AST-based systems: Lisp and SXML. Another system for the AST-based generation of XHTML output is SXML, the use of S-expressions to represent XML in Lisp environments. SXML looks like this:
(div (@ (class "foo") (id "bar"))
(p "This is a paragraph."))
Because SXML is written using the same S-expression syntax used to write the rest of the Lisp language, it can be written directly inside the language environment just like JSX allows writing XML directly inside a JavaScript environment; however, unlike JSX, it does not require any extension of the language.
SXML turns out to be a very pleasant way to work with XML and ironically is probably nicer to write than XML itself. It also makes it very easy to transform XML arbitrarily, and is probably a lot more pleasant to use in the average case than, say, XSLT. You can use pattern matching and functional programming approaches to match and transform XML documents, represented as SXML, arbitrarily.
SXML, like JSX, makes writing invalid XML impossible. Since it semantically represents the data to be output, escaping is an irrelevant concern as it is handled automatically by the AST serializer; it is not actually possible to bypass escaping.
Ordinarily when we write S-expressions, they're executed as Lisp code. Lisp allows us to use quoting to write S-expressions which will instead be constructed as data. Most importantly of all, we can then escape from that quoting to execute code again, allowing us to insert data into our “template”:
`(div (@ (class "foo") (id "bar"))
(p "My name is " ,(user-name user) "."))
We can execute arbitrary expressions to generate SXML much as we can in JSX. Also, since all text must be explicitly quoted in SXML, there are no issues with accidental introduction of whitespace.
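The transformation side of this is worth a small illustration. The sketch below is in Python rather than Lisp and uses an SXML-like convention of my own choosing: a node is a tuple of (tag, attributes dict, *children) and text is a plain string. The function names are hypothetical.

```python
def transform(node, rule):
    """Rewrite a tree bottom-up, applying rule to every element node."""
    if isinstance(node, str):
        return node
    tag, attrs, *children = node
    new_children = [transform(child, rule) for child in children]
    return rule((tag, attrs, *new_children))


def paragraphs_to_divs(node):
    # Example rule: turn every <p> into <div class="para">.
    tag, attrs, *children = node
    if tag == "p":
        return ("div", {**attrs, "class": "para"}, *children)
    return node


user_name = "Ann"
doc = ("div", {"class": "foo", "id": "bar"},
       ("p", {}, "My name is ", user_name, "."))

print(transform(doc, paragraphs_to_divs))
# ('div', {'class': 'foo', 'id': 'bar'}, ('div', {'class': 'para'}, 'My name is ', 'Ann', '.'))
```

Because the document is ordinary data, a transformation is just an ordinary recursive function; no separate transformation language is needed.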
Other AST-based systems: Haml. While there has been positive movement around the
adoption of AST-based approaches over string templating due to the rise of JSX,
most server-side web development ecosystems remain unfortunately attached to
string templating. This includes PHP (where popular templating languages
include the string-based Twig), Go (commonly used with the html/template
standard library package, which uses context-sensitive autoescaping), and Ruby
on Rails, where the default Embedded Ruby templating language is also based on
string templating with autoescaping.
Though it's now quite a while ago, for a time a language called Haml was seeing some usage with Rails. Haml is essentially a concise way to write (X)HTML which is presumably inspired by YAML:
%html
  %head
    %title= @page_title || "Default title"
  %body
    - if @page_title
      %h1
        = @page_title
    %a(href="#")
      This is a link.
Though the primary motivation behind this approach is clearly to have a syntax which is more concise, it does have the inadvertent advantage that it's an AST-based system rather than a system based on string templating. Indentation is used to describe the (X)HTML AST, and as such it is not possible to express invalid XML. For this reason, Haml seems to me a strictly superior approach to templating compared with Rails's default ERB. Sadly, Haml seems to have now fallen out of favour.
Other AST-based systems: inline DSLs. Most programming languages aren't as flexible in their syntax as the Lisp family, but some are still flexible enough to allow the passable expression of XML using their existing syntax. A Python templating library called Stan exists, which has you write in Python:
document = tags.html[
    tags.head[
        tags.title["Page title"]
    ],
    tags.body[
        tags.h1["Page title"],
        tags.p["This is a paragraph."],
        tags.div(class_="foo")[
            "..."
        ]
    ]
]
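It may not be obvious how plain Python can support this syntax, so here is a minimal sketch of the mechanism. To be clear, this is not Stan's actual implementation; the Tag class and the tags object below are my own illustration of how attribute access, calls and subscripting can be combined to build an AST.

```python
from html import escape


class Tag:
    def __init__(self, name, attrs=None, children=()):
        self.name = name
        self.attrs = dict(attrs or {})
        self.children = tuple(children)

    def __call__(self, **attrs):
        # class_ -> class, so Python keywords can be used as attribute names.
        cleaned = {key.rstrip("_"): value for key, value in attrs.items()}
        return Tag(self.name, {**self.attrs, **cleaned}, self.children)

    def __getitem__(self, children):
        # tag[a, b] passes a tuple; tag[a] passes a single object.
        if not isinstance(children, tuple):
            children = (children,)
        return Tag(self.name, self.attrs, self.children + children)

    def render(self):
        attrs = "".join(
            f' {key}="{escape(str(value), quote=True)}"'
            for key, value in self.attrs.items()
        )
        body = "".join(
            child.render() if isinstance(child, Tag)
            else escape(str(child), quote=False)
            for child in self.children
        )
        return f"<{self.name}{attrs}>{body}</{self.name}>"


class _Tags:
    def __getattr__(self, name):
        return Tag(name)


tags = _Tags()

print(tags.div(class_="foo")["a < b", tags.p["x"]].render())
# <div class="foo">a &lt; b<p>x</p></div>
```

Every operation returns a new Tag, so a partially built fragment can be reused safely in several places.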
Of course, Ruby is more suited to crafting inline DSLs, so there are a fair few options for Ruby, most of which look something like:
div.some_class, id: "some_id" do
h1 "Title"
p "This is a paragraph."
ul do
li "List item"
end
end
Adoption
Reasons for lack of adoption. As I mentioned, the fact that historically string templating has been more popular to generate (X)HTML than AST-based approaches seems to basically come down to:
Constructing XML inside a programming language can require language extensions to be ergonomic (e.g. JSX, which has only become available relatively recently).
Storing AST-based templates outside of your programming language environment requires you to do one of the following:
- have the template language for control flow, etc. be an XML-based language interleaved with the XML you are templating; however, XML turns out to not be a very compelling language for describing control flow (see below);
- extend the XML language itself by using some non-XML-based language superimposed over XML, and write your own XML parser(!) which understands these extensions;
- use XML processing instructions à la PHP (<?foo ...?>); however as these are not structured and can occur independently of the XML hierarchy, they don't inherently guarantee that your control flow is correctly nested with the XML syntax, and careless design could allow generation of malformed XML output. Of course, PHP itself is an example of this.
- extend the XML language itself by giving special meaning to some syntaxes of XML text nodes (e.g. <div>My name is ${name}.</div>). This approach seems relatively popular, but has the disadvantage of creating another syntax which needs escaping, and doesn't really feel like the correct application of XML.
Nobody wants to write XSLT, the “official” XML templating language.
More generally, AST-based templating systems for (X)HTML tend to rely on using XML-based constructs to express control flow, but XML isn't well suited to that.
The XSLT “else” problem provides a random example of how unergonomic things can get:
<xsl:if test="...">
  ...if true...
</xsl:if>
<!-- Can't add an else -->
<!-- Instead, do: -->
<xsl:choose>
  <xsl:when test="...">
    ...if true...
  </xsl:when>
  <xsl:otherwise>
    ...if false...
  </xsl:otherwise>
</xsl:choose>
Many AST-based templating systems were (to my distant recollection) much slower than their string templating-based counterparts. However, it's important to note that there's really no reason why this needs to be the case; AST-based approaches can be as performant as string templating approaches because the AST-based approach can be statically compiled to a string template internally.
For example, finding a lack of good options, I wrote an XML/XHTML templating language for PHP which is similar to XSLT in that it uses XML-based annotations for control flow, but is designed to be a bit simpler and more pleasant to write. However, these templates are then compiled to PHP files which comprise sequences of echo statements just like those produced by a typical string template-based PHP templating library. The template and control flow are expressed in the form of a well-formed XML document and at a semantic level, but they can be statically analysed and compiled to imperative output of strings. For example, consider the following template in my (unpublished) XML templating language:
<q:foreach v="m" x="flash.messages">
  <div>
    <q:expr x="m.text"/>
  </div>
</q:foreach>
This is translated to the following compiled PHP template:
foreach ($this->_getAttribute($ctx['flash'], 'messages') as $_itv2) {
  $ctx['m'] = $_itv2;
  echo '<div>';
  echo self::_escape($this->_getAttribute($ctx['m'], 'text'));
  echo '</div>';
}
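The general technique of flattening an AST template into string output ahead of time can be sketched in a few lines of Python. This is not the compiler described above; Var, compile_node and render are hypothetical names, and the node shape (a tag paired with a list of children) is simplified to leave out attributes and control flow.

```python
from html import escape


class Var:
    """A placeholder for a value supplied at render time."""
    def __init__(self, name):
        self.name = name


def _emit(program, text):
    # Merge adjacent static text so the program stays a short list of chunks.
    if program and isinstance(program[-1], str):
        program[-1] += text
    else:
        program.append(text)


def compile_node(node, program):
    if isinstance(node, Var):
        program.append(node)
    elif isinstance(node, str):
        _emit(program, escape(node, quote=False))  # static text escaped once
    else:
        tag, children = node
        _emit(program, f"<{tag}>")
        for child in children:
            compile_node(child, program)
        _emit(program, f"</{tag}>")


def render(program, ctx):
    # Rendering is plain concatenation; only dynamic values get escaped here.
    return "".join(
        part if isinstance(part, str) else escape(str(ctx[part.name]), quote=False)
        for part in program
    )


program = []
compile_node(("div", ["Hi ", Var("m"), ("b", ["!"])]), program)
print(render(program, {"m": "<x>"}))
# <div>Hi &lt;x&gt;<b>!</b></div>
```

After compilation the program holds just three items (two static strings around one variable slot), so the per-request work is comparable to that of a string template.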
Recommendations
In short:
AST-based generation is conceptually the right way to generate dynamic output and is more secure. Use it wherever possible.
Where string-based templating languages are used, autoescaping should be considered an absolute minimum requirement.
Templating languages which can't support autoescaping simply should not be used under any circumstances.
(Somewhat ironically this means that PHP should be used for literally anything but its originally conceived purpose as a templating language, as it has no autoescaping facilities; nowadays PHP developers usually use template languages written in PHP, rather than use PHP itself as one.)
If you use an autoescaping string template language, you need to think carefully about how its autoescaping works. At a minimum, such languages can be fully trusted to escape text inside elements and attribute values correctly.
However, context-sensing autoescaping template languages may also variously promise to escape variables contained in URLs, CSS and JavaScript, and these claims must be very carefully considered:
Regardless of any claims made, a string template language should not be trusted to escape URL parameters correctly, as this depends on the autoescaping system correctly identifying an attribute as containing an URL. Though really, you should probably not be hardcoding URL structure in your templates anyway, but generating them elsewhere.
In a similar fashion, it seems extremely inadvisable to rely on the autoescaping of string constants inside inline JavaScript. However, I can't really conceive of any valid reason to want to do this nowadays anyway; inline JavaScript is usually not a good idea, and even if it is used, any dynamic data you want to feed it should probably be serialized as JSON and loaded by the inline script, rather than templated into the script itself. This fully eliminates any relevant hazards. (Notice how the JSON serialization is another instance of using an AST instead of string templating.)
Don't rely on CSS to be autoescaped either. Mostly you will not need to worry about CSS, as most CSS you use will be compiled statically at build time (if you are even using a CSS toolchain). But where occasional circumstances do arise requiring dynamic inline CSS, explicitly escape it and confirm that it's being escaped correctly.
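The JSON suggestion for inline JavaScript can be sketched as follows in Python. The function names and the page-data id are hypothetical. The extra replacement step is my own precaution, on the assumption that the JSON is placed inside a script element, where a literal </script> in the data would otherwise end the element early.

```python
import json


def json_for_script(data) -> str:
    """Serialize data as JSON that is safe to place inside a script element."""
    text = json.dumps(data)
    # In json.dumps output these characters can only occur inside strings,
    # so replacing them with \uXXXX escapes keeps the JSON equivalent.
    return (
        text.replace("&", "\\u0026")
            .replace("<", "\\u003c")
            .replace(">", "\\u003e")
    )


def script_block(data) -> str:
    return (
        '<script type="application/json" id="page-data">'
        + json_for_script(data)
        + "</script>"
    )


payload = {"name": "</script><script>alert(1)</script>"}
encoded = json_for_script(payload)
print("<" in encoded)                  # False
print(json.loads(encoded) == payload)  # True
```

The inline script can then read the element's text and parse it with JSON.parse, so no value is ever templated into JavaScript source.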
Other thoughts
Syntax refuseniks. I previously noted in my article on XHTML2 that one of the apparent objections supposedly raised by web developers against XHTML was that having to write well-formed XML was somehow unreasonably burdensome.
This is frankly completely absurd — and really says all you need to know about the webdev industry. To see how ludicrous this is, consider how one would be received if one made this argument for literally any other language. If a C developer complained that it was unreasonable to expect them to write syntactically valid C, one would expect them to be laughed out of the room. A proposal to change Rust so that braces don't have to be balanced, because having to match every brace correctly is just too difficult, would be met with equal derision. Yet for some reason HTML people were happy to claim that having to close tags in the order they were opened, amongst other things, was somehow utterly untenable.
This is ridiculous, but it also demonstrates how poor the practices were at the time, and in many ways, still are. It's quite likely that the use of string templating systems was one of the major factors in the unfortunate failure of XHTML to gain mass adoption; in short, string templating systems, applied to the generation of HTML, can't systematically catch and prevent the generation of malformed HTML, so errors creep in.
Thus by the time XHTML arrived, the typical web application was probably already full of mismatched HTML tags and other errors which the webapp developer's templating practices had systematically failed to catch and prevent. All of this originates from what is essentially the original sin of taking string templating systems and using them to crudely assemble a well-defined structured document language.
Other discussions, header/footer problem. A very interesting discussion of
context-aware autoescaping can be found here. This design endeavours to
support autoescaping of URL parameters, JavaScript and CSS in addition to
(X)HTML; it also appears to have ended up serving as inspiration for Go's
html/template, which is a context-sensing autoescaping string templating
library. This discussion describes what I call an AST-based design as
“strict structural containment”, and notes both its advantages and aspects
hindering its use:
“Strict structural containment is a sound, principled approach to building safe templates that is a great approach for anyone planning a new template language, but it cannot be bolted onto existing languages though because it requires that every element and attribute start and end in the same template. This assumption is violated by several very common idioms, such as the header-footer idiom in ways that often require drastic changes to repair.”
Essentially, many web applications using string-based templating started out
needing to have a common layout for each page. The most rudimentary
solution to this problem is to have separate “header” and “footer” templates,
between which some page-specific template is invoked. However, this cannot be
represented with an AST-based approach, because separating the template which
contains <html> from the template which contains </html> is (very
intentionally) not possible.
Of course, the header/footer approach is just plain not very good anyway. It's a crude solution. Far better are solutions which have a single layout with “blocks” which can be filled in by other templates which extend that template. This approach is more powerful because you can have multiple different blocks in the master layout which can be overridden by a page template, rather than just one body area coming between a header and footer. This approach to templating is now much more common even among string templating languages, which often offer systems for inheritance relationships between templates; thus, these obstacles to adoption no longer exist as much as they once did.
It's interesting to note that when using a programming language which supports the ergonomic expression of XML (such as JavaScript with JSX or Lisp with SXML), one can trivially get this kind of functionality simply by using the ordinary mechanisms for procedural decomposition provided by the language:
function layout(args) {
  return <html>
    <head>
      <title>{args.title || 'Default title'}</title>
      {args.extraHead}
    </head>
    <body>
      <nav>{args.nav}</nav>
      <main>
        {args.content}
      </main>
    </body>
  </html>;
}
function index() {
  return layout({
    title: 'Index page',
    content: <p>This is the index page.</p>,
  });
}
By comparison, the string templating languages which provide inheritance and block override mechanisms have to implement this functionality deliberately. In other words, the string templating languages end up having to extend themselves to duplicate the very functionality of the languages they are built on. The language-inline AST-based approach has no such need to reinvent the wheel.
Taxonomy of methods
Based on the discussion so far, we can construct a taxonomy of the various methods:
- String templating-based approaches
  - Pure string templating
    Refers to blind string templating systems which have no knowledge of (X)HTML specifically.
  - Autoescaping augmented HTML string templating
    Refers to string templating systems which have been augmented with some kind of knowledge of (X)HTML to facilitate autoescaping, but which remain string templating systems in their essential operation.
    - Blind autoescaping HTML string templating
      A string templating system which has been augmented to facilitate blind autoescaping. This mechanism cannot detect the context in which a variable is interpolated, and so always autoescapes the same way, regardless of context. Examples are too numerous to list.
    - Context-aware autoescaping HTML string templating
      A string templating system which has been augmented to facilitate context-aware autoescaping. The context in which a variable is being interpolated can be detected and the autoescaping functionality adjusted appropriately, though how reliable this is may vary. Examples include Go's html/template.
- AST-based approaches
  - XML-based templating languages
    These define an XML schema which will typically be interwoven with another XML schema such as XHTML. They define control flow and interpolation constructs to fulfil templating requirements.
    They can be subcategorised into three classes in turn:
    - Pure XML-based
      These languages use only XML elements, attributes and processing instructions to control templating. The meaning of XML text nodes is not overloaded. Examples include XSLT. A language in this class might interpolate a variable using syntax like <tpl:value expr="varName"/>.
    - Overloaded XML-based
      These languages overload XML text nodes to add their own language interspersed around XML syntactic elements. This has the disadvantage of creating a new syntax that needs to be escaped. This is only in templates rather than (X)HTML, so it is only a problem if you are generating templates themselves from untrusted user input. A language in this class might interpolate a variable using syntax like ${varName}.
    - Hybrid XML-based
      A hybrid XML-based language uses XML-based syntax for control flow, but overloaded text nodes for expression interpolation. Examples include Genshi (Python), the template language used by the Trac project management system.
  - Language extensions for inline XML
    These are language extensions which allow XML to be written inline inside the language. Examples include JSX and the older E4X, both for JavaScript. Facebook at some point developed a similar extension for PHP called XHP.
  - Using a programming language as a DSL
    These use an existing programming language without any special extension to express XML structure concisely. Examples include SXML as used by Lisp/Scheme and Stan for Python.
  - DSLs
    These are DSLs implemented fully independently. Examples include Haml and Slim (both Ruby) and Hamlet (used by Haskell's Yesod web framework). A large number of variant languages inspired by Haml also exist.
Perspectives on security
I always like to read about major safety incidents in the aviation industry. One of the inspiring things about aviation safety is the level of effort put into what can be described as “problem class elimination”. In other words, when a serious accident occurs (or even just almost occurs), the industry doesn't just say “don't do that again” but tries to find ways to systematically eliminate the entire class of problem which has been discovered. Pilots flying into terrain was a surprisingly common issue, thus the Ground Proximity Warning System was created.
An important dimension of this work is that it's never based on assumptions of superhuman performance from the human components. Every component in the plane is reasoned about in terms of its reliability, and its failure modes. The human operators are components of that system; they have a certain level of reliability, and they have known failure modes. There have been a lot of accidents in the aviation, nuclear, chemical, etc. industries, and that gives us a lot of understanding of how humans go wrong. Blaming the human operator is almost never constructive; better to design systems under the knowledge of their limitations and their most common failure modes.
Nowadays I think an appreciation of this point is pervasive in the software industry, which is good. You probably haven't heard the words “problem class elimination” before, but you probably have heard of the Rust programming language, and the borderline obsession its proponents have with something called “memory safety”. This is a technology intended to eliminate an entire problem class more or less for good. Rust users believe in the Rust language because they believe in the power of problem-class elimination to fundamentally change the game as far as security is concerned.
You may recognise many existing problem class elimination techniques, many of which have managed to (eventually) gain traction:
Problem | Problem Class Eliminating Technology |
---|---|
SQL injection | |
XSS | |
CSRF | |
Buffer overflow, UAF | |
Where a viable method of problem class elimination is identified, we should adopt it. There's a very large amount of inertia behind the use of string templating, but the rising popularity of technologies like JSX may finally be shifting this trend. Using string templating systems for generating HTML may always have been a mistake, but it's a mistake that increasingly doesn't need to be carried forward into greenfield development.
2023-05-26: Chris Siebenmann has a different view which you may find interesting. Some more of my own thoughts.