webhelpers.html.builder
HTML/XHTML tag builder
HTML Builder provides:
- an HTML object that creates (X)HTML tags in a Pythonic way.
- a literal class used to mark strings containing intentional HTML markup.
- a smart escape() function that preserves literals but escapes other strings that may accidentally contain markup characters ("<", ">", "&", '"', "'") or malicious JavaScript tags. Escaped strings are returned as literals to prevent them from being double-escaped later.
literal is a subclass of unicode, so it works with all string methods and expressions. The only thing special about it is the .__html__ method, which returns the string itself. The escape() function follows a simple protocol: if the object has an .__html__ method, it calls that rather than .__str__ to get the HTML representation. Third-party libraries that do not want to import literal (and thus create a dependency on WebHelpers) can put an .__html__ method in their own classes returning the desired HTML representation.
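A minimal sketch of the .__html__ protocol, written for Python 3 with only the standard library. escape_sketch and Badge are hypothetical names. The real escape() returns a literal rather than a plain str, and its entities come from MarkupSafe, which differ from html.escape's (for example, html.escape writes a single quote as &#x27;).

```python
import html


def escape_sketch(obj):
    """Simplified stand-in for escape(); not the WebHelpers implementation."""
    if obj is None:
        return ""                                # WebHelpers converts None to ""
    if hasattr(obj, "__html__"):
        return obj.__html__()                    # trust the object's own HTML
    return html.escape(str(obj), quote=True)     # escape &, <, >, " and '


class Badge:
    """Hypothetical third-party class; it never imports literal."""

    def __init__(self, label):
        self.label = label

    def __str__(self):
        return self.label

    def __html__(self):
        # Escape our own data, then return intentional markup around it.
        return '<span class="badge">%s</span>' % html.escape(self.label)
```

escape_sketch(Badge("new & hot")) keeps the span but escapes the ampersand inside it, while escape_sketch("<b>") escapes the whole string.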
WebHelpers 1.2 uses MarkupSafe, a package which provides an enhanced implementation of this protocol. Mako and Pylons have also switched to MarkupSafe. Its advantages are a C speedup for escaping, escaping of single quotes for security, and new methods on literal. literal is now a subclass of markupsafe.Markup, and escape is markupsafe.escape_silent. (escape_silent does not exist yet in MarkupSafe 0.9.3, but in the meantime WebHelpers itself converts None to "".)
Single-quote escaping affects HTML attributes that are written like this: alt='Some text.' rather than the normal alt="Some text.". If the text is a replaceable parameter whose value contains a single quote, the browser would think the value ends earlier than it does, thus enabling a potential cross-site scripting (XSS) attack. WebHelpers 1.0 and earlier escaped double quotes but not single quotes. MarkupSafe escapes both double and single quotes, preventing this sort of attack.
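The attack can be sketched with two hypothetical escaping helpers. escape_double_only roughly mimics the pre-1.2 behavior (double quotes escaped, single quotes left alone), and escape_both additionally escapes single quotes with the &#39; entity MarkupSafe emits. Neither function is WebHelpers or MarkupSafe code.

```python
def escape_double_only(s):
    # Roughly the pre-1.2 behavior: &, <, > and " are escaped, ' is not.
    return (s.replace("&", "&amp;").replace("<", "&lt;")
             .replace(">", "&gt;").replace('"', "&quot;"))


def escape_both(s):
    # Also escape single quotes, as MarkupSafe does.
    return escape_double_only(s).replace("'", "&#39;")


# A value that closes a single-quoted attribute and opens a new one.
payload = "x' onmouseover='alert(1)"
unsafe = "<img alt='%s'>" % escape_double_only(payload)
safe = "<img alt='%s'>" % escape_both(payload)
```

In unsafe the browser sees a second attribute, onmouseover, while in safe the whole payload stays inside the alt value.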
MarkupSafe has some slight differences which should not cause compatibility issues, except possibly in the following edge cases. (A) The force argument to escape() is gone. We doubt it was ever used. (B) The default encoding of literal() is "ascii" instead of "utf-8". (C) Double quotes are escaped as "&#34;" instead of "&quot;". Single quotes are escaped as "&#39;".
When literal is used in a mixed expression containing both literals and ordinary strings, it tries hard to escape the strings and return a literal. However, this depends on which value has "control" of the expression. literal seems to be able to take control with all combinations of the + operator, but with % and join it must be on the left side of the expression. So these all work:
"A" + literal("B")
literal(", ").join(["A", literal("B")])
literal("%s %s") % (16, literal("kg"))
But these return an ordinary string which is prone to double-escaping later:
"\n".join([literal('<span class="foo">Foo!</span>'), literal('Bar!')])
"%s %s" % (literal("16"), literal("<em>kg</em>"))
Third-party libraries that don't want to import literal, and thus avoid a dependency on WebHelpers, can add an .__html__ method to any class, which can return the same as .__str__ or something else. escape() trusts the .__html__ method and does not escape its return value, so only objects that lack an .__html__ method are escaped.
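Why does the left operand matter for % and join but not for +? Python's rule is that if the right operand's type is a subclass of the left operand's type and overrides the reflected method, that method is tried first, so "A" + literal("B") reaches the subclass's __radd__. By contrast, "..." % args and "sep".join(seq) are plain str operations on the left-hand string that never consult a subclass. MiniLiteral below is a hypothetical, stripped-down stand-in for literal (not its real source) that reproduces the working and failing examples above.

```python
import html


class MiniLiteral(str):
    """Hypothetical stand-in for literal; only the operators discussed above."""

    def __html__(self):
        return self

    def __add__(self, other):          # literal + "x"
        return MiniLiteral(str.__add__(self, esc(other)))

    def __radd__(self, other):         # "x" + literal
        return MiniLiteral(str.__add__(esc(other), self))

    def __mod__(self, args):           # literal("%s") % args
        if not isinstance(args, tuple):
            args = (args,)
        return MiniLiteral(str.__mod__(self, tuple(esc(a) for a in args)))

    def join(self, seq):               # literal(", ").join(seq)
        return MiniLiteral(str.join(self, [esc(s) for s in seq]))


def esc(obj):
    """Escape obj unless it already knows its HTML form."""
    if hasattr(obj, "__html__"):
        return MiniLiteral(obj.__html__())
    return MiniLiteral(html.escape(str(obj)))
```

With the literal on the left (or any side of +), plain strings are escaped and the result stays a MiniLiteral; with a plain str on the left of % or join, the result is an ordinary str.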
The HTML object has the following methods for tag building:
HTML(*strings)
Escape the string args, concatenate them, and return a literal. This is the same as escape(s) but accepts multiple strings. Multiple args are useful when mixing child tags with text, such as:
html = HTML("The king is a >>", HTML.strong("fink"), "<<!")
HTML.literal(*strings)
Same as literal but concatenates multiple arguments.
HTML.comment(*strings)
Escape and concatenate the strings, and wrap the result in an HTML comment.
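A rough, standard-library-only sketch of what HTML(*strings) and HTML.comment(*strings) do. html_concat, html_comment, and Trusted are hypothetical names; the real methods return literal objects, and the spacing inside the comment delimiters is an assumption.

```python
import html


class Trusted(str):
    """Hypothetical stand-in for a literal: its __html__ returns it unchanged."""

    def __html__(self):
        return str(self)


def _esc(obj):
    # Same protocol as escape(): trust __html__, escape everything else.
    if hasattr(obj, "__html__"):
        return obj.__html__()
    return html.escape(str(obj))


def html_concat(*strings):
    """Sketch of HTML(*strings): escape each argument, then concatenate."""
    return "".join(_esc(s) for s in strings)


def html_comment(*strings):
    """Sketch of HTML.comment(*strings); delimiter spacing is assumed."""
    return "<!-- %s -->" % html_concat(*strings)
```

Passing the child tag as a Trusted value mirrors the "king is a fink" example: the text around it is escaped, the tag itself is not.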
HTML.tag(tag, *content, **attrs)
Create an HTML tag named tag, with the keyword args converted to attributes. The other positional args become the content for the tag, and are escaped and concatenated. If an attribute name conflicts with a Python keyword (notably "class"), append an underscore. If an attribute value is None, the attribute is not inserted. Three special keyword args are recognized:
c - Specifies the content. This cannot be combined with content in positional args. The purpose of this argument is to position the content at the end of the argument list to match the native HTML syntax more closely. Its use is entirely optional. The value can be a string, a tuple, or a tag.
_closed - If present and false, do not close the tag. Otherwise the tag will be closed with a closing tag or an XHTML-style trailing slash as described below.
_nl - If present and true, insert a newline before the first content element, between each content element, and at the end of the tag.
Example:
>>> HTML.tag("a", href="http://www.yahoo.com", name=None,
...          c="Click Here")
literal(u'<a href="http://www.yahoo.com">Click Here</a>')
HTML.__getattr__
Same as HTML.tag but using attribute access. Example:
>>> HTML.a("Foo", href="http://example.com/", class_="important")
literal(u'<a class="important" href="http://example.com/">Foo</a>')
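Attribute access like HTML.a can be implemented with __getattr__, which Python calls only when normal attribute lookup fails. TagBuilder and builder below are hypothetical names, and the tag() method is deliberately minimal so the dispatch idea stays visible.

```python
import functools
import html


class TagBuilder:
    """Hypothetical stand-in for the HTML object; shows only the dispatch idea."""

    def tag(self, name, *content, **attrs):
        # Minimal tag(): escape content and values, drop None, class_ -> class.
        pairs = sorted(
            ((k.rstrip("_"), v) for k, v in attrs.items() if v is not None),
            key=lambda kv: kv[0],
        )
        attr_str = "".join(' %s="%s"' % (k, html.escape(str(v))) for k, v in pairs)
        body = "".join(html.escape(str(c)) for c in content)
        return "<%s%s>%s</%s>" % (name, attr_str, body, name)

    def __getattr__(self, name):
        # Reached only for unknown names, so builder.a(...) == builder.tag("a", ...).
        if name.startswith("_"):
            raise AttributeError(name)  # keep private/dunder lookups from becoming tags
        return functools.partial(self.tag, name)


builder = TagBuilder()
```

Returning functools.partial keeps the attribute form a thin alias for tag(), so both spellings share one code path.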
HTML.cdata
Wrap the text in a "<![CDATA[ ... ]]>" section. Plain strings will not be escaped because CDATA itself is an escaping syntax.
>>> HTML.cdata(u"Foo")
literal(u'<![CDATA[Foo]]>')
>>> HTML.cdata(u"<p>")
literal(u'<![CDATA[<p>]]>')
About XHTML and HTML
This builder always produces tags that are valid as both HTML and XHTML. "Void" tags (those which can never have content, like <br> and <input>) are written like <br />, with a space and a trailing /.
Only void tags get this treatment. The library will never, for example, produce <script src="..." />, which is invalid HTML. Instead it will produce <script src="..."></script>.
The W3C HTML validator accepts these constructs as valid HTML Strict. It does produce warnings, but those warnings concern the ambiguity that arises if these same XML-style self-closing tags are used for HTML elements that are allowed to take content (<script>, <textarea>, etc.). This library never produces markup like that.
Rather than add options to generate different kinds of behavior, we felt it was better to create markup that could be used in different contexts without any real problems and without the overhead of passing options around or maintaining different contexts, where you’d have to keep track of whether markup is being rendered in an HTML or XHTML context.
If you really want tags without trailing slashes (e.g., <br>), you can abuse _closed=False to produce them.
Classes
class webhelpers.html.builder.literal(s, encoding=None, errors='strict')
Represents an HTML literal.
This subclass of unicode has a .__html__() method that is detected by the escape() function. Also, if you add another string to this string, the other string will be escaped and you will get back another literal object. Also literal(...) % obj will escape any value(s) from obj. If you do something like literal(...) + literal(...), neither string will be changed because escape(literal(...)) doesn't change the original literal.
Changed in WebHelpers 1.2: the implementation is now a subclass of markupsafe.Markup. This brings some new methods: .escape (class method), .unescape, and .striptags.
classmethod escape(s)
Same as the escape function, but returns the proper subclass in subclasses.
unescape()
Unescape markup again into a text_type string. This also resolves known HTML4 and XHTML entities:
>>> Markup("Main &raquo; <em>About</em>").unescape()
u'Main \xbb <em>About</em>'
striptags()
Unescape markup into a text_type string and strip all tags. This also resolves known HTML4 and XHTML entities. Whitespace is normalized to single spaces:
>>> Markup("Main &raquo;  <em>About</em>").striptags()
u'Main \xbb About'
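For readers on plain Python 3 without MarkupSafe, the unescape() and striptags() behavior described above can be approximated with html.unescape and two regular expressions. unescape_sketch, striptags_sketch, and both patterns are assumptions, not MarkupSafe's implementation.

```python
import html
import re

# Assumed patterns: HTML comments first, then anything that looks like a tag.
_COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)
_TAG_RE = re.compile(r"<[^>]*>")


def unescape_sketch(s):
    """Resolve HTML entities such as &raquo; back to characters."""
    return html.unescape(s)


def striptags_sketch(s):
    """Drop comments and tags, collapse whitespace, then resolve entities."""
    s = _COMMENT_RE.sub("", s)
    s = _TAG_RE.sub("", s)
    s = " ".join(s.split())
    return html.unescape(s)
```

Collapsing whitespace before resolving entities means an encoded space such as &nbsp; survives as a character rather than being merged away.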
class webhelpers.html.builder.HTML
Described above.
Functions
webhelpers.html.builder.lit_sub(*args, **kw)
Literal-safe version of re.sub. If the string to be operated on is a literal, return a literal result. All arguments are passed directly to re.sub.
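A minimal sketch of the lit_sub idea, under stated assumptions: Lit is a hypothetical stand-in for literal, and lit_sub_sketch forwards every argument to re.sub unchanged, then re-wraps the result when the target string was a literal. Finding the target string (third positional argument or the string keyword) follows re.sub's signature.

```python
import re


class Lit(str):
    """Hypothetical minimal stand-in for literal."""

    def __html__(self):
        return self


def lit_sub_sketch(*args, **kw):
    """Sketch of lit_sub(): call re.sub as-is, preserve the literal type."""
    # re.sub(pattern, repl, string, ...): the target is the third positional
    # argument, or the "string" keyword.
    target = args[2] if len(args) > 2 else kw.get("string")
    result = re.sub(*args, **kw)
    if isinstance(target, Lit):
        return Lit(result)
    return result
```

Without the re-wrap, re.sub would hand back an ordinary string, which a later escape() would escape a second time.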
webhelpers.html.builder.url_escape(s, safe='/')
Urlencode the path portion of a URL. This is the same function as urllib.quote in the Python standard library. It's exported here with a name that's easier to remember.
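On Python 3, urllib.quote lives at urllib.parse.quote with the same safe parameter, so a stand-in is a one-liner. url_escape_py3 is a hypothetical name chosen to avoid shadowing the WebHelpers function.

```python
from urllib.parse import quote


def url_escape_py3(s, safe="/"):
    """Percent-encode a URL path; '/' is left alone by default."""
    return quote(s, safe=safe)
```

Passing safe="" also encodes slashes, which is what you want for a single path segment rather than a whole path.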
The markupsafe package has a function soft_unicode which converts a string to Unicode if it's not already. Unlike the Python builtin unicode(), it will not convert Markup (literal) to plain Unicode, to avoid overescaping. This is not included in WebHelpers but you may find it useful.
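The soft_unicode idea is small enough to sketch. On Python 3 the text type is str (newer MarkupSafe releases call the function soft_str); Lit and soft_unicode_sketch are hypothetical names.

```python
class Lit(str):
    """Hypothetical minimal stand-in for literal / Markup."""

    def __html__(self):
        return self


def soft_unicode_sketch(s):
    """Convert s to text unless it already is text, so literals stay literals."""
    if not isinstance(s, str):
        return str(s)
    return s
```

The builtin str() would turn a Lit into a plain str, losing its .__html__ method, so a later escape() would escape its markup again; the sketch returns the Lit object untouched.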