Commit 320ef39

Recommend JustHTML and customize comments for each function/method.
1 parent 41b0573 commit 320ef39

1 file changed

Lines changed: 60 additions & 48 deletions

docs/reference.md

@@ -36,22 +36,31 @@ method appropriately ([see below](#convert)).
 sanitization in popular software] for notes on best practices to ensure
 HTML is properly sanitized.
 
-The developers of Python-Markdown recommend using [`nh3`][nh3] or [`bleach`][bleach][^1]
-as a sanitizer on the output of `markdown.markdown`. However, be
-aware that those libraries may not be sufficient in themselves and will
-likely require customization. Some useful lists of allowed tags and
-attributes can be found in the [`bleach-allowlist`][bleach-allowlist] library, which should
+The developers of Python-Markdown recommend using [JustHTML] as a
+sanitizer on the output of `markdown.markdown`. JustHTML includes a
+built-in HTML sanitizer. When you pass the HTML output through JustHTML
+(`JustHTML(markdown.markdown(text), fragment=True).to_html()`), it
+is sanitized by default according to a strict [allow list policy]. The
+policy can be [customized] if necessary.
+
+If you cannot use JustHTML for some reason, some alternatives include
+[`nh3`][nh3] or [`bleach`][bleach][^1]. However, be aware that those
+libraries will not be sufficient in themselves and will require
+customization. Some useful lists of allowed tags and attributes can be
+found in the [`bleach-allowlist`][bleach-allowlist] library, which should
 work with either sanitizer.
 
 
 [Markdown and XSS]: https://michelf.ca/blog/2010/markdown-and-xss/
 [Improper markup sanitization in popular software]: https://github.com/ChALkeR/notes/blob/master/Improper-markup-sanitization.md
+[JustHTML]: https://emilstenstrom.github.io/justhtml/
+[allow list policy]: https://emilstenstrom.github.io/justhtml/html-cleaning.html#default-sanitization-policy
+[customized]: https://emilstenstrom.github.io/justhtml/html-cleaning.html#use-a-custom-sanitization-policy
 [nh3]: https://nh3.readthedocs.io/en/latest/
 [bleach]: http://bleach.readthedocs.org/en/latest/
 [bleach-allowlist]: https://github.com/yourcelf/bleach-allowlist
-[^1]: We are aware that the [bleach] project has been [deprecated](https://github.com/mozilla/bleach/issues/698).
-However, it is the only pure-Python HTML sanitation library we are aware of and may be the only option for
-those who cannot use [`nh3`][nh3] (Python bindings to a Rust library).
+[^1]: Note that the [bleach] project has been [deprecated](https://github.com/mozilla/bleach/issues/698).
+However, it may be the only option for some users.
 
 The following options are available on the `markdown.markdown` function:
 
@@ -205,6 +214,20 @@ __tab_length__{: #tab_length }:
 
 ### `markdown.markdownFromFile (**kwargs)` {: #markdownFromFile data-toc-label='markdown.markdownFromFile' }
 
+!!! warning
+
+The Python-Markdown library does ***not*** sanitize its HTML output. If
+you are processing Markdown input from an untrusted source, it is your
+responsibility to ensure that it is properly sanitized. See [Markdown and
+XSS] for an overview of some of the dangers and [Improper markup
+sanitization in popular software] for notes on best practices to ensure
+HTML is properly sanitized.
+
+As `markdown.markdownFromFile` writes directly to the file system, there
+is no easy way to sanitize the output from Python code. Therefore, it is
+recommended that the `markdown.markdownFromFile` function not be used on
+input from an untrusted source.
+
 With a few exceptions, `markdown.markdownFromFile` accepts the same options as
 `markdown.markdown`. It does **not** accept a `text` (or Unicode) string.
 Instead, it accepts the following required options:
@@ -242,22 +265,6 @@ __encoding__{: #encoding }
 meet your specific needs, it is suggested that you write your own code
 to handle your encoding/decoding needs.
 
-!!! warning
-
-The Python-Markdown library does ***not*** sanitize its HTML output. If
-you are processing Markdown input from an untrusted source, it is your
-responsibility to ensure that it is properly sanitized. See [Markdown and
-XSS] for an overview of some of the dangers and [Improper markup
-sanitization in popular software] for notes on best practices to ensure
-HTML is properly sanitized.
-
-The developers of Python-Markdown recommend using [`nh3`][nh3] or [`bleach`][bleach][^1]
-as a sanitizer on the output of `markdown.markdownFromFile`.
-However, be aware that those libraries may not be sufficient in
-themselves and will likely require customization. Some useful lists of
-allowed tags and attributes can be found in the
-[`bleach-allowlist`][bleach-allowlist] library, which should work with either sanitizer.
-
 ### `markdown.Markdown([**kwargs])` {: #Markdown data-toc-label='markdown.Markdown' }
 
 The same options are available when initializing the `markdown.Markdown` class
@@ -273,6 +280,29 @@ string must be passed to one of two instance methods.
 
 #### `Markdown.convert(source)` {: #convert data-toc-label='Markdown.convert' }
 
+!!! warning
+
+The Python-Markdown library does ***not*** sanitize its HTML output. If
+you are processing Markdown input from an untrusted source, it is your
+responsibility to ensure that it is properly sanitized. See [Markdown and
+XSS] for an overview of some of the dangers and [Improper markup
+sanitization in popular software] for notes on best practices to ensure
+HTML is properly sanitized.
+
+The developers of Python-Markdown recommend using [JustHTML] as a
+sanitizer on the output of `Markdown.convert`. JustHTML includes a
+built-in HTML sanitizer. When you pass the HTML output through JustHTML
+(`JustHTML(md.convert(text), fragment=True).to_html()`), it
+is sanitized by default according to a strict [allow list policy]. The
+policy can be [customized] if necessary.
+
+If you cannot use JustHTML for some reason, some alternatives include
+[`nh3`][nh3] or [`bleach`][bleach][^1]. However, be aware that those
+libraries will not be sufficient in themselves and will require
+customization. Some useful lists of allowed tags and attributes can be
+found in the [`bleach-allowlist`][bleach-allowlist] library, which should
+work with either sanitizer.
+
 The `source` text must meet the same requirements as the [`text`](#text)
 argument of the [`markdown.markdown`](#markdown) function.
 
@@ -300,6 +330,8 @@ To make this easier, you can also chain calls to `reset` together:
 html3 = md.reset().convert(text3)
 ```
 
+#### `Markdown.convertFile(**kwargs)` {: #convertFile data-toc-label='Markdown.convertFile' }
+
 !!! warning
 
 The Python-Markdown library does ***not*** sanitize its HTML output. If
@@ -309,14 +341,10 @@ html3 = md.reset().convert(text3)
 sanitization in popular software] for notes on best practices to ensure
 HTML is properly sanitized.
 
-The developers of Python-Markdown recommend using [`nh3`][nh3] or [`bleach`][bleach][^1]
-as a sanitizer on the output of `Markdown.convert`. However, be
-aware that those libraries may not be sufficient in themselves and will
-likely require customization. Some useful lists of allowed tags and
-attributes can be found in the [`bleach-allowlist`][bleach-allowlist] library, which should
-work with either sanitizer.
-
-#### `Markdown.convertFile(**kwargs)` {: #convertFile data-toc-label='Markdown.convertFile' }
+As `Markdown.convertFile` writes directly to the file system, there
+is no easy way to sanitize the output from Python code. Therefore, it is
+recommended that the `Markdown.convertFile` method not be used on
+input from an untrusted source.
 
 The arguments of this method are identical to the arguments of the same
 name on the `markdown.markdownFromFile` function ([`input`](#input),
@@ -325,19 +353,3 @@ name on the `markdown.markdownFromFile` function ([`input`](#input),
 process multiple files without creating a new instance of the class for
 each document. State may need to be `reset` between each call to
 `convertFile` as is the case with `convert`.
-
-!!! warning
-
-The Python-Markdown library does ***not*** sanitize its HTML output. If
-you are processing Markdown input from an untrusted source, it is your
-responsibility to ensure that it is properly sanitized. See [Markdown and
-XSS] for an overview of some of the dangers and [Improper markup
-sanitization in popular software] for notes on best practices to ensure
-HTML is properly sanitized.
-
-The developers of Python-Markdown recommend using [`nh3`][nh3] or [`bleach`][bleach][^1]
-as a sanitizer on the output of `Markdown.convertFile`. However, be
-aware that those libraries may not be sufficient in themselves and will
-likely require customization. Some useful lists of allowed tags and
-attributes can be found in the [`bleach-allowlist`][bleach-allowlist] library, which should
-work with either sanitizer.
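
The pattern these documentation changes describe (convert in memory, sanitize, and only then write anything to disk) might look roughly like the sketch below. The helper names (`markdown_to_html`, `sanitize_with_justhtml`, `render_untrusted`, `render_untrusted_file`) are hypothetical and not part of Python-Markdown or JustHTML. The `from justhtml import JustHTML` import path is an assumption; the `JustHTML(..., fragment=True).to_html()` call is the one quoted in the new docs text. Third-party imports are deferred into the functions so the converter and sanitizer can be swapped out.

```python
from pathlib import Path
from typing import Callable


def markdown_to_html(text: str) -> str:
    """Convert Markdown to (unsanitized) HTML with Python-Markdown."""
    import markdown  # third-party: Python-Markdown

    return markdown.markdown(text)


def sanitize_with_justhtml(html: str) -> str:
    """Sanitize an HTML fragment using JustHTML's default policy."""
    # Assumption: the class is importable from the top-level `justhtml` package.
    from justhtml import JustHTML

    return JustHTML(html, fragment=True).to_html()


def render_untrusted(
    text: str,
    convert: Callable[[str], str] = markdown_to_html,
    sanitize: Callable[[str], str] = sanitize_with_justhtml,
) -> str:
    """Convert untrusted Markdown, then sanitize the resulting HTML."""
    return sanitize(convert(text))


def render_untrusted_file(
    src: str,
    dest: str,
    convert: Callable[[str], str] = markdown_to_html,
    sanitize: Callable[[str], str] = sanitize_with_justhtml,
    encoding: str = "utf-8",
) -> str:
    """Read Markdown from `src`, sanitize in memory, then write to `dest`.

    This stands in for `markdown.markdownFromFile` / `Markdown.convertFile`
    when the input is untrusted: the HTML is sanitized before it is written.
    """
    text = Path(src).read_text(encoding=encoding)
    html = render_untrusted(text, convert, sanitize)
    Path(dest).write_text(html, encoding=encoding)
    return html
```

Taking the converter and sanitizer as parameters keeps the ordering (convert first, sanitize second, write last) in one place, and lets `nh3` or `bleach` with a custom allow list be substituted where JustHTML cannot be used.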
