Commit fb02be2
[PATCH] urllib.parse: Restrict IPvFuture regex to RFC 3986-valid characters
IPvFuture hostnames in URLs were matched with an overly permissive regex
(`.+`), which accepted characters that RFC 3986 does not allow in IPvFuture addresses.
This patch updates the pattern to only accept characters explicitly allowed
by the RFC for IPvFuture addresses.
According to RFC 3986 §3.2.2, the format of IPvFuture is:
"v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )
Where:
- unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
- sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
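The grammar above maps almost directly onto a regular expression. The sketch below is built from the ABNF and is not necessarily the exact pattern used in this patch. `IPVFUTURE_RE`, `PERMISSIVE_RE` and `is_valid_ipvfuture` are hypothetical names, and `PERMISSIVE_RE` only approximates a `.+`-tailed pattern so the two can be compared. The sketch also accepts an uppercase "V", on the assumption that ABNF string literals are case-insensitive (RFC 5234).

```python
import re

# IPvFuture = "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )
# The trailing character class spells out the unreserved set (ALPHA, DIGIT,
# "-", ".", "_", "~"), the sub-delims ("!$&'()*+,;=") and ":".
# Assumption: "v" is matched case-insensitively, as ABNF literals are.
IPVFUTURE_RE = re.compile(r"[vV][0-9A-Fa-f]+\.[A-Za-z0-9\-._~!$&'()*+,;=:]+")

# Illustrative permissive variant with a ".+" tail, for comparison only.
PERMISSIVE_RE = re.compile(r"[vV][0-9A-Fa-f]+\..+")


def is_valid_ipvfuture(host: str) -> bool:
    """Return True if host (without brackets) matches the RFC 3986 rule."""
    return IPVFUTURE_RE.fullmatch(host) is not None


for candidate in ["v45.test", "v1.fe80::a+en1", "v45.test|test", "v.test", "vG.test"]:
    # Columns: candidate, strict result, permissive result.
    print(candidate, is_valid_ipvfuture(candidate),
          PERMISSIVE_RE.fullmatch(candidate) is not None)
```

Both patterns reject a missing or non-hex version ("v.test", "vG.test"). Only the permissive one accepts "v45.test|test". Explicit ASCII ranges are used instead of `\w` or `\d` because, for `str` patterns, Python's `re` matches non-ASCII letters and digits with those shorthands by default.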
Before the fix:
>>> import urllib.parse
>>> urllib.parse.urlparse("http://[v45.test|test]/path")
ParseResult(scheme='http', netloc='[v45.test|test]', path='/path', ...)
Invalid characters such as `|` were incorrectly accepted.
After the fix:
>>> import urllib.parse
>>> urllib.parse.urlparse("http://[v45.test|test]/path")
Traceback (most recent call last):
...
ValueError: IPvFuture address is invalid
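The post-fix behavior can be approximated with a small standalone check that runs the same way on any Python version. This is a sketch that assumes the bracketed host is the text between the first `[` and the following `]` in the netloc. `check_bracketed_ipvfuture` is a hypothetical helper, not a `urllib.parse` function, and it does not validate IPv6 literals at all.

```python
import re

# Hypothetical strict pattern, built from the RFC 3986 IPvFuture ABNF.
_IPVFUTURE_RE = re.compile(r"[vV][0-9A-Fa-f]+\.[A-Za-z0-9\-._~!$&'()*+,;=:]+")


def check_bracketed_ipvfuture(netloc: str) -> None:
    """Raise ValueError if netloc holds a bracketed IPvFuture host that uses
    characters outside the RFC 3986 IPvFuture grammar (illustrative only)."""
    if "[" not in netloc or "]" not in netloc:
        return  # no bracketed host, nothing to check
    host = netloc.partition("[")[2].partition("]")[0]
    # Only hosts starting with "v" are treated as IPvFuture here; anything
    # else (e.g. an IPv6 literal) is left alone by this sketch.
    if host[:1] in ("v", "V") and not _IPVFUTURE_RE.fullmatch(host):
        raise ValueError("IPvFuture address is invalid")


check_bracketed_ipvfuture("[v45.test]")  # valid, returns None
try:
    check_bracketed_ipvfuture("[v45.test|test]")
except ValueError as exc:
    print(exc)  # IPvFuture address is invalid
```

The helper takes a netloc string rather than a full URL so it does not depend on whether the running interpreter's `urlsplit` already performs this check.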
This improves standards compliance and prevents malformed URLs from being
silently accepted.
1 parent 2bd4ff0
1 file changed
Lines changed: 1 addition & 1 deletion
Diff hunk: lines 460-466 of the changed file, with line 463 replaced.