Commit fb02be2

[PATCH] urllib.parse: Restrict IPvFuture regex to RFC 3986-valid characters
IPvFuture hostnames in URLs were being matched using a too-permissive regex (`.+`), which allowed invalid characters not defined by RFC 3986. This patch updates the pattern to only accept characters explicitly allowed by the RFC for IPvFuture addresses.

According to RFC 3986 §3.2.2, the format of IPvFuture is:

    "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )

Where:
- unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
- sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="

Before the fix:

    >>> import urllib.parse
    >>> urllib.parse.urlparse("http://[v45.test|test]/path")
    ParseResult(scheme='http', netloc='[v45.test|test]', path='/path', ...)

Invalid characters such as `|` were incorrectly accepted.

After the fix:

    >>> import urllib.parse
    >>> urllib.parse.urlparse("http://[v45.test|test]/path")
    Traceback (most recent call last):
      ...
    ValueError: IPvFuture address is invalid

This improves standards compliance and prevents malformed URLs from being silently accepted.
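
For illustration only, the grammar quoted in the commit message can be written as a standalone check with explicit ASCII character classes. This is a minimal sketch, not the code in `Lib/urllib/parse.py`: the names `UNRESERVED`, `SUB_DELIMS`, `IPVFUTURE_RE`, and `is_valid_ipvfuture` are hypothetical, and the sketch assumes the hostname is passed without its surrounding brackets and with a lowercase `v`, as the patched function does.

```python
import re

# ASCII-only character classes taken from the grammar quoted in the commit message.
UNRESERVED = r"A-Za-z0-9\-._~"   # ALPHA / DIGIT / "-" / "." / "_" / "~"
SUB_DELIMS = r"!$&'()*+,;="      # "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="

# "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )
IPVFUTURE_RE = re.compile(rf"v[0-9A-Fa-f]+\.[{UNRESERVED}{SUB_DELIMS}:]+")


def is_valid_ipvfuture(host: str) -> bool:
    """Return True if host (brackets already stripped) matches the grammar."""
    # fullmatch anchors the pattern at both ends of the string.
    return IPVFUTURE_RE.fullmatch(host) is not None


print(is_valid_ipvfuture("v45.test"))       # a well-formed IPvFuture host
print(is_valid_ipvfuture("v45.test|test"))  # "|" is not in the allowed set
```

Using `fullmatch` here avoids the need for explicit end-of-string anchors; the patch itself keeps `re.match` with `\A` and `\z`.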
1 parent 2bd4ff0 commit fb02be2

1 file changed

Lines changed: 1 addition & 1 deletion

Lib/urllib/parse.py

@@ -460,7 +460,7 @@ def _check_bracketed_netloc(netloc):
 # https://www.rfc-editor.org/rfc/rfc3986#page-49 and https://url.spec.whatwg.org/
 def _check_bracketed_host(hostname):
     if hostname.startswith('v'):
-        if not re.match(r"\Av[a-fA-F0-9]+\..+\z", hostname):
+        if not re.match(r"\Av[a-fA-F0-9]+\.[\w\.~!$&*+,;=:'()-]+\z", hostname):
             raise ValueError(f"IPvFuture address is invalid")
     else:
         ip = ipaddress.ip_address(hostname)  # Throws Value Error if not IPv6 or IPv4
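
The effect of the one-line change can be checked in isolation by running both patterns against a few sample hostnames. The sketch below copies the two patterns from the diff but substitutes `\Z` for `\z`, on the assumption that some interpreters reject `\z`; the `accepted` helper and the sample strings are hypothetical. It also passes `re.ASCII` as a third variant, because `\w` in a `str` pattern matches non-ASCII word characters by default, which is wider than the RFC's ALPHA / DIGIT / "_".

```python
import re

# Patterns copied from the diff, with \Z standing in for \z (assumption:
# this keeps the sketch runnable on interpreters that do not accept \z).
OLD = r"\Av[a-fA-F0-9]+\..+\Z"
NEW = r"\Av[a-fA-F0-9]+\.[\w\.~!$&*+,;=:'()-]+\Z"


def accepted(pattern: str, hostname: str, flags: int = 0) -> bool:
    """Return True if the hostname matches the given pattern."""
    return re.match(pattern, hostname, flags) is not None


# Hypothetical sample hosts: valid, contains "|", contains a space, non-ASCII letter.
samples = ["v45.test", "v45.test|test", "v1.a b", "v1.caf\u00e9"]
for host in samples:
    print(
        f"{host!r:18}"
        f" old={accepted(OLD, host)}"
        f" new={accepted(NEW, host)}"
        f" new+ASCII={accepted(NEW, host, re.ASCII)}"
    )
```

Whether the non-ASCII case matters in practice depends on how hostnames reach `_check_bracketed_host`; the comparison is offered only as a way to probe the pattern, not as a statement about the rest of the module.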
