Background
RFC 3986 (the URI specification) defines a valid port string with the following grammar rule:
port = *DIGIT
Here's the WHATWG URL spec definition:
"""
A URL-port string must be one of the following:
- the empty string
- one or more ASCII digits representing a decimal number no greater than $2^{16} − 1$.
""" [1]
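To make the quoted WHATWG rule concrete, here is a minimal sketch of a check that follows it literally. The function name `is_valid_port_string` is hypothetical and does not exist in `urllib.parse`; it only illustrates which strings the rule admits.

```python
def is_valid_port_string(s: str) -> bool:
    """Return True if s satisfies the quoted URL-port string rule.

    Hypothetical helper for illustration; not part of urllib.parse.
    """
    # The empty string is explicitly allowed.
    if s == "":
        return True
    # Only ASCII digits are allowed: no sign, whitespace, or underscore.
    # isascii() excludes non-ASCII characters that str.isdigit() would accept.
    if not (s.isascii() and s.isdigit()):
        return False
    # Ignore leading zeros, then require a value no greater than 2**16 - 1.
    significant = s.lstrip("0")
    return len(significant) <= 5 and int(significant or "0") <= 65535


for candidate in ["", "80", "65535", "65536", "-0", "+80"]:
    print(repr(candidate), is_valid_port_string(candidate))
```

Under this reading, "-0" and "+80" are both rejected because the sign characters are not ASCII digits.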
The bug
This is the port string parsing code from Lib/urllib/parse.py:166-176:
def port(self):
    port = self._hostinfo[1]
    if port is not None:
        try:
            port = int(port, 10)
        except ValueError:
            message = f'Port could not be cast to integer value as {port!r}'
            raise ValueError(message) from None
        if not ( 0 <= port <= 65535):
            raise ValueError("Port out of range 0-65535")
    return port
Because int() accepts an optional leading sign, this code erroneously validates the strings "-0" and f"+{x}" for any value of x in the valid range. Given that + and - are not digits, this behavior violates both specifications.
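The root cause can be seen by calling `int()` directly, without going through `urlparse`. The sketch below uses sample strings chosen for illustration. Only the signed cases are the ones this report is about; the whitespace and underscore cases are documented `int()` behavior, and whether they can reach the `port` property depends on earlier parsing steps that this sketch does not exercise.

```python
# Each of these strings contains a character that is not an ASCII digit,
# yet int(text, 10) converts all of them without raising ValueError.
samples = ["-0", "+80", " 80 ", "8_0"]
lenient = {text: int(text, 10) for text in samples}

for text, value in lenient.items():
    print(f"{text!r} -> {value}")
```

Since the range check only runs after the conversion, any such string whose value falls in 0-65535 passes.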
This bug is easily reproducible with the following snippet:
from urllib.parse import urlparse
url1 = urlparse("http://python.org:-0")
url2 = urlparse("http://python.org:+80")
print(url1.port) # prints 0, but error is expected
print(url2.port) # prints 80, but error is expected
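One possible shape for a fix, as a minimal sketch rather than a proposed patch: check that the string consists only of ASCII digits before converting it. The standalone function `parse_port` is hypothetical and stands in for the body of the `port` property; it assumes the caller has already mapped a missing port to None.

```python
from typing import Optional


def parse_port(port: Optional[str]) -> Optional[int]:
    """Stricter variant of the port parsing logic (hypothetical helper)."""
    if port is None:
        return None
    # Reject anything that is not purely ASCII digits, such as "-0" or "+80",
    # before int() gets a chance to accept it.
    if not (port.isascii() and port.isdigit()):
        raise ValueError(f"Port could not be cast to integer value as {port!r}")
    value = int(port, 10)
    if not (0 <= value <= 65535):
        raise ValueError("Port out of range 0-65535")
    return value


for text in ["80", "-0", "+80"]:
    try:
        print(repr(text), "->", parse_port(text))
    except ValueError as exc:
        print(repr(text), "-> ValueError:", exc)
```

With this check, "80" still parses to 80, while "-0" and "+80" raise ValueError.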
Happy to submit a PR, but don't want to step on any toes over at #25774.
My environment
- CPython version tested on:
- Operating system and architecture:
Footnotes
[1] Given that this is urlparse and not uriparse, it seems appropriate that we do not accept port numbers outside range(2**16), even though such numbers are allowed by RFC 3986.