linkcheck.url

Functions for parsing and matching URL strings.

Functions

collapse_segments(path)

Remove all redundant segments from the given URL path.

document_quote(document)

Quote given document.

idna_encode(host)

Encode hostname as internationalized domain name (IDN) according to RFC 3490.

is_numeric_port(portstr)

Return the integer port (which is truthy) if portstr is a valid port number, otherwise False.

parse_qsl(qs, encoding[, keep_blank_values, ...])

Parse a query given as a string argument.

split_netloc(netloc)

Separate userinfo from host in urllib.parse.SplitResult.netloc.

splitparams(path)

Split off parameter part from path.

splitport(host[, port])

Split optional port number from host.

url_fix_host(urlparts, encoding)

Unquote and fix hostname.

url_fix_mailto_urlsplit(urlparts)

Split query part of mailto url if found.

url_fix_wayback_query(path)

url_needs_quoting(url)

Check if url needs percent quoting.

url_norm(url, encoding)

Normalize the given URL which must be quoted.

url_parse_query(query, encoding)

Parse and re-join the given CGI query.

url_quote(url, encoding)

Quote given URL.

urlunsplit(urlparts)

Same as urllib.parse.urlunsplit but with extra UNC path handling for Windows OS.

linkcheck.url.collapse_segments(path)

Remove all redundant segments from the given URL path. Precondition: path is an unquoted URL path.
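As an illustration of what collapsing redundant segments can look like, here is a minimal standalone sketch. It assumes "redundant" means `.` segments, `..` segments together with the segment they cancel, and repeated slashes; `collapse_segments_sketch` is a hypothetical name and this is not LinkChecker's implementation.

```python
def collapse_segments_sketch(path):
    """Collapse '.', '..' and empty segments of an unquoted URL path.

    Hypothetical stand-in for illustration only.
    """
    absolute = path.startswith("/")
    # Remember whether the input named a directory rather than a file.
    trailing = path.endswith(("/", "/.", "/.."))
    stack = []
    for segment in path.split("/"):
        if segment in ("", "."):
            continue  # empty (from '//') and '.' segments carry no information
        if segment == "..":
            if stack:
                stack.pop()  # '..' cancels the previous segment
            continue  # a leading '..' with nothing to cancel is dropped
        stack.append(segment)
    result = "/".join(stack)
    if absolute:
        result = "/" + result
    if trailing and stack:
        result += "/"
    return result
```

Dropping a leading `..` is a simplification chosen for the sketch; a real implementation may decide to keep it for relative paths.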

linkcheck.url.document_quote(document)

Quote given document.

linkcheck.url.idna_encode(host)

Encode hostname as internationalized domain name (IDN) according to RFC 3490.

Raises:

UnicodeError – if hostname is not properly IDN encoded.
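For readers unfamiliar with RFC 3490 encoding, the following sketch shows the same idea with Python's built-in `idna` codec. The name `idna_encode_sketch` and the choice to return a plain string are assumptions made for illustration; the return value of `idna_encode` itself is not documented here.

```python
def idna_encode_sketch(host):
    """Encode a hostname with the stdlib 'idna' codec (RFC 3490).

    Hypothetical stand-in; raises UnicodeError for hostnames the codec
    rejects, such as labels longer than 63 characters.
    """
    return host.encode("idna").decode("ascii")


if __name__ == "__main__":
    print(idna_encode_sketch("b\u00fccher.example"))
    try:
        idna_encode_sketch("a" * 64 + ".example")
    except UnicodeError as exc:
        print("rejected:", exc)
```

Plain ASCII hostnames pass through unchanged, so the function can be applied to every hostname without checking first.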

linkcheck.url.is_numeric_port(portstr)

Return the integer port (which is truthy) if portstr is a valid port number, otherwise False.
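The "integer or False" return convention lets one call both validate and convert. A minimal sketch, assuming a valid port is a decimal string in the range 1 to 65535 (the exact range is an assumption, and `numeric_port_sketch` is a hypothetical name):

```python
def numeric_port_sketch(portstr):
    """Return the port as an int if portstr is a valid port, else False.

    Hypothetical stand-in; assumes valid ports are 1..65535.
    """
    if portstr.isdigit():
        port = int(portstr)
        if 0 < port < 65536:
            return port
    return False


if __name__ == "__main__":
    port = numeric_port_sketch("8080")
    if port:
        print("connecting to port", port)
```

Because port 0 is excluded in this sketch, every valid result is truthy, which is what makes the `if port:` idiom safe.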

linkcheck.url.is_safe_domain(string, pos=0, endpos=9223372036854775807)

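Match method of a compiled regular expression, hence the pos and endpos parameters. Matches zero or more characters at the beginning of the string and returns a match object, or None if the pattern does not match.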

linkcheck.url.is_safe_url(string, pos=0, endpos=9223372036854775807)

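Match method of a compiled regular expression, hence the pos and endpos parameters. Matches zero or more characters at the beginning of the string and returns a match object, or None if the pattern does not match.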

linkcheck.url.parse_qsl(qs, encoding, keep_blank_values=0, strict_parsing=0)

Parse a query given as a string argument.

Parameters:
  • qs (string) – URL-encoded query string to be parsed

  • keep_blank_values (bool) – flag indicating whether blank values in URL encoded queries should be treated as blank strings. A true value indicates that blanks should be retained as blank strings. The default false value indicates that blank values are to be ignored and treated as if they were not included.

  • strict_parsing (bool) – flag indicating what to do with parsing errors. If false (the default), errors are silently ignored. If true, errors raise a ValueError exception.

Returns:

list of triples (key, value, separator), where key and value are the two parts of one CGI parameter and separator is the separator used for that parameter, either a semicolon or an ampersand

Return type:

list of triples
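To make the triple format concrete, here is a simplified standalone sketch. It assumes the separator stored with each parameter is the one that follows it, with an empty string after the last parameter; it omits the encoding and strict_parsing parameters. `parse_qsl_sketch` is a hypothetical name, not the library function.

```python
import re
from urllib.parse import unquote


def parse_qsl_sketch(qs, keep_blank_values=False):
    """Split a query string into (key, value, separator) triples.

    Hypothetical stand-in for illustration only.
    """
    triples = []
    # A capturing group makes re.split keep the separators:
    # [pair, sep, pair, sep, ..., pair]
    parts = re.split(r"([&;])", qs)
    for i in range(0, len(parts), 2):
        pair = parts[i]
        sep = parts[i + 1] if i + 1 < len(parts) else ""
        if not pair:
            continue
        key, _, value = pair.partition("=")
        if not value and not keep_blank_values:
            continue  # blank values are dropped unless explicitly kept
        triples.append((unquote(key), unquote(value), sep))
    return triples


if __name__ == "__main__":
    print(parse_qsl_sketch("a=1&b=2;c=3"))
```

Keeping the separator alongside each pair is what allows a query to be re-joined later with its original mix of `&` and `;`.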

linkcheck.url.split_netloc(netloc)

Separate userinfo from host in urllib.parse.SplitResult.netloc. Originated as urllib.parse._splituser().
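The split can be sketched in two lines. This mirrors the approach of splitting at the last `@`; returning None when no userinfo is present is an assumption about the return convention, and `split_netloc_sketch` is a hypothetical name.

```python
def split_netloc_sketch(netloc):
    """Return (userinfo, host) for a netloc such as 'user:pw@host:80'.

    Hypothetical stand-in; userinfo is None when the netloc has no '@'.
    """
    # Split at the last '@' so that an '@' inside the password is kept.
    userinfo, delim, host = netloc.rpartition("@")
    return (userinfo if delim else None), host


if __name__ == "__main__":
    print(split_netloc_sketch("user:pw@example.org:8080"))
```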

linkcheck.url.splitparams(path)

Split off parameter part from path. Returns tuple (path-without-param, param)
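A small sketch of the (path-without-param, param) split follows. It assumes that only a semicolon in the last path segment starts the parameter part; `splitparams_sketch` is a hypothetical name and not the library code.

```python
def splitparams_sketch(path):
    """Return (path_without_param, param) for a URL path.

    Hypothetical stand-in; only ';' in the last segment counts.
    """
    start = path.rfind("/") + 1  # index where the last segment begins
    i = path.find(";", start)
    if i < 0:
        return path, ""
    return path[:i], path[i + 1:]


if __name__ == "__main__":
    print(splitparams_sketch("/pub/file.txt;type=a"))
```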

linkcheck.url.splitport(host, port=0)

Split optional port number from host. If host has no port number, the given default port is returned.

Parameters:
  • host (string) – host name

  • port (int) – the port number (default 0)

Returns:

tuple of (host, port)

Return type:

tuple of (string, int)
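The default-port behaviour described above can be sketched as follows. This is a simplified illustration under the assumption that the port is whatever follows the last colon when it is all digits; `splitport_sketch` is a hypothetical name, and a bare IPv6 address without brackets is not handled.

```python
def splitport_sketch(host, port=0):
    """Return (host, port), falling back to the given default port.

    Hypothetical stand-in for illustration only.
    """
    name, sep, tail = host.rpartition(":")
    if sep and tail.isdigit():
        return name, int(tail)
    return host, port


if __name__ == "__main__":
    print(splitport_sketch("example.org:8080"))
    print(splitport_sketch("example.org", 80))
```

Passing the scheme's default port as the second argument means the caller always gets a usable (host, port) pair.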

linkcheck.url.url_fix_host(urlparts, encoding)

Unquote and fix hostname. Returns is_idn.

linkcheck.url.url_fix_mailto_urlsplit(urlparts)

Split query part of mailto url if found.

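linkcheck.url.url_fix_wayback_query(path)

linkcheck.url.url_is_absolute(string, pos=0, endpos=9223372036854775807)

Match method of a compiled regular expression, hence the pos and endpos parameters. Matches zero or more characters at the beginning of the string and returns a match object, or None if the pattern does not match.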
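The (string, pos, endpos) signature is what Python shows when a module attribute is the bound `match` method of a compiled pattern. The sketch below demonstrates that idiom with a hypothetical scheme pattern; the patterns LinkChecker actually compiles are not shown in this reference, and `is_absolute_sketch` is an invented name.

```python
import re

# Hypothetical pattern: a URL scheme followed by a colon.
# Binding .match gives a callable with the (string, pos, endpos) signature.
is_absolute_sketch = re.compile(r"[a-z][a-z0-9+.\-]*:", re.IGNORECASE).match

if __name__ == "__main__":
    print(bool(is_absolute_sketch("http://example.org/")))
    print(is_absolute_sketch("/relative/path"))
    # pos shifts where matching starts.
    print(is_absolute_sketch("see http://example.org", 4).group())
```

Callers normally use the result only for its truth value, since a match object is truthy and None is falsy.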

linkcheck.url.url_needs_quoting(url)

Check if url needs percent quoting. Note that this function only checks basic character sets, not any other syntax. The URL might still be syntactically incorrect even when it is properly quoted.
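The difference between a character-set check and a full syntax check can be shown with a short sketch. The allowed character set below (RFC 3986 unreserved and reserved characters plus `%`) is an assumption, and `needs_quoting_sketch` is a hypothetical name rather than the library function.

```python
import re

# Assumed safe set: RFC 3986 unreserved and reserved characters plus '%'.
_SAFE_CHARS = re.compile(r"[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]*")


def needs_quoting_sketch(url):
    """Return True if url contains a character outside the safe set.

    Hypothetical stand-in; checks characters only, not URL syntax.
    """
    return _SAFE_CHARS.fullmatch(url) is None


if __name__ == "__main__":
    print(needs_quoting_sketch("http://example.org/a b"))
    # Passes the character check although '%zz' is not a valid escape.
    print(needs_quoting_sketch("http://example.org/%zz"))
```

The second call shows the limitation noted above: every character is acceptable, yet the URL is still malformed.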

linkcheck.url.url_norm(url, encoding)

Normalize the given URL which must be quoted. Supports unicode hostnames (IDNA encoding) according to RFC 3490.

Returns:

(normed url, idna flag)

Return type:

tuple of length two
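To illustrate the (normed url, idna flag) return shape, here is a much reduced sketch built on `urllib.parse`. It only lowercases the scheme and host and IDNA-encodes the hostname; the real function does more (its full set of normalization steps is not listed here). `url_norm_sketch` is a hypothetical name and it omits the encoding parameter.

```python
from urllib.parse import urlsplit, urlunsplit


def url_norm_sketch(url):
    """Return (normalized_url, is_idn) for a quoted URL.

    Hypothetical stand-in covering scheme/host case and IDNA only.
    """
    parts = urlsplit(url)  # lowercases the scheme
    host = parts.hostname or ""  # .hostname is already lowercased
    ascii_host = host.encode("idna").decode("ascii")
    is_idn = ascii_host != host
    # Keep any userinfo exactly as given.
    userinfo, at, _ = parts.netloc.rpartition("@")
    netloc = userinfo + at + ascii_host
    if parts.port is not None:
        netloc += ":%d" % parts.port
    normed = urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
    return normed, is_idn


if __name__ == "__main__":
    print(url_norm_sketch("HTTP://B\u00fccher.Example/a"))
```

Returning the flag alongside the URL lets a caller report that a hostname was rewritten without comparing the strings again.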

linkcheck.url.url_parse_query(query, encoding)

Parse and re-join the given CGI query.

linkcheck.url.url_quote(url, encoding)

Quote given URL.

linkcheck.url.urlunsplit(urlparts)

Same as urllib.parse.urlunsplit but with extra UNC path handling for Windows OS.