linkcheck.checker.httpurl
Handle http links.
Classes
| HttpUrl | Url link with http scheme. |
- class linkcheck.checker.httpurl.HttpUrl(base_url, recursion_level, aggregate, parent_url=None, base_ref=None, line=-1, column=-1, page=-1, name='', url_encoding=None, extern=None)[source]
- Bases: InternPatternUrl
- Url link with http scheme.
- Initialize check data, and store given variables.
- Parameters:
- base_url – unquoted and possibly unnormed url 
- recursion_level – the check level at which the base url lies 
- aggregate – aggregate instance 
- parent_url – quoted and normed url of parent or None 
- base_ref – quoted and normed url of <base href=""> or None 
- line – line number of url in parent content 
- column – column number of url in parent content 
- page – page number of url in parent content 
- name – name of url or empty 
- url_encoding – encoding of URL or None 
- extern – None or (is_extern, is_strict) 
 
 - allows_robots(url)[source]
- Fetch and parse the robots.txt of given url. Checks if LinkChecker can get the requested resource content.
- Parameters:
- url (string) – the url to be requested 
- Returns:
- True if access is granted, otherwise False 
- Return type:
- bool 
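
The robots.txt check described above can be illustrated with the standard library. This is only a minimal sketch, not the HttpUrl implementation: the function name `is_allowed`, the sample rules, and the user agent string are assumptions made for the example, and the robots.txt text is passed in directly rather than fetched.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url.

    Hypothetical helper for illustration; it parses text that has already
    been downloaded instead of fetching robots.txt itself.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Sample rules (an assumption for this example).
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

allowed = is_allowed(SAMPLE_ROBOTS_TXT, "LinkChecker", "http://example.com/index.html")
blocked = is_allowed(SAMPLE_ROBOTS_TXT, "LinkChecker", "http://example.com/private/x")
```

Here `allowed` is True and `blocked` is False, mirroring the "True if access is granted, otherwise False" contract.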
 
 - check_connection()[source]
- Check a URL with HTTP protocol. Here is an excerpt from RFC 1945 with common response codes: The first digit of the Status-Code defines the class of response. The last two digits do not have any categorization role. There are 5 values for the first digit:
- 1xx: Informational - Not used, but reserved for future use 
- 2xx: Success - The action was successfully received, understood, and accepted 
- 3xx: Redirection - Further action must be taken in order to complete the request 
- 4xx: Client Error - The request contains bad syntax or cannot be fulfilled 
- 5xx: Server Error - The server failed to fulfill an apparently valid request 
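
The first-digit rule from the excerpt can be sketched as a small lookup. The function name `status_class` is hypothetical and not part of linkcheck; it only illustrates how a status code maps to one of the five classes.

```python
# Response classes keyed by the first digit of the Status-Code,
# as listed in the RFC 1945 excerpt.
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def status_class(status_code: int) -> str:
    """Return the response class name for a three-digit HTTP status code."""
    if not 100 <= status_code <= 599:
        raise ValueError(f"not a valid HTTP status code: {status_code}")
    # Integer division by 100 isolates the first digit; the last two
    # digits play no role in the categorization.
    return STATUS_CLASSES[status_code // 100]
```

For example, `status_class(404)` yields "Client Error" and `status_class(301)` yields "Redirection".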
 
 - construct_auth()[source]
- Construct HTTP Basic authentication credentials if there is user/password information available. Does not overwrite if credentials have already been constructed. 
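
The "construct once, never overwrite" behaviour can be sketched as follows. This is an assumption-laden illustration rather than the HttpUrl code: the class `AuthHolder` and its attributes are hypothetical, and the credentials are rendered as a Basic `Authorization` header value, which may differ from how linkcheck stores them.

```python
import base64
from typing import Optional


class AuthHolder:
    """Hypothetical stand-in for an object carrying user/password data."""

    def __init__(self, user: Optional[str] = None, password: Optional[str] = None):
        self.user = user
        self.password = password
        self.auth_header: Optional[str] = None

    def construct_auth(self) -> None:
        # Do not overwrite credentials that were already constructed.
        if self.auth_header is not None:
            return
        # Nothing to construct without user/password information.
        if self.user is None or self.password is None:
            return
        token = base64.b64encode(f"{self.user}:{self.password}".encode("utf-8"))
        self.auth_header = "Basic " + token.decode("ascii")


holder = AuthHolder("Aladdin", "open sesame")
holder.construct_auth()
first_header = holder.auth_header

# A second call leaves the existing credentials untouched.
holder.user = "someone-else"
holder.construct_auth()
```

After the second call, `holder.auth_header` still equals `first_header`.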
 - content_allows_robots()[source]
- Return False if the content of this URL forbids robots to search for recursive links. 
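
One common way for content to forbid link following is a robots meta tag with a `nofollow` (or `none`) directive. The sketch below assumes that convention; whether HttpUrl inspects exactly this tag is not stated here, and the names `RobotsMetaParser` and `content_allows_following` are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Record whether a <meta name="robots"> tag forbids following links."""

    def __init__(self) -> None:
        super().__init__()
        self.follow = True

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values may be None for valueless attributes.
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        directives = {d.strip().lower() for d in attr.get("content", "").split(",")}
        if "nofollow" in directives or "none" in directives:
            self.follow = False


def content_allows_following(html: str) -> bool:
    """Return False if the HTML asks robots not to follow its links."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.follow
```

A page containing `<meta name="robots" content="noindex, nofollow">` makes `content_allows_following` return False, while a page without such a tag returns True.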
 - get_request_kwargs()[source]
- Construct keyword parameters for Session.send() and Session.resolve_redirects(). 
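
As a rough sketch of what such keyword parameters might look like, the helper below builds a dictionary using only names that both `Session.send()` and `Session.resolve_redirects()` accept in the requests library (`stream`, `timeout`, `verify`, `proxies`). The function name, its parameters, and the default values are assumptions for illustration; the actual values chosen by HttpUrl are not documented here.

```python
from typing import Any, Dict, Optional


def build_request_kwargs(
    timeout: float = 60.0,
    verify: bool = True,
    proxies: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Build keyword arguments shared by Session.send() and resolve_redirects().

    Hypothetical helper; the defaults are placeholders, not linkcheck's settings.
    """
    kwargs: Dict[str, Any] = {
        "stream": True,      # defer downloading the response body
        "timeout": timeout,  # seconds to wait for the server
        "verify": verify,    # whether to verify TLS certificates
    }
    if proxies:
        kwargs["proxies"] = proxies
    return kwargs


kwargs = build_request_kwargs(timeout=10.0, proxies={"http": "http://proxy.example:3128"})
# With requests installed, the result could be passed along as:
#   response = session.send(prepared_request, **kwargs)
```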
 - get_robots_txt_url()[source]
- Get the corresponding robots.txt URL for this URL.
- Returns:
- robots.txt URL 
- Return type:
- string
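
Since robots.txt conventionally lives at the root of a site, the corresponding URL can be derived by keeping the scheme and host and replacing the rest. A minimal sketch, assuming that convention; the function name `robots_txt_url` is hypothetical and the real method may handle more cases.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(url: str) -> str:
    """Return the robots.txt URL for the site that serves the given URL."""
    parts = urlsplit(url)
    # Keep scheme and network location; drop path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


example = robots_txt_url("http://example.com:8080/a/b?x=1#frag")
```

Here `example` is "http://example.com:8080/robots.txt".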