linkcheck.cache.robots_txt

Cache robots.txt contents.

Classes

RobotsTxt(useragent)

Thread-safe cache of downloaded robots.txt files.

class linkcheck.cache.robots_txt.RobotsTxt(useragent)

Bases: object

Thread-safe cache of downloaded robots.txt files. Format: {cache key (string) -> robots.txt content (RobotFileParser)}.

Initialize per-URL robots.txt cache.
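The cache format described above can be illustrated with a minimal sketch. It is not linkcheck's implementation: the class name SimpleRobotsCache, its put() and allows() methods, and the choice to allow URLs with no cached entry are all assumptions made for illustration, and the parser used is the standard library's urllib.robotparser.RobotFileParser.

```python
import threading
from urllib.robotparser import RobotFileParser


class SimpleRobotsCache:
    """Hypothetical stand-in for illustration; not linkcheck's RobotsTxt."""

    def __init__(self, useragent):
        self.useragent = useragent
        self._lock = threading.Lock()
        # cache key (str) -> parsed robots.txt content (RobotFileParser)
        self._cache = {}

    def put(self, key, robots_text):
        """Parse robots.txt text and store the parser under the key."""
        rp = RobotFileParser()
        rp.parse(robots_text.splitlines())
        with self._lock:
            self._cache[key] = rp

    def allows(self, key, url):
        """Return True if the cached rules permit fetching the URL."""
        with self._lock:
            rp = self._cache.get(key)
        if rp is None:
            # Assumption of this sketch: allow when nothing is cached.
            return True
        return rp.can_fetch(self.useragent, url)


cache = SimpleRobotsCache("ExampleBot")
cache.put("http://example.com", "User-agent: *\nDisallow: /private/\n")
public_ok = cache.allows("http://example.com", "http://example.com/index.html")
private_ok = cache.allows("http://example.com", "http://example.com/private/x")
```

The dictionary is guarded by a single lock, so concurrent checker threads can read and write entries without corrupting the mapping.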

add_sitemap_urls(rp, url_data, roboturl)

Add sitemap URLs to queue.
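As a rough illustration of collecting sitemap URLs from a parsed robots.txt, the sketch below uses the standard library's RobotFileParser.site_maps() (available since Python 3.8). The parser linkcheck passes as rp may expose sitemaps differently; the queue list and the sample robots.txt text here are assumptions for the example.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt content (made up for this sketch).
robots_text = """\
User-agent: *
Disallow: /private/
Sitemap: http://example.com/sitemap.xml
"""

rp = RobotFileParser()
rp.parse(robots_text.splitlines())

# Hypothetical queue; a plain list stands in for the checker's URL queue.
queue = []
# site_maps() returns None when no Sitemap lines were found.
for sitemap_url in rp.site_maps() or []:
    queue.append(sitemap_url)
```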

allows_url(url_data, timeout=None)

Ask whether robots.txt allows the given URL to be fetched.

get_lock(**kwargs)

Return the lock for the given robots.txt URL.
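One common way to hand out a lock per robots.txt URL is a registry guarded by its own lock, sketched below. This is an assumption about the general technique, not a copy of linkcheck's code: the class LockRegistry and its internals are hypothetical.

```python
import threading


class LockRegistry:
    """Hypothetical per-URL lock registry, for illustration only."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the dictionary itself
        self._locks = {}                # robots.txt URL -> threading.Lock

    def get_lock(self, roboturl):
        """Return the lock for roboturl, creating it on first use."""
        with self._guard:
            return self._locks.setdefault(roboturl, threading.Lock())


registry = LockRegistry()
lock_a1 = registry.get_lock("http://example.com/robots.txt")
lock_a2 = registry.get_lock("http://example.com/robots.txt")
lock_b = registry.get_lock("http://example.org/robots.txt")

# Holding the per-URL lock serializes work on one robots.txt file
# while leaving other hosts free to proceed.
with lock_a1:
    pass
```

With one lock per URL, two threads asking about the same host wait for each other, so the file is downloaded once, while threads working on different hosts do not block.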