The robots.txt Exclusion Protocol is implemented as specified in http://www.robotstxt.org/norobots-rfc.txt.
An entry has one or more user-agents and zero or more rule lines.
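For example, the following robots.txt content (with hypothetical agent names) forms a single entry with two user-agents and two rule lines:

    User-agent: ExampleBot
    User-agent: OtherBot
    Disallow: /private/
    Allow: /private/status.html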
RobotFileParser([url, session, proxies, …])
This class provides a set of methods to read, parse and answer questions about a single robots.txt file.
A rule line is a single “Allow:” (allowance==1) or “Disallow:” (allowance==0) followed by a path.
Initialize internal entry lists and store the given url, session, and proxies.
Using the parsed robots.txt, decide if useragent can fetch url.
Returns True if useragent can fetch url, else False.
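As an illustration, a minimal sketch of the typical flow; it uses the standard library's urllib.robotparser, which exposes the same methods, with a hypothetical host and crawler name:

    from urllib.robotparser import RobotFileParser

    rp = RobotFileParser("https://example.com/robots.txt")
    rp.read()  # fetch and parse the file

    # True if "MyBot" may fetch the page, else False
    allowed = rp.can_fetch("MyBot", "https://example.com/page.html")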
Look for a configured crawl delay.
Returns the crawl delay in seconds, or zero if no delay is configured (an integer >= 0).
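A sketch of honoring the delay between requests; note that the standard-library variant used here returns None rather than zero when no delay is configured, so the guard covers both conventions:

    import time
    from urllib.robotparser import RobotFileParser

    rp = RobotFileParser("https://example.com/robots.txt")
    rp.read()

    delay = rp.crawl_delay("MyBot")  # zero or None when no delay is set
    if delay:
        time.sleep(delay)  # pause between requests to this host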
Set the time the robots.txt file was last fetched to the current time.
Returns the time the robots.txt file was last fetched.
This is useful for long-running web spiders that need to
check for new robots.txt files periodically.
The returned value is in time.time() format.
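A sketch of that periodic re-check pattern, assuming a one-hour refresh interval:

    import time
    from urllib.robotparser import RobotFileParser

    MAX_AGE = 3600  # seconds; assumed refresh interval

    rp = RobotFileParser("https://example.com/robots.txt")
    rp.read()
    rp.modified()  # record the fetch time

    # ... later, inside the crawl loop ...
    if time.time() - rp.mtime() > MAX_AGE:
        rp.read()      # re-fetch the possibly updated file
        rp.modified()  # reset the recorded fetch time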
Parse the input lines from a robots.txt file.
The parser tolerates a user-agent: line that is not preceded by
one or more blank lines.
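A sketch of feeding pre-fetched lines straight to the parser; the rules shown are hypothetical:

    from urllib.robotparser import RobotFileParser

    rp = RobotFileParser()
    rp.parse([
        "User-agent: *",
        "Disallow: /tmp/",
    ])

    rp.can_fetch("MyBot", "https://example.com/tmp/x")  # -> False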
Read the robots.txt URL and feed it to the parser.
Set the URL referring to a robots.txt file.
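When the URL is not known at construction time, set_url() can be combined with read(), as in this short sketch:

    from urllib.robotparser import RobotFileParser

    rp = RobotFileParser()
    rp.set_url("https://example.com/robots.txt")
    rp.read()  # fetch the file from the stored URL and parse it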