Learn how to create a website status checker in Python
If you often find yourself retrieving data from websites, you should probably consider automating the process. Sometimes called “web scraping,” it’s a common approach for sites that don’t provide a formal API or feed. Of course, you won’t get anywhere if the site you’re trying to reach isn’t available.
If you run your own site, you’ve probably experienced downtime before. It can be frustrating, cost you visitors, and disrupt any activity your site is responsible for. In such circumstances, it helps to be able to check your website’s availability easily.
Python is a great scripting language, and its concise yet readable syntax makes implementing a site checker a simple task.
Creating Your Custom Website Checker
The website checker is tailor-made to accommodate multiple websites at once. This allows you to easily swap out sites you no longer care about, or start checking sites you launch in the future. The checker is an ideal “skeleton application” to build on further, but even as it stands it demonstrates a basic approach to fetching web data.
Import Libraries in Python
To launch the project, you must import the requests library in Python with an import statement.
import requests
The Requests library is useful for communicating with websites. You can use it to send HTTP requests and receive response data.
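If you want to see what Requests gives you before building the full checker, you can send a single request and inspect the response object. This is just a quick sketch; the example.com URL is only a placeholder:

import requests

# Send a GET request and look at a few properties of the response
response = requests.get("https://example.com")
print(response.status_code)                   # numeric HTTP status, e.g. 200
print(response.headers.get("Content-Type"))   # e.g. text/html; charset=UTF-8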
Store website URLs in a list
Once you have imported the library, you need to define and store the website URLs in a list. This step lets you maintain multiple URLs, all of which the website checker can verify.
website_url = [
"https://www.google.co.in",
"https://www.yahoo.com",
"https://www.amazon.co.in",
"https://www.pipsnacks.com/404",
"http://the-internet.herokuapp.com/status_codes/301",
"http://the-internet.herokuapp.com/status_codes/500"
]
The website_url variable stores the list of URLs. In the list, include each URL you want to check as an individual string. You can use the sample URLs in the code for testing purposes, or replace them to start checking your own sites right away.
Next, store messages for common HTTP response codes. You can keep them in a dictionary and index each message by its corresponding status code. Your program can then use these messages instead of status codes for better readability.
statuses = {
200: "Website Available",
301: "Permanent Redirect",
302: "Temporary Redirect",
404: "Not Found",
500: "Internal Server Error",
503: "Service Unavailable"
}
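With the dictionary in place, turning a numeric code into a readable message is a simple lookup by key, for example:

# Index the dictionary by status code to get its message
print(statuses[404])   # Not Found
print(statuses[200])   # Website Available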
Creating a loop to check website status
To check each URL in turn, you’ll need to loop through the list of websites. Inside the loop, check the status of each site by sending a request with the Requests library.
for url in website_url:
    try:
        web_response = requests.get(url)
        print(url, statuses[web_response.status_code])
    except requests.exceptions.RequestException:
        print(url, "Could not connect")
Where:
- for url in website_url: iterates over the list of URLs.
- url is the loop variable to which the for loop assigns each URL in turn.
- try/except handles any exceptions that may occur while sending the request, such as connection errors.
- web_response is the response object; its status_code attribute holds the HTTP status code of the response (see the sketch after this list for codes missing from the dictionary).
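Note that the statuses dictionary only covers a handful of codes, so a response such as 403 would raise a KeyError on lookup. A minimal, more defensive variation of the loop, using dict.get and catching only request-related errors, might look like this (it reuses the website_url list and statuses dictionary defined above; the fallback messages are just suggestions):

import requests

for url in website_url:
    try:
        web_response = requests.get(url)
        # Fall back to showing the raw status code if it isn't in the dictionary
        message = statuses.get(web_response.status_code, f"Status {web_response.status_code}")
        print(url, message)
    except requests.exceptions.RequestException as error:
        # Covers DNS failures, refused connections, timeouts, and so on
        print(url, "Request failed:", error)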
The full code snippet
If you’d rather go through the entire program at once, here’s the complete code for reference.
import requests

website_url = [
"https://www.google.co.in",
"https://www.yahoo.com",
"https://www.amazon.co.in",
"https://www.pipsnacks.com/404",
"http://the-internet.herokuapp.com/status_codes/301",
"http://the-internet.herokuapp.com/status_codes/500"
]
statuses = {
200: "Website Available",
301: "Permanent Redirect",
302: "Temporary Redirect",
404: "Not Found",
500: "Internal Server Error",
503: "Service Unavailable"
}
for url in website_url:
    try:
        web_response = requests.get(url)
        print(url, statuses[web_response.status_code])
    except requests.exceptions.RequestException:
        print(url, "Could not connect")
When you run the code, it prints each URL followed by its status message, or an error if the site couldn’t be reached.
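If you want to develop the skeleton further, one simple direction is to repeat the check on a schedule. The sketch below is just one possibility and reuses the website_url list and statuses dictionary from above; the five-minute interval is an arbitrary choice:

import time
import requests

def check_sites(urls, messages):
    # Run one pass over the list and report each site's status
    for url in urls:
        try:
            web_response = requests.get(url)
            print(url, messages.get(web_response.status_code, web_response.status_code))
        except requests.exceptions.RequestException:
            print(url, "Could not connect")

while True:
    check_sites(website_url, statuses)
    time.sleep(300)  # wait five minutes between checks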
Python Coding Capabilities in Web Scraping
Python’s third-party libraries are ideal for tasks such as web scraping and data retrieval over HTTP.
You can send automated requests to websites to perform different types of tasks, such as reading news headlines, downloading images, and sending emails automatically.