Python Status Code Checker for XML-Sitemap

Did you know that you can easily automate a lot of monitoring routines for important SEO KPIs yourself? Here is a small Python recipe that works with only one parameter.

The only parameter the script needs is the sitemap URL. When you run it, you get the HTTP status code for every URL in the sitemap.

    The workflow looks like this:

    • If the main sitemap points to multiple sub-sitemaps in XML format, the script collects all of them.
    • All available XML sitemaps are parsed and the URLs are extracted.
    • All URLs are checked for their status code.
    • All results are stored in a CSV file: the first column contains the URL, the second one the status code.
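
The first two steps can be sketched with the standard library alone. The script further down relies on advertools' `sitemap_to_df` for this, so the following only illustrates the idea and is not how advertools implements it. `extract_locs`, `collect_urls`, and the `fetch` callable (anything that returns the XML text for a URL) are hypothetical names, and the example.com documents are made-up sample data.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def extract_locs(xml_text):
    """Return ("index" | "urlset", list of <loc> values) for one sitemap document."""
    root = ET.fromstring(xml_text)
    kind = "index" if root.tag == NS + "sitemapindex" else "urlset"
    entry_tag = NS + ("sitemap" if kind == "index" else "url")
    locs = [(el.findtext(NS + "loc") or "").strip() for el in root.iter(entry_tag)]
    return kind, [loc for loc in locs if loc]


def collect_urls(sitemap_url, fetch):
    """Follow sub-sitemaps and return all page URLs, in order, without duplicates."""
    urls, seen_urls, visited = [], set(), set()
    queue = [sitemap_url]
    while queue:
        current = queue.pop(0)
        if current in visited:  # guard against sitemaps that reference each other
            continue
        visited.add(current)
        kind, locs = extract_locs(fetch(current))
        if kind == "index":
            queue.extend(locs)  # sub-sitemaps are parsed in a later iteration
        else:
            for loc in locs:
                if loc not in seen_urls:
                    seen_urls.add(loc)
                    urls.append(loc)
    return urls


# Made-up sample data standing in for real HTTP responses.
XMLNS = 'xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"'
SAMPLE = {
    "https://example.com/sitemap.xml": (
        f"<sitemapindex {XMLNS}>"
        "<sitemap><loc>https://example.com/posts.xml</loc></sitemap>"
        "<sitemap><loc>https://example.com/pages.xml</loc></sitemap>"
        "</sitemapindex>"
    ),
    "https://example.com/posts.xml": (
        f"<urlset {XMLNS}>"
        "<url><loc>https://example.com/a</loc></url>"
        "<url><loc>https://example.com/b</loc></url>"
        "</urlset>"
    ),
    "https://example.com/pages.xml": (
        f"<urlset {XMLNS}>"
        "<url><loc>https://example.com/b</loc></url>"
        "<url><loc>https://example.com/c</loc></url>"
        "</urlset>"
    ),
}

all_urls = collect_urls("https://example.com/sitemap.xml", SAMPLE.__getitem__)
print(all_urls)
```

In a real run, `fetch` would download each sitemap over HTTP. Passing it in as a parameter keeps the parsing logic testable without network access.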

    Schedule your monitoring routines

    We have a Status Code Checker Tool that does the same job, but this Python script has another advantage: you can run monitoring routines for important SEO KPIs automatically, without opening a desktop application to start the check. Schedule daily script runs and send an e-mail listing the URLs whose status code is not 200.
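
The alerting step could look roughly like the sketch below. It assumes the CSV layout the script produces (columns `url` and `status_code`). The helper names `failing_rows` and `build_alert`, the e-mail addresses, and the sample rows are all hypothetical. The message is only built here, not sent.

```python
import csv
import io
from email.message import EmailMessage


def failing_rows(csv_file):
    """Return (url, status_code) pairs for every row whose status is not 200."""
    rows = []
    for row in csv.DictReader(csv_file):
        status = int(row["status_code"])  # -1 marks a failed request in the script
        if status != 200:
            rows.append((row["url"], status))
    return rows


def build_alert(rows, sender, recipient):
    """Build a plain-text e-mail that lists the failing URLs."""
    msg = EmailMessage()
    msg["Subject"] = f"{len(rows)} sitemap URL(s) without status 200"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join(f"{status}\t{url}" for url, status in rows))
    return msg


# Made-up sample standing in for sitemapUrls_withStatusCode.csv.
sample_csv = io.StringIO(
    "url,status_code\n"
    "https://example.com/,200\n"
    "https://example.com/old,301\n"
    "https://example.com/missing,404\n"
    "https://example.com/timeout,-1\n"
)

failures = failing_rows(sample_csv)
alert = build_alert(failures, "monitor@example.com", "seo-team@example.com")
print(alert["Subject"])
print(alert.get_content())
```

To send the message, you could hand it to `smtplib.SMTP(...).send_message(alert)` with your own mail server settings. For the daily schedule, a cron job or the Windows Task Scheduler that starts the script once a day is usually enough.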

    We’re currently playing around with sending results of multiple SEO audit scripts to Grafana. And we like the idea of putting a lot of different audit tasks into a centralized dashboard.

    Now have fun with the script.

    Python Script for Status Code Checker

    # Pemavor.com Sitemap URL Status Code Checker
    # Author: Stefan Neefischer
    # 1) Enter your Sitemap URL
    # 2) Get CSV File with URL and Http Status Code
    
    import advertools as adv
    import pandas as pd
    import requests
    import time
    import warnings
    warnings.filterwarnings("ignore")
    
    
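    def getStatuscode(url):
        # Return the HTTP status code for url, or -1 if the request itself fails.
        # Note: verify=False skips TLS certificate checks; redirects are not followed,
        # so a 301/302 is reported as such.
        try:
            r = requests.head(url,verify=False,timeout=25, allow_redirects=False) # it is faster to only request the header
            return (r.status_code)
        except requests.RequestException:
            # Only network/HTTP errors are mapped to -1; other bugs still surface
            return -1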
    
    
    def sitemap_status_code_checker(site,SLEEP):
        #get all urls from sitemap
        print("Start scraping sitemap urls")
        sitemap = adv.sitemap_to_df(site)
        sitemap = sitemap.dropna(subset=["loc"]).reset_index(drop=True)
        url_list=list(sitemap['loc'].unique())
        print("all sitemap urls have been scraped")
    
        print("Checking status code")
        # Loop over full list
        url_statuscodes = []
        for url in url_list:
            print(url)
            check = [url,getStatuscode(url)]
            time.sleep(SLEEP)
            url_statuscodes.append(check)
    
        # Save the result as csv file
        url_statuscodes_df=pd.DataFrame(url_statuscodes,columns=["url","status_code"])
        url_statuscodes_df.to_csv("sitemapUrls_withStatusCode.csv",index=False)
        print("sitemapUrls_withStatusCode.csv created and saved")
    
    
    # Enter your XML Sitemap
    sitemap = "https://www.pemavor.com/sitemap.xml"
    
    SLEEP = 1.0 # Time in seconds the script should wait between requests
    sitemap_status_code_checker(sitemap,SLEEP)

    HTTP Status Code Checker Tool

    With PEMAVOR’s HTTP Status Code Checker Tool, you can check HTTP status codes without running a script. It takes just a few seconds and is an always-free tool. It’s super easy to use. Just follow these steps:

    • Paste your URL (or a list of URLs) into the field.
    • Click “Check” to generate your report.
    • View your data.

    Do you need a custom solution?

    We’re always here to build innovative solutions for your business. Your vision. Our expertise. Together, we create success. Invest in excellence and contact us now.