December 22, 2024

Free Internal Links Checker SEO Tool - Create Your Own

Want to check how many internal links point to a specific page? Are online SEO tools not giving you the right results? Now it is time to create your own free SEO tool using Python that will check the internal links for any page you provide.


Purpose of the Code:
The purpose of this free internal links checker SEO tool is to crawl a website, recursively following its internal links, to find which pages link to a specific target URL. This can be useful for:
  • SEO purposes (finding where a particular page is linked from).
  • Web analysis (mapping internal link structure).
This free SEO tool, written in Python, performs web scraping and crawling on a specific website to find and list the pages that link to a given target URL. Here's a detailed explanation of what the code does:

1. Imports and Setup:
  • requests: Used to send HTTP requests and fetch the content of web pages.
  • BeautifulSoup: From the bs4 library, it's used for parsing HTML and navigating the structure of the page to extract data.
  • urlparse and urljoin: From urllib.parse in Python 3 (Python 2 kept these in a module named urlparse), these functions are used to handle and resolve URLs, particularly relative URLs; a short illustration follows this list.
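As a quick standalone illustration (the example path is made up for demonstration), this is how the two helpers behave in Python 3:

from urllib.parse import urljoin, urlparse

# urljoin resolves a relative href against a base URL
print(urljoin("https://newscurrentaffairs.info", "/free-tools/example-page.html"))
# https://newscurrentaffairs.info/free-tools/example-page.html

# urlparse splits a URL into components; .netloc holds the domain
print(urlparse("https://newscurrentaffairs.info/free-tools/example-page.html").netloc)
# newscurrentaffairs.info
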
2. Global Variables:
  • base_url: The root URL of the website (https://newscurrentaffairs.info) that the crawler will start from.
  • target_url: A specific URL (https://newscurrentaffairs.info/free-tools/free-google-index-checker-seo-tool-create-your-own.html) that the script looks for in the pages it crawls.
  • visited: A set that stores URLs that have already been crawled to avoid revisiting them.
  • pages_with_target: A list that stores URLs of pages that contain a link to the target URL.
3. crawl(url) Function:
This is a recursive function that crawls the website starting from the given url.

Base condition: If the URL has already been visited or is not from the base domain (base_url), the function returns without doing anything.
Sending HTTP Request:
  • A GET request is sent to the URL using requests.get.
  • If the request is successful (response.raise_for_status() raises an exception for HTTP error responses), the URL is added to the visited set.
Parsing the HTML:
The HTML content of the page is parsed using BeautifulSoup.

Checking for the Target URL:
  • The code looks for an anchor (<a>) tag whose href attribute exactly matches the target_url.
  • If such a link is found, the current page URL is added to the pages_with_target list (see the sketch after this list).
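Here is a minimal sketch of this check, using an inline HTML snippet rather than a fetched page. Note that href=target_url matches the attribute value exactly, so a page linking to the target via a relative URL would not be detected by this test:

from bs4 import BeautifulSoup

target_url = "https://newscurrentaffairs.info/free-tools/free-google-index-checker-seo-tool-create-your-own.html"
html = '<p><a href="' + target_url + '">Google Index Checker</a></p>'
soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching <a> tag, or None if there is no exact match
if soup.find("a", href=target_url):
    print("This page links to the target URL")
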
Crawling Internal Links:
  • The function then loops over all anchor (<a>) tags on the page, extracts their href attributes (links), and resolves any relative URLs using urljoin.
  • It checks whether the full URL is internal by comparing its network location, i.e. the domain part (urlparse(full_url).netloc), with that of the base_url.
  • If the URL is internal, it recursively calls crawl to visit that page (a standalone sketch of this check follows).
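Sketched in isolation, with one internal and one external link (both hrefs are invented for the example):

from urllib.parse import urljoin, urlparse

base_url = "https://newscurrentaffairs.info"

for href in ["/free-tools/", "https://www.example.com/page.html"]:
    full_url = urljoin(base_url, href)
    # Internal if the domain matches the base domain
    if urlparse(full_url).netloc == urlparse(base_url).netloc:
        print("internal:", full_url)
    else:
        print("external:", full_url)
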

Error Handling:
The function handles network errors with a try/except block: if an error occurs while crawling a URL (e.g., the page is unreachable), it prints an error message and continues.
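A minimal sketch of the same pattern (the URL below is a hypothetical unreachable page): requests.RequestException covers connection failures, timeouts, and, via raise_for_status(), HTTP error status codes.

import requests

try:
    response = requests.get("https://newscurrentaffairs.info/no-such-page.html", timeout=10)
    response.raise_for_status()  # raises requests.HTTPError for 4xx/5xx responses
except requests.RequestException as e:
    print(f"Error crawling the page: {e}")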

4. Main Execution:
  • The script starts by calling the crawl function on the base_url, which begins the crawling process.
  • After the crawl finishes, the script prints the list of URLs (pages_with_target) that contain the target_url.
Example Scenario:
If the target URL is https://newscurrentaffairs.info/free-tools/free-google-index-checker-seo-tool-create-your-own.html, the code will:
  • Start crawling from the homepage (https://newscurrentaffairs.info).
  • Visit each page on the site, checking if any of those pages contain a link to the target URL.
  • Print a list of pages that contain the target URL.
Here is the full Python code for the free internal links checker SEO tool:

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin, urlparse  # named 'urlparse' in Python 2

# Base URL of the website
base_url = "https://newscurrentaffairs.info"
# Target URL to check
target_url = "https://newscurrentaffairs.info/free-tools/free-google-index-checker-seo-tool-create-your-own.html"

# A set to track visited URLs
visited = set()
# A list to store pages containing the target URL
pages_with_target = []

def crawl(url):
    """Recursively crawl pages on the website."""
    if url in visited or not url.startswith(base_url):
        return

    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        visited.add(url)

        soup = BeautifulSoup(response.text, 'html.parser')

        # Check if the target URL is in the current page
        if soup.find("a", href=target_url):
            pages_with_target.append(url)

        # Find all internal links on the page
        for link in soup.find_all("a", href=True):
            href = link["href"]
            # Resolve relative URLs
            full_url = urljoin(base_url, href)

            # Ensure it's an internal link
            if urlparse(full_url).netloc == urlparse(base_url).netloc:
                crawl(full_url)

    except requests.RequestException as e:
        # Report network failures and HTTP errors, then continue crawling
        print(f"Error crawling {url}: {e}")

if __name__ == "__main__":
    # Start crawling from the homepage
    crawl(base_url)

    # Print results
    print("Pages containing the target URL:")
    for page in pages_with_target:
        print(page)

Summary
  • Install Python 3 and the required libraries, requests and beautifulsoup4 (install commands follow this list).
  • Copy the code into a file and save it as internallinkschecker.py.
  • From a command prompt, run: python internallinkschecker.py
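For reference, the install and run commands (assuming pip and python point at your Python 3 installation):

pip install requests beautifulsoup4
python internallinkschecker.py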
