Building a Custom Crawler

When we talk about web application scanning, we often come across crawlers that are built into the automated scanning tools we use. Tools such as Burp Suite, Acunetix, and WebInspect all have excellent crawlers that walk through a web application and try various attack vectors against the crawled URLs. In this chapter, we are going to understand how a crawler works and what happens under the hood. The objective is to enable the reader to understand how a crawler collects information and forms the attack surface for various attacks. The same knowledge can later be used to develop a custom tool that automates web application scanning. In this chapter, we are going to create a custom web crawler that will crawl through a website and give us a list containing the following (see the sketch after this list):

  • Web pages
  • HTML forms
  • All input fields within each form
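
To make this concrete, here is a minimal sketch of such a crawler, written with the third-party requests and BeautifulSoup libraries. This is not the chapter's final code: names such as crawl, start_url, and max_pages are illustrative, and the real application adds depth limits, better error handling, and the Django front end discussed later.

```python
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup


def crawl(start_url, max_pages=50):
    """Breadth-first crawl that records pages, forms, and input fields."""
    seen, queue, forms = set(), [start_url], []
    domain = urlparse(start_url).netloc

    while queue and len(seen) < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)

        try:
            response = requests.get(url, timeout=10)
        except requests.RequestException:
            continue

        soup = BeautifulSoup(response.text, "html.parser")

        # Record every form on the page and the input fields it contains.
        for form in soup.find_all("form"):
            fields = [field.get("name")
                      for field in form.find_all(["input", "textarea", "select"])]
            forms.append({"page": url,
                          "action": form.get("action"),
                          "method": form.get("method", "get"),
                          "fields": fields})

        # Queue same-domain links for further crawling.
        for link in soup.find_all("a", href=True):
            target = urljoin(url, link["href"])
            if urlparse(target).netloc == domain:
                queue.append(target)

    return seen, forms
```

Calling crawl() with the target's base URL returns the set of visited pages together with one dictionary per discovered form; those dictionaries list the input fields that later become the injection points of our attack surface.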

We will see how we can crawl a web application in two modes (the authenticated mode is sketched after this list):

  • Without authentication
  • With authentication
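
The authenticated mode differs only in how requests are sent: we log in first and reuse the resulting session cookie for every crawl request. A rough sketch follows, assuming a simple form-based login; the login URL and the field names username and password are placeholders for whatever the real target expects.

```python
import requests


def crawl_authenticated(login_url, credentials, start_url):
    """Log in once, then crawl with the authenticated session."""
    session = requests.Session()
    # POSTing the login form stores the session cookie on the Session
    # object, so every later request is sent as the logged-in user.
    session.post(login_url, data=credentials, timeout=10)
    # From here the crawl logic is the same as before, except that
    # session.get() is used in place of requests.get().
    response = session.get(start_url, timeout=10)
    return response.text
```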

We will have a small GUI, developed in Django (a web application framework for Python), that will enable users to run the crawler against test applications. It must be noted that the main focus of this chapter is the workings of the crawler, so we will discuss the crawler code in detail. We will not focus on the workings of Django web applications; reference links for that are provided at the end of the chapter. The whole code base is available in my GitHub repository for readers to download and execute in order to get a better understanding of the application.
