Part 1 - Getting to know your target

Discovering content and gathering information from a web application.

The first article in the Web Application Hacking 101 series.

Uniform Resource Locator (URL)

All content available on the internet is accessible through a URL (Uniform Resource Locator) that uniquely identifies it at a given point in time.

For example, consider the MDN Docs URL https://developer.mozilla.org/en-us/. We can divide the complete URL into the following parts:

  • https://
    Protocol. HTTPS is a secure and widely adopted protocol for the web.

  • developer.
    Subdomain. www is also a subdomain.

  • mozilla.org
    Domain name. Consists of the domain name plus a top-level domain (.com, .in, etc.).

  • 443/80
    Port. Inferred from the protocol when omitted (HTTP: 80, HTTPS: 443). Specified with a colon after the domain name, e.g. https://example.com:8443/.

  • /en-us/
    Resource path on the web application.
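The breakdown above can be sketched with plain shell parameter expansion. The URL below is just this section's example with the port written out explicitly; real code should use a proper URL parser.

```shell
# Split an example URL into the parts described above.
url="https://developer.mozilla.org:443/en-us/"

proto="${url%%://*}"     # protocol: "https"
rest="${url#*://}"       # "developer.mozilla.org:443/en-us/"
hostport="${rest%%/*}"   # "developer.mozilla.org:443"
host="${hostport%%:*}"   # subdomain + domain name: "developer.mozilla.org"
port="${hostport#*:}"    # explicit port: "443"
path="/${rest#*/}"       # resource path: "/en-us/"

echo "$proto | $host | $port | $path"
```

Note that when no explicit port is present, `${hostport#*:}` leaves the whole host untouched, so real code would first check whether a colon exists before extracting the port.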


The goal of the content discovery phase is to find URLs of hidden pages, sections, images, videos, or any other information that is not meant to be visible to everyday users.

Manual Discovery

  1. **robots.txt**: This file tells search engines not to crawl certain pages, or the entire site.
    Do: Check whether the /robots.txt path exists on the domain. If it does, list the paths it contains, and check whether it forbids all crawling with a **Disallow: /** directive.

  2. sitemap.xml: Unlike robots.txt, /sitemap.xml lists the path patterns that the owner wants search engines to index. If not updated regularly, these files can contain links to old, forgotten pages.

  3. Framework Fingerprinting
    Many frameworks ship with a default favicon file. If it has not been replaced, we can guess the framework name and version, find its documentation, and therefore its known vulnerabilities. We can also identify a framework from its error pages, HTML markup, framework-specific paths (such as /wp-admin/), response bodies, cookies, and headers.
    API response headers, in particular, help us guess the backend framework.

Open-Source Intelligence

  1. Google Hacking / Dorking: The process of querying Google with special search filters to extract more information than a regular search would return.
    Exploit-DB maintains the Google Hacking Database, a list of very useful filters and queries. Some common ones are filetype:pdf, intitle:admin, and inurl:admin.

  2. Wappalyzer: A helpful Chrome extension that lists the frameworks, libraries, analytics, and other third-party services used by a web application.
    Do: Install the Wappalyzer - Technology Profiler Chrome extension and get to know the technologies used on the target.

  3. Wayback Machine: An archive of older versions of a website; essentially snapshots of a domain through time.

  4. GitHub: We can search through an organization's public code repositories, its forks of open-source software, and the repositories of developers working there to learn more about the internal infrastructure and guess which vulnerabilities the target might have.
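To get a feel for how the dorking filters above combine, here is a tiny sketch that assembles a query string for a hypothetical target (target.example is a placeholder, not a real site):

```shell
# Build a Google dork for a hypothetical target domain.
domain="target.example"

# site: restricts results to the target; the rest layer on discovery filters.
dork="site:$domain (filetype:pdf OR intitle:admin OR inurl:admin)"
echo "$dork"
```

Pasting the resulting string into Google would surface only indexed pages from that domain that match at least one of the filters.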

Automated URL Discovery

Automated discovery works by brute forcing HTTP requests with tools, using wordlists of common patterns that are known to expose loopholes.

Wordlists These text files contain lists of the most common words for different use cases. For example, a password wordlist might contain "admin", "password", "admin123", and thousands more. SecLists, a very good collection of wordlists, is available on GitHub.

Tools There are many tools available for this job. Most of them follow the same pattern: we pass in the target hostname and a wordlist, and the tool brute forces each word from the list, reporting any match on the server. One of them is Gobuster.
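The core loop behind such tools can be sketched in a few lines of shell. The snippet below only builds candidate URLs from a tiny inline wordlist; the curl call a real run would need is left commented out so nothing is sent over the network (target.example and the word list are made up):

```shell
# Miniature version of directory brute forcing: wordlist in, candidate URLs out.
target="https://target.example"
wordlist="admin
login
backup"

candidates=""
while read -r word; do
  url="$target/$word"
  # A real tool would now request each URL and keep interesting status codes:
  #   status=$(curl -s -o /dev/null -w '%{http_code}' "$url")
  #   case "$status" in 2*|3*) echo "$url ($status)";; esac
  candidates="$candidates$url
"
done <<EOF
$wordlist
EOF
printf '%s' "$candidates"
```

Real tools add concurrency, status-code filtering, extensions (.php, .bak), and retry logic on top of this basic loop.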

Disclaimer: These tools can send thousands of requests to a web application and can crash it. Make sure you have formal permission from the owner of the website before performing these steps. Alternatively, set up a dummy target on your local machine and follow along there.

Let's start by cloning SecLists to the local machine. --depth 1 tells git to clone only the latest snapshot of the repository, since we are not interested in the full commit history.

cd /usr/share
sudo mkdir -p wordlists
cd wordlists
sudo git clone --depth 1 https://github.com/danielmiessler/SecLists.git

Install Gobuster next. I'll be using Homebrew to install it.

 brew install gobuster

Finally, to test, run the following, where machine-ip or domain-name is your target.

gobuster dir -u https://<machine-ip|domain-name>/ -w /usr/share/wordlists/SecLists/Discovery/Web-Content/common.txt

The default output will look something similar to below.

gobuster dir -u -w ~/wordlists/shortlist.txt

Gobuster v3.2.0
by OJ Reeves (@TheColonial) & Christian Mehlmauer (@firefart)
[+] Mode         : dir
[+] Url/Domain   :
[+] Threads      : 10
[+] Wordlist     : /home/oj/wordlists/shortlist.txt
[+] Status codes : 200,204,301,302,307,401,403
[+] User Agent   : gobuster/3.2.0
[+] Timeout      : 10s
2019/06/21 11:49:43 Starting gobuster
/categories (Status: 301)
/contact (Status: 301)
/posts (Status: 301)
/index (Status: 200)
2019/06/21 11:49:44 Finished

Source: gobuster examples.

Here we can see the list of routes found on the target:

  • /categories (Status: 301)

  • /contact (Status: 301)

  • /posts (Status: 301)

Now, these URLs might actually be intentionally exposed so that users can navigate the web application conveniently.

But we are more interested in any stale URL leaking resources, API paths, or admin URLs.

If these URLs are not secured, and there is no firewall, brute-force protection, or bot detection in front of them, it becomes extremely easy to DDoS those endpoints to drive up server costs, leak information, and interrupt the application's service.

As developers, we must be more vigilant and better informed than attackers, continuously learning how to break our products before anyone else can.


References

  1. Fingerprint Web Application Framework

  2. SecLists Wordlists

  3. Gobuster