The term ‘bot traffic’ refers to any non-human ‘visitor’ to your website or app. This traffic is typically generated by automated software programs, which we refer to as ‘internet bots’, ‘web robots’, or simply ‘bots.’

Around half of all web traffic comes from bots. In 2016, 51.8% of web traffic came from bots, and although that share has declined, most sources in 2020 estimate that bot traffic still accounts for 45% to 50% of total website traffic.

These bots are programmed by their owners or operators to visit websites and perform certain tasks, typically relatively simple but repetitive ones, such as copying thousands of images from a web page or posting comments on a social media profile.

Although these bots have gained notoriety for their widespread use in various cyberattacks, it is important to understand that there are also good bots, owned by reputable organizations like Google or Facebook, that perform beneficial tasks for the websites they visit.

Good Bots vs. Bad Bots

Good Bots

Here are some examples of common good bots:

Search Engine Bots

Googlebot, for example, crawls and analyzes a web page’s content so that Google can rank the page on its SERP (Search Engine Results Page). Most websites want to rank on Google, so we wouldn’t want to block traffic coming from Googlebot, or from other search engine bots like Bingbot, Yahoo’s Slurp, and so on.
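One practical concern is that bad bots often spoof the Googlebot user agent to avoid being blocked. Google’s documented way to verify a genuine Googlebot is a reverse-DNS lookup followed by a forward confirmation. The following is a minimal Python sketch of that check; the function name and example IP are our own illustrations, and in production you would cache the results rather than resolve DNS on every request.

    import socket

    def is_verified_googlebot(ip: str) -> bool:
        """Verify a claimed Googlebot IP via reverse DNS plus forward confirmation."""
        try:
            # Reverse DNS: genuine Googlebot hosts end in googlebot.com or google.com
            hostname, _, _ = socket.gethostbyaddr(ip)
        except OSError:
            return False
        if not hostname.endswith((".googlebot.com", ".google.com")):
            return False
        try:
            # Forward confirmation: the hostname must resolve back to the same IP
            forward_ips = socket.gethostbyname_ex(hostname)[2]
        except OSError:
            return False
        return ip in forward_ips

    # Example: check a claimed Googlebot address before whitelisting it
    print(is_verified_googlebot("66.249.66.1"))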

Monitoring Bots

These bots monitor the health of a website, checking whether it is up and accessible. You can, for example, run your own monitoring bot so you are notified as soon as your website goes offline, as in the sketch below.
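Here is a minimal sketch of such a monitoring bot in Python, using only the standard library. The URL, the check interval, and the alert action (a simple print) are placeholders to replace with your own site and notification channel.

    import time
    import urllib.request

    SITE_URL = "https://example.com"   # placeholder: the site you want to watch
    CHECK_INTERVAL_SECONDS = 60        # how often to poll

    def site_is_up(url: str) -> bool:
        """Return True if the site answers with an HTTP status below 400."""
        try:
            with urllib.request.urlopen(url, timeout=10) as response:
                return response.status < 400
        except OSError:
            # Covers connection errors and HTTP 4xx/5xx responses
            return False

    if __name__ == "__main__":
        while True:
            if not site_is_up(SITE_URL):
                # Placeholder alert: swap in email, Slack, SMS, etc.
                print(f"ALERT: {SITE_URL} appears to be down")
            time.sleep(CHECK_INTERVAL_SECONDS)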

Copyright Bots

These internet bots crawl websites scanning for potential copyright violations, for example a website displaying a copyrighted image or a YouTube video using a copyrighted song. They can be very helpful for copyright owners who want to ensure nobody is using their content illegally.

Bad Bots

Unlike the good bots discussed above, bad bots are deployed to perform destructive tasks with malicious intent, including but not limited to:

Content Scraping

Content scraping by itself is not illegal, but when a bot scrapes unauthorized or hidden content and uses it for malicious purposes, it can be categorized as malicious. In other cases, a bot might steal images and text and republish them on other websites without permission, creating duplicate-content issues and other potential damage.

Vulnerability Scanning

These bots scan websites and web apps for potential vulnerabilities and notify their owner when exploitable weaknesses are found. The bot owner can then exploit those vulnerabilities to launch more dangerous attacks, such as DDoS attacks, data breaches, or malware injection.

Brute Force Attacks

These bots repeatedly guess password possibilities, or try credentials stolen from other websites (credential stuffing attacks), as part of an account takeover attempt. Once the attacker successfully gains access to the account, they can then perform other, more damaging attacks.

Spam Bots

Fairly self-explanatory, these bots leave automatically generated spam in a blog’s comment section, on social media profiles, and so on, often including a link to a fraudulent or scam website. They can also fill out contact forms to spam website owners.

Challenges in Detecting and Managing Bot Traffic

There are two main challenges in detecting and managing bot traffic:

We have to differentiate between traffic from good bots and traffic from bad bots. We wouldn’t want to accidentally block good bots that benefit our site, so we can’t simply block all traffic we suspect of being bots.

Malicious bots are getting better at impersonating human traffic using AI and machine learning, so we need advanced solutions that can identify these sophisticated bots.

How To Identify and Manage Bot Traffic

With the two challenges above in mind, here are some effective strategies we can use to tackle them:

Investing in a Proper Bot Mitigation Solution

Bot mitigation software can use three different approaches to detect and manage bot activity:

  • Signature/Fingerprinting-Based: the bot management solution compares attributes of the traffic source, such as OS type, browser type/version, device, and IP address, against known bot ‘fingerprints’ (see the sketch after this list).
  • Challenge-Based: we use tests like CAPTCHA to challenge the ‘user’. A legitimate human should find the challenge fairly easy to solve, but an automated program/bot will find it difficult, if not impossible.
  • Behavioral-Based: the bot management solution analyzes the traffic’s behavior in real time, for example mouse movements and clicks, looking for patterns that resemble bot activity.
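As a concrete illustration of the signature-based approach, here is a minimal Python sketch that classifies requests by their User-Agent header alone. The bot lists are deliberately tiny examples of our own; real solutions combine many more signals, such as IP ranges, TLS fingerprints, and header ordering.

    # Tiny, illustrative signature lists; real deployments use far larger databases
    KNOWN_GOOD_BOTS = ("Googlebot", "Bingbot", "Slurp")           # search engine crawlers
    KNOWN_BAD_SIGNATURES = ("python-requests", "curl", "scrapy")  # common scraping tools

    def classify_user_agent(user_agent: str) -> str:
        """Label a request as good-bot, suspected-bad-bot, or unknown by its User-Agent."""
        ua = (user_agent or "").lower()
        if any(sig.lower() in ua for sig in KNOWN_GOOD_BOTS):
            return "good-bot"            # allow, ideally after reverse-DNS verification
        if any(sig.lower() in ua for sig in KNOWN_BAD_SIGNATURES):
            return "suspected-bad-bot"   # block or challenge
        return "unknown"                 # fall through to challenge or behavioral analysis

    print(classify_user_agent("Mozilla/5.0 (compatible; Googlebot/2.1)"))  # good-bot
    print(classify_user_agent("python-requests/2.31.0"))                   # suspected-bad-bot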

Due to the sophistication of today’s malicious bots, a bot management solution capable of behavioral-based detection is recommended. DataDome is an advanced bot protection solution that uses AI and machine learning technologies to detect and manage bot traffic in real time. Running on autopilot, DataDome notifies you whenever there is malicious bot activity, but you don’t have to do anything yourself to protect your system.

Monitor Your Traffic

We can use various web analytics tools (like the free Google Analytics) to monitor website traffic and look for symptoms that might signify bot traffic (a simple log-analysis sketch follows this list), such as:

  • Increase in pageviews: a sudden, unexplained spike in pageviews is a very likely symptom of bot traffic.
  • Increase in bounce rate: bounce rate is the share of visitors who view only a single page and then leave immediately without navigating further or clicking anything. An abnormal spike in bounce rate can be a sign of bots hitting a single page and then leaving.
  • Abnormally high or low dwell time: dwell time, or session duration, is how long a user stays on a website, and it should normally remain relatively steady. A sudden, unexplained increase or decrease in dwell time can be a sign of bot traffic.
  • Drop in conversion rate: a falling conversion rate caused by fake account creations, fraudulent form submissions, and so on.
  • Traffic from unexpected locations: a sudden increase in activity from users in unusual locations, especially locations where the site’s language is not widely spoken.
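If you also have access to raw server logs, a quick script can surface some of these symptoms directly. The sketch below counts requests per IP address in a common/combined-format access log and flags unusually heavy clients; the log path and threshold are placeholders to adapt to your own baseline traffic.

    from collections import Counter

    LOG_PATH = "access.log"        # placeholder: path to your web server access log
    MAX_REQUESTS_PER_IP = 500      # placeholder: well above your normal per-visitor volume

    requests_per_ip = Counter()
    with open(LOG_PATH) as log:
        for line in log:
            # In common/combined log formats, the client IP is the first field
            ip = line.split(" ", 1)[0]
            if ip:
                requests_per_ip[ip] += 1

    # Print the 20 busiest clients and flag the ones that look automated
    for ip, count in requests_per_ip.most_common(20):
        flag = "  <-- possible bot" if count > MAX_REQUESTS_PER_IP else ""
        print(f"{ip}: {count} requests{flag}")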

Block Older Browsers/User Agents

Although this won’t help against experienced attackers, who often spoof their user agents and rotate through hundreds if not thousands of IP addresses per minute, this approach can help defend against less sophisticated attackers and bots.

As a general rule of thumb, you should block browser versions that are more than three years old, and you can serve a CAPTCHA to browsers and user agents that are two years old or older.
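Here is a minimal Python sketch of that rule. The version cutoffs are hypothetical placeholders, since which major versions count as roughly two or three years old depends on when you deploy this; adjust them accordingly.

    import re

    # Hypothetical cutoffs; update to whatever majors are ~3 and ~2 years old at deploy time
    BLOCK_BELOW = {"Chrome": 90, "Firefox": 90}        # older than ~3 years: block
    CHALLENGE_BELOW = {"Chrome": 110, "Firefox": 110}  # older than ~2 years: serve a CAPTCHA

    def action_for_user_agent(user_agent: str) -> str:
        """Return 'block', 'challenge', or 'allow' based on the browser's major version."""
        for browser, block_version in BLOCK_BELOW.items():
            match = re.search(rf"{browser}/(\d+)", user_agent)
            if match:
                major = int(match.group(1))
                if major < block_version:
                    return "block"
                if major < CHALLENGE_BELOW[browser]:
                    return "challenge"
                return "allow"
        return "allow"  # unrecognized browsers fall through to your other checks

    print(action_for_user_agent("Mozilla/5.0 ... Chrome/85.0.4183.83 Safari/537.36"))   # block
    print(action_for_user_agent("Mozilla/5.0 ... Chrome/105.0.0.0 Safari/537.36"))      # challenge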

Conclusion

When detecting and managing bad bots, it’s very important to avoid false positives: accidentally blocking good bots or legitimate human traffic. This is why a proper bot mitigation solution that can perform behavioral-based analysis is now a necessity for everyone, no longer a luxury reserved for bigger organizations and enterprises.