The recent global wave of terrorist attacks has revealed that terrorists utilize not only the surface web but also the dark web to build and maintain their networks. To counteract this, Law Enforcement Agencies (LEAs) need efficient and effective means for detecting and monitoring terrorism-related content on the internet. By analyzing and mining relevant content, LEAs can trace the real-world identities of members of terrorist organizations, identify the types of explosives they use and how they build their bombs, and even anticipate their upcoming attacks.

Websites on the Dark Web utilize anonymization techniques to mask their IP addresses and physical geolocations, which renders the Dark Web an ideal place for terrorists to store content and communicate. Furthermore, terrorists can now deploy technologies capable of detecting and blocking bots that automatically gather and parse content from anonymous services on the Dark Web, which makes it even more difficult for LEAs to detect and collect such content.

As such, there is now an urgent need for tools that can automate the discovery and gathering of terrorism-related content on the internet, whether on the Surface Web or the Dark Web, while also bypassing the previously mentioned protection mechanisms.

Automatically Detecting Terrorism Related Content On the Surface And Deep Web:

A group of researchers from the University of Arizona created a project, which they named “The Dark Web Project”, aimed at building special tools that crawl the web to detect and gather terrorism-related content in general. The project focused mainly on crawling and detecting terrorism-related content on forums rather than on general websites, applying special authentication techniques to bypass the majority of the protection mechanisms deployed when such forums are accessed. The HOMER project aimed at discovering and gathering terrorism-related content, especially information on recipes for homemade explosives, via the introduction of hybrid crawlers capable of accessing websites on the surface web as well as on Tor, Freenet and I2P, which represent the most popular darknets on the Dark Web.

The HOMER project concluded that most servers storing such content deploy security mechanisms that block users who exhibit unusual behavior patterns (e.g. following links in the exact order they appear within a website’s page, or sending a large number of successive requests). Even the most advanced crawling methodologies can fail to bypass these protection mechanisms, as they mostly rely on static approaches for detecting and monitoring relevant content (e.g. using pre-trained classifiers, issuing requests at constant, predetermined time intervals, or applying a breadth-first crawling strategy).
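To make the failure mode concrete, the following is a minimal Python sketch of the kind of static crawler described above: it visits links strictly in breadth-first, on-page order and pauses a fixed interval between requests. The in-memory `PAGES` link graph and all URLs are hypothetical stand-ins for live pages (a real crawler would fetch and parse them over HTTP); the point is that both traits produce a perfectly regular access pattern that server-side defenses can fingerprint.

```python
import time
from collections import deque

# Hypothetical in-memory link graph standing in for real pages; a real
# crawler would fetch each URL over HTTP and extract its links.
PAGES = {
    "/index": ["/a", "/b"],
    "/a":     ["/c"],
    "/b":     ["/c", "/d"],
    "/c":     [],
    "/d":     [],
}

def static_bfs_crawl(start, delay=0.01):
    """Breadth-first crawl with a constant inter-request delay.

    Visiting links strictly in the order they appear on each page and
    pausing a fixed interval between requests together yield a highly
    regular, machine-like access pattern.
    """
    visited, order = set(), []
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if url in visited:
            continue
        visited.add(url)
        order.append(url)
        time.sleep(delay)                   # constant, predictable pause
        for link in PAGES.get(url, []):     # links in on-page order
            if link not in visited:
                queue.append(link)
    return order
```

Run against the toy graph, the visit order is fully determined by page layout, which is exactly the regularity such defenses look for.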

A recently published paper introduced a botnet, i.e. a network consisting of several machines referred to as bots or zombies, which automates a process emulating human-like browsing behavior in order to collect content relevant to a specific field of interest from websites on the surface web and from a number of darknets on the Dark Web. These bots are customized to gather terrorism-related content and can also make use of specific codewords that LEAs have identified as being used by terrorists in their conversations. Each bot follows a unique, dynamic crawling pattern that emulates human-like internet browsing behavior, which makes it almost impossible for a server to distinguish the behavior of bots from that of real human users.
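The paper does not publish its exact algorithm, but the idea of a dynamic, human-like crawling pattern can be sketched as a random walk over the link graph: the bot picks its next link at random rather than in on-page order, waits a randomly drawn "reading time" between requests, and occasionally abandons the session early, as a person might. Everything below (the `PAGES` graph, URLs, the 10% early-exit probability) is an illustrative assumption, not the authors' implementation.

```python
import random
import time

# Hypothetical link graph; a real bot would fetch and parse live pages,
# over darknet sessions (Tor, I2P, Freenet) as well as the surface web.
PAGES = {
    "/forum":   ["/thread1", "/thread2", "/profile"],
    "/thread1": ["/thread2", "/forum"],
    "/thread2": ["/thread1", "/profile"],
    "/profile": ["/forum"],
}

def human_like_crawl(start, max_requests=10, rng=None):
    """Random-walk crawl that imitates a human browsing session.

    Each bot instance seeds its own RNG, so no two bots share a timing
    or navigation pattern that a server could fingerprint.
    """
    rng = rng or random.Random()
    current, history = start, [start]
    for _ in range(max_requests - 1):
        time.sleep(rng.uniform(0.0, 0.02))   # variable think/read time
        links = PAGES.get(current, [])
        if not links or rng.random() < 0.1:  # assumed ~10% chance to quit early
            break
        current = rng.choice(links)          # unordered link selection
        history.append(current)
    return history
```

Because both the delays and the navigation choices are drawn from distributions, two sessions from the same bot rarely look alike, unlike the static breadth-first pattern that defenses can easily recognize.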

What is a Botnet?

The word botnet originates from two words, “robot” and “network”, and denotes a communicating network of machines referred to as bots or zombies. In most cases, bots are used for malicious tasks: malware infects machines and allows them to be remotely controlled by an attacker without their owners’ knowledge. In the case of the crawling botnet we just discussed, however, the botnet software is supposedly installed willingly by users, so no malware is installed and no other illegal actions take place in the distribution and operation of the crawling botnet. Generally speaking, botnet technologies, their frameworks and their communication protocols can vary greatly according to the goals the Botmaster (i.e. the developer and/or the controller of the botnet) is looking to achieve via the botnet.
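One routine coordination task in such a crawling botnet is the Botmaster splitting the seed-URL workload across the bots so they crawl in parallel without duplicating each other's requests. The round-robin scheme below is a minimal sketch of one plausible approach; the function and bot names are hypothetical, and the source does not specify how its botnet distributes work.

```python
from itertools import cycle

def distribute_seeds(bot_ids, seed_urls):
    """Round-robin assignment of crawl targets to bots.

    Each seed URL is handed to exactly one bot, so the workload is
    spread evenly and no two bots crawl the same starting page.
    """
    assignments = {bot: [] for bot in bot_ids}
    for bot, url in zip(cycle(bot_ids), seed_urls):
        assignments[bot].append(url)
    return assignments
```

Real deployments would also need failure handling (reassigning a dead bot's targets) and a command channel, which vary widely with the Botmaster's chosen framework and protocol, as noted above.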