According to Forbes, a team at Arizona State University has developed a machine learning system that actively monitors deepweb traffic for zero-day exploits before they actually happen.
In July, news articles covered Stampado, a fully undetectable ransomware that was available for purchase for only $39. The deepweb is filled with data dumps and the contents of website hacks, and as the public grows more aware of the threat, so do security researchers.
The Arizona State University team’s approach is something of a groundbreaking measure in cyberdefense. One of the more common tactics companies currently employ is to offer bounties for security bugs and exploits. This gives the company an opportunity to deal with the issue quietly and gives the hacker an incentive to use the discovery for a less malicious purpose. A bounty usually pays less than the data exposed by such an exploit would be worth, though some exceptions exist. Google, for instance, offers up to $20,000 for specific types of intrusions.
In a paper describing the software, which is meant to detect zero-day exploits before day zero, the developers detail how it learns and what kind of data it is able to track.
For instance, the software currently monitors 27 darknet markets and 21 forums for chatter about upcoming security threats, such as the Dyre Banking Trojan. A classifier is used to find security-relevant terminology and filter out both forum posts and market listings unrelated to cybersecurity.
As of the published paper, the team scanned 162,872 forum posts and only 19% were marked by their software. Similarly, 11,991 darknet market advertisements and listings were scanned and only 13% were found relevant.
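To put those percentages in absolute terms, the arithmetic can be sketched as below. The totals and rates are the figures quoted above; because the percentages are rounded, the resulting counts are only approximations, and the function and variable names are chosen here purely for illustration.

```python
# Figures quoted in the article. The percentages are rounded,
# so the derived counts are approximations, not exact values.
FORUM_POSTS = 162_872
MARKET_LISTINGS = 11_991
FORUM_RELEVANT_RATE = 0.19
MARKET_RELEVANT_RATE = 0.13


def estimated_relevant(total: int, rate: float) -> int:
    """Approximate number of items flagged as security-relevant."""
    return round(total * rate)


forum_relevant = estimated_relevant(FORUM_POSTS, FORUM_RELEVANT_RATE)
market_relevant = estimated_relevant(MARKET_LISTINGS, MARKET_RELEVANT_RATE)

print(f"Forum posts flagged:     ~{forum_relevant:,}")
print(f"Market listings flagged: ~{market_relevant:,}")
```

Under these assumptions, roughly 30,946 forum posts and 1,559 market listings would have been flagged, which leaves well over 140,000 items that the classifier had to discard.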
In their initial tests, they found that the software, given its capability to learn from queries, was able to correctly identify 80% of the relevant forum posts and 92% of the exploit-related products listed on darknet marketplaces. The machine learning system is also able to track vendors across multiple sites based on username and marketing similarities.
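The article does not say how vendors are matched across sites, so the following is only a minimal sketch of one plausible approach: comparing normalized usernames with Python's standard `difflib.SequenceMatcher`. The vendor handles, the site names, and the 0.85 threshold are all invented for illustration, and the sketch ignores the marketing similarities the team also relies on.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(username: str) -> str:
    """Lowercase the name and drop non-alphanumeric characters."""
    return "".join(ch for ch in username.lower() if ch.isalnum())


def username_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized usernames."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_vendors(vendors, threshold=0.85):
    """Return pairs of entries from different sites whose names look alike."""
    links = []
    for (site_a, name_a), (site_b, name_b) in combinations(vendors, 2):
        if site_a == site_b:
            continue  # only cross-site matches are interesting here
        score = username_similarity(name_a, name_b)
        if score >= threshold:
            links.append(((site_a, name_a), (site_b, name_b), round(score, 2)))
    return links


# Hypothetical (site, username) pairs, not taken from the paper.
vendors = [
    ("market_a", "Dark_Coder99"),
    ("market_b", "darkcoder99"),
    ("forum_c", "exploit-king"),
    ("market_b", "pharma_deals"),
]

links = link_vendors(vendors)
for left, right, score in links:
    print(left, "<->", right, "similarity", score)
```

A fixed threshold is a blunt instrument; a real system would likely combine several signals before concluding that two accounts belong to the same vendor.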
There are three main components to the system. Crawlers that search for exploit-related content make up the initial phase. Specialized parsers then extract the relevant content, communicating with the crawlers to note changes and modifications on each page or post. The final step is a more specific filtering in which a classifier interacts with the parsers to eliminate false positives. Given that only 19% of the 162,872 forum posts and 13% of the 11,991 darknet market listings were relevant, the final step is key to making sure no drug- or weapon-related content is included, considering that drugs, weapons, and pornography are by far the more prevalent listings.
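The crawl, parse, and classify stages can be pictured with a toy pipeline like the one below. This is a sketch under heavy assumptions, not the team's implementation: the "crawler" simply iterates over an in-memory dictionary, the page format is made up, and a keyword match stands in for the trained classifier. All names and sample pages are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Item:
    source: str
    title: str
    body: str


# Hypothetical raw pages standing in for what a crawler would fetch.
RAW_PAGES = {
    "market_a/listing/1": "TITLE: Banking trojan builder | BODY: Undetectable exploit kit with keylogger",
    "market_a/listing/2": "TITLE: Premium product | BODY: Discreet shipping worldwide",
    "forum_b/post/7": "TITLE: New 0day for sale | BODY: Remote code execution, private exploit",
}

# Illustrative keyword list; the real system uses a trained classifier.
SECURITY_TERMS = {"exploit", "0day", "trojan", "keylogger", "ransomware"}


def crawl(pages):
    """Stage 1: yield (source, raw_text) pairs. A real crawler would fetch pages."""
    yield from pages.items()


def parse(source, raw):
    """Stage 2: extract structured fields from the made-up page format."""
    title_part, body_part = raw.split(" | ", 1)
    return Item(source, title_part[len("TITLE: "):], body_part[len("BODY: "):])


def is_relevant(item, min_hits=1):
    """Stage 3: keyword stand-in for the classifier that removes false positives."""
    text = (item.title + " " + item.body).lower().replace(",", " ")
    return len(set(text.split()) & SECURITY_TERMS) >= min_hits


def run_pipeline(pages):
    """Chain the three stages and keep only security-relevant items."""
    parsed = (parse(source, raw) for source, raw in crawl(pages))
    return [item for item in parsed if is_relevant(item)]


relevant = run_pipeline(RAW_PAGES)
for item in relevant:
    print(item.source, "->", item.title)
```

Keeping the stages separate mirrors the point made in the article that each site needs its own crawler and parser, while the filtering step can be reasoned about on its own.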
Creating the machine learning system was labor-intensive, the team writes. Each forum and market needed its own specially designed crawler, parser, and machine learning classifier, all three of which were unique to each site. The classifiers use semi-supervised learning, meaning that humans are required to tag and label 25% of the data.
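The idea of hand-labeling a quarter of the data and letting the system label the rest can be illustrated with a simple self-training loop. This is one common way to learn from partly labeled data, shown here as an assumption; the article does not specify which algorithm the team used. The sample texts, labels, and the choice of Jaccard word overlap as the similarity measure are all invented for the example.

```python
def tokens(text):
    """Split a text into a set of lowercase words."""
    return set(text.lower().split())


def jaccard(a, b):
    """Word-overlap similarity between two token sets, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def self_train(labeled, unlabeled):
    """Repeatedly label the pending text most similar to any labeled text.

    labeled: dict mapping text -> bool (True means security-relevant).
    unlabeled: list of texts with no label.
    Newly labeled texts join the labeled pool and can influence later rounds.
    """
    labels = dict(labeled)
    pending = list(unlabeled)
    while pending:
        best_text, best_label, best_score = None, None, -1.0
        for text in pending:
            for known, label in labels.items():
                score = jaccard(tokens(text), tokens(known))
                if score > best_score:
                    best_text, best_label, best_score = text, label, score
        labels[best_text] = best_label
        pending.remove(best_text)
    return labels


# Hypothetical data: 2 of 8 items (25%) are labeled by hand.
hand_labeled = {
    "remote exploit for windows kernel": True,
    "fast shipping on bulk orders": False,
}
unlabeled = [
    "windows kernel exploit with source",
    "private exploit with source code",
    "fast shipping on bulk items",
    "worldwide shipping discount",
    "keylogger source code included",
    "discount on bulk orders",
]

result = self_train(hand_labeled, unlabeled)
for text, label in result.items():
    print(f"{'RELEVANT' if label else 'ignore  '}  {text}")
```

In this toy data, "keylogger source code included" shares no words with either hand-labeled example and only gets marked relevant through texts labeled in earlier rounds, which is the effect that makes a small labeled set go further.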
Forbes writes, “the time and labor involved in building the system is worth it when you consider the payoff.” Each week, the system produces an average of 305 accurate warnings while continually learning and getting better at collecting data. “If only 1% of these warnings results in discovering and patching a potential zero-day exploit before it can affect untold numbers of computers, the time to build and maintain the system will have been well spent.”