How can we prevent malware from communicating with a command-and-control (C&C) server? You might think of pairing a CTI (Cyber Threat Intelligence) feed with a network blacklisting appliance. You might also think about blocking certain protocols, or even using a Next-Generation Firewall to perform traffic inspection. But malware creators can be quite clever. They write malware that communicates over HTTP(S), hiding among all the other legitimate web requests in your organization. After all, the best place to hide a tree is in a forest. On its own, this still doesn’t get around a CTI blacklist, but there’s more to the story.

Instead of using a single domain that can easily be marked as malicious, malware operators began registering hundreds of domains daily as rendezvous points for their malware. But they didn’t simply hard-code these domains into the malware, as they could easily be extracted from it and blacklisted. Instead, they began using domain generation algorithms (DGAs) to create a potentially unlimited number of domains, in such a way that the domains generated by the C&C server are mostly identical to those generated by the malware. The malware simply communicates through a new domain each time, and the previous domains are discarded. This makes maintaining a CTI feed infeasible, as each malware family, or even each instance, can generate up to tens of thousands of domains per day that you would need to collect, store, and blacklist.
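To make the idea concrete, here is a minimal sketch of a seeded generator. It is not the algorithm of any real malware family: the seed string, the use of SHA-256, the date-based input, and the function name `generate_domains` are all assumptions chosen for illustration. The point is only that two parties sharing a seed and a clock can derive the same domains independently.

```python
import hashlib
from datetime import date


def generate_domains(seed: str, day: date, count: int = 5, tld: str = ".com") -> list[str]:
    """Derive `count` pseudo-random domains from a shared seed and a date.

    Hypothetical scheme for illustration only, not a real family's DGA.
    """
    domains = []
    for index in range(count):
        material = f"{seed}:{day.isoformat()}:{index}".encode()
        digest = hashlib.sha256(material).hexdigest()
        # Map the first 16 hex digits onto the letters a-p to get an alphabetic label.
        label = "".join(chr(ord("a") + int(ch, 16)) for ch in digest[:16])
        domains.append(label + tld)
    return domains


# Both sides run the same code with the same inputs and get the same list.
malware_side = generate_domains("shared-secret", date(2024, 1, 1))
server_side = generate_domains("shared-secret", date(2024, 1, 1))
```

Because nothing but the seed and the algorithm is stored in the sample, a defender who extracts strings from it finds no domain list to blacklist.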

So, do we just give up and accept that there’s nothing we can do after a malware infection has occurred? Of course not! Let’s examine one of these domains for clues about how to detect them. Take a look at the domain below. Looks like gibberish, right?

ovyvwnkjserklcrjwwhcpucyurwjaelg.com
A CryptoLocker domain.

 

That’s because it is. It is simply a random sequence of characters. We don’t need to go into too much detail here, but there is a mathematical way to quantify the randomness of a string. So, we could simply flag domains that seem to be completely random, right? Well, this is exactly what we tried in the early stages of developing our detection capabilities. But when testing with a real-world dataset, we couldn’t get an accuracy above 65%. The problem wasn’t how we were quantifying randomness, but rather that malware creators had thought of this as well. We could have gotten away with it if it weren’t for these meddling malware developers.
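One common way to quantify randomness is the Shannon entropy of the character distribution. The sketch below assumes that measure and a cut-off of 3.5 bits per character; neither is stated in this post, and the function names are hypothetical. It illustrates the general idea rather than the method we originally tested.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of `text` in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_random(domain: str, threshold: float = 3.5) -> bool:
    """Flag a domain whose first label exceeds an assumed entropy threshold."""
    label = domain.split(".")[0]
    return shannon_entropy(label) > threshold


print(looks_random("ovyvwnkjserklcrjwwhcpucyurwjaelg.com"))
print(looks_random("darkhope.net"))
```

A long random label spreads its characters widely and scores high, while a short label built from ordinary words cannot score above log2 of its length, so it stays under the threshold regardless of how it was generated.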

Below you can see some domains also generated by a DGA. They seem completely normal, right? This is because malware developers began using dictionaries to generate their domains. They simply combine random words together, often resulting in completely inconspicuous domains.

journeyready.net

wouldinstead.net

sickhurry.net

darkhope.net

cloudthirteen.net

dutybegan.net

christianaashleigh.net

Example Suppobox domains.
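A dictionary-based generator can be sketched in a few lines. The word list below is taken from the example domains above, but the seeding scheme and the function name `dictionary_domains` are assumptions for illustration, not Suppobox’s actual algorithm.

```python
import random

# Small illustrative word list; a real dictionary DGA would ship a much larger one.
WORDS = ["journey", "ready", "would", "instead", "sick", "hurry", "dark", "hope"]


def dictionary_domains(seed: int, count: int = 3, tld: str = ".net") -> list[str]:
    """Build `count` domains by concatenating two distinct words per domain."""
    rng = random.Random(seed)  # same seed, same sequence of word picks
    return ["".join(rng.sample(WORDS, 2)) + tld for _ in range(count)]


print(dictionary_domains(seed=2024))
```

Every output is made of ordinary words, so its character statistics resemble those of legitimate domains.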

 

Therefore, we need to come up with something more advanced. There are many approaches you could take, but the one we chose was the use of neural networks. Neural networks are made up of a number of interconnected artificial neurons, modeled on biological neural networks such as those found in animal brains. This sounds quite esoteric, but at its core, it’s really quite simple. Think of a neural network as a system that can infer which aspects of a domain name matter when determining whether or not it was algorithmically generated. It can do this without us having to dictate exactly what details it should look for. These details can become highly intricate and nearly impossible for a human to program. All we have to do is set up a good neural network architecture, collect a good dataset, and mark each domain in the dataset as either algorithmically generated or legitimate. The neural network does most of the heavy lifting when it is trained. That sounds easy, but there are quite a few quirks in deciding what architecture to use, and the subsequent optimizations can be quite complex. Depending on who you ask, creating a good dataset is the most difficult, the most time-consuming, and also the most important part.
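Before a network can learn from domain names, each name has to be turned into numbers and paired with its label. The sketch below shows one common way to do that for character-level models: map every character to an integer and pad to a fixed length. The alphabet, the maximum length of 64, and all names in the code are assumptions for illustration; this post does not describe the actual preprocessing or architecture we use.

```python
import string

# Assumed alphabet: lowercase letters, digits, hyphen, and dot.
ALPHABET = string.ascii_lowercase + string.digits + "-."
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}  # 0 is reserved for padding
UNKNOWN_INDEX = len(ALPHABET) + 1  # any character outside the alphabet
MAX_LENGTH = 64  # assumed fixed input length


def encode_domain(domain: str, max_length: int = MAX_LENGTH) -> list[int]:
    """Encode a domain as a fixed-length list of character indices."""
    indices = [CHAR_TO_INDEX.get(ch, UNKNOWN_INDEX) for ch in domain.lower()[:max_length]]
    return indices + [0] * (max_length - len(indices))


# Label 1 = algorithmically generated, 0 = legitimate (hypothetical examples).
LABELED_EXAMPLES = [
    ("ovyvwnkjserklcrjwwhcpucyurwjaelg.com", 1),
    ("darkhope.net", 1),
    ("example.com", 0),
]

features = [encode_domain(domain) for domain, _ in LABELED_EXAMPLES]
labels = [label for _, label in LABELED_EXAMPLES]
```

The resulting `features` and `labels` lists are the kind of input a training loop would consume; which patterns in those integer sequences matter is left for the network to work out.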

When building our dataset, we collected many very large datasets containing tens of gigabytes of domain names, both algorithmically generated and legitimate. We then crafted a neural network and trained it on these datasets to achieve up to 98% accuracy. Below you can see the accuracy we were able to achieve for 92 malware families in our validation dataset:

 

| DGA Family | Accuracy | DGA Family | Accuracy | DGA Family | Accuracy | DGA Family | Accuracy |
|---|---|---|---|---|---|---|---|
| bamital | 100.00% | pandabanker | 99.99% | feodo | 100.00% | suppobox | 99.74% |
| banjori | 99.97% | pitou | 65.49% | fobber | 98.70% | sutra | 99.31% |
| bedep | 99.40% | proslikefan | 93.81% | gameover | 99.92% | symmi | 87.69% |
| beebone | 100.00% | pushdotid | 95.98% | gameover_p2p | 99.99% | szribi | 94.54% |
| blackhole | 100.00% | pushdo | 90.12% | gozi | 95.86% | tempedrevetdd | 96.23% |
| bobax | 98.00% | pykspa2s | 99.06% | goznym | 91.76% | tempedreve | 96.08% |
| ccleaner | 100.00% | pykspa2 | 99.34% | gspy | 100.00% | tinba | 99.44% |
| chinad | 99.79% | pykspa | 97.47% | hesperbot | 94.38% | tinynuke | 99.63% |
| chir | 100.00% | qadars | 99.68% | infy | 99.84% | tofsee | 98.40% |
| conficker | 97.10% | qakbot | 99.45% | locky | 94.11% | torpig | 89.89% |
| corebot | 99.64% | qhost | 60.87% | madmax | 99.74% | tsifiri | 100.00% |
| cryptolocker | 99.43% | qsnatch | 42.93% | makloader | 100.00% | ud2 | 100.00% |
| darkshell | 87.76% | ramdo | 99.98% | matsnu | 74.42% | ud3 | 95.00% |
| diamondfox | 76.96% | ramnit | 97.67% | mirai | 95.71% | ud4 | 91.00% |
| dircrypt | 97.83% | ranbyus | 99.75% | modpack | 86.88% | urlzone | 98.67% |
| dmsniff | 91.00% | randomloader | 100.00% | monerominer | 99.99% | vawtrak | 94.85% |
| dnsbenchmark | 100.00% | redyms | 100.00% | murofetweekly | 99.99% | vidrotid | 98.33% |
| dnschanger | 97.20% | rovnix | 99.83% | murofet | 99.79% | vidro | 97.40% |
| dyre | 99.92% | shifu | 97.90% | mydoom | 93.65% | virut | 97.69% |
| ebury | 99.95% | simda | 97.49% | necurs | 97.39% | volatilecedar | 94.18% |
| ekforward | 99.73% | sisron | 100.00% | nymaim2 | 67.74% | wd | 100.00% |
| emotet | 99.88% | sphinx | 99.73% | nymaim | 91.32% | xshellghost | 100.00% |
| omexo | 100.00% | padcrypt | 99.33% | oderoor | 97.92% | xxhex | 100.00% |

 

For comparison, below you can find the accuracies produced by a method that merely looks at the randomness of the domain name. As you can see, it performs a lot worse, especially with DGAs that use dictionaries. It also struggles with very short domains, where there is not enough information to make a good prediction and the accuracy degrades to that of a random guess, or even worse.

 

| DGA Family | Accuracy | DGA Family | Accuracy | DGA Family | Accuracy | DGA Family | Accuracy |
|---|---|---|---|---|---|---|---|
| bamital | 97.40% | pandabanker | 38.69% | feodo | 89.58% | suppobox | 13.37% |
| banjori | 77.43% | pitou | 0.01% | fobber | 56.20% | sutra | 57.33% |
| bedep | 78.26% | proslikefan | 5.96% | gameover | 99.99% | symmi | 43.93% |
| beebone | 40.95% | pushdotid | 15.42% | gameover_p2p | 99.55% | szribi | 4.90% |
| blackhole | 80.02% | pushdo | 10.13% | gozi | 58.97% | tempedrevetdd | 9.20% |
| bobax | 47.67% | pykspa2s | 25.92% | goznym | 15.11% | tempedreve | 20.10% |
| ccleaner | 19.23% | pykspa2 | 26.37% | gspy | 63.27% | tinba | 33.36% |
| chinad | 96.40% | pykspa | 18.66% | hesperbot | 43.82% | tinynuke | 99.23% |
| chir | 51.00% | qadars | 78.16% | infy | 10.35% | tofsee | 0.00% |
| conficker | 9.83% | qakbot | 72.14% | locky | 39.63% | torpig | 6.94% |
| corebot | 95.16% | qhost | 26.09% | madmax | 37.89% | tsifiri | 0.00% |
| cryptolocker | 63.56% | qsnatch | 0.14% | makloader | 100.00% | ud2 | 93.93% |
| darkshell | 0.00% | ramdo | 16.18% | matsnu | 46.87% | ud3 | 88.33% |
| diamondfox | 2.92% | ramnit | 54.79% | mirai | 67.14% | ud4 | 4.00% |
| dircrypt | 56.96% | ranbyus | 70.03% | modpack | 9.38% | urlzone | 64.79% |
| dmsniff | 4.00% | randomloader | 20.00% | monerominer | 78.20% | vawtrak | 10.19% |
| dnsbenchmark | 100.00% | redyms | 67.65% | murofetweekly | 100.00% | vidrotid | 32.67% |
| dnschanger | 18.65% | rovnix | 97.87% | murofet | 68.08% | vidro | 37.65% |
| dyre | 98.60% | shifu | 7.16% | mydoom | 0.64% | virut | 0.00% |
| ebury | 85.20% | simda | 3.12% | necurs | 53.97% | volatilecedar | 65.86% |
| ekforward | 0.00% | sisron | 11.82% | nymaim2 | 36.40% | wd | 99.79% |
| emotet | 77.75% | sphinx | 80.71% | nymaim | 12.63% | xshellghost | 54.00% |
| omexo | 100.00% | padcrypt | 3.77% | oderoor | 13.92% | xxhex | 0.00% |

 

We often associate machine learning with racks of graphics cards or even tensor processing units, and you may assume that our detection method consumes a lot of resources to make predictions. However, this is not really the case. We tested the throughput of our implementation and summarized the results below. As you can see, no special hardware is required to run these detection methods. Keep in mind that the throughput refers to unique domains. In a real-world scenario, with deduplication and a whitelist, you will struggle to saturate even a single vCPU.

 

| vCPUs | Minimum RAM | Throughput (unique domains / minute) |
|---|---|---|
| 1 | 2 GB | ~28,000 |
| 2 | 3 GB | ~62,000 |
| 4 | 4 GB | ~108,000 |
| 8 | 4 GB | ~188,000 |
| 16 | 7 GB | ~300,000 |
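The deduplication and whitelist step mentioned above can be sketched as a small filter placed in front of the classifier. The function name, the exact-match whitelist, and the normalization rules here are assumptions for illustration, not a description of our production pipeline.

```python
from typing import Iterable, Iterator, Optional, Set


def domains_to_classify(
    queries: Iterable[str],
    whitelist: Set[str],
    seen: Optional[Set[str]] = None,
) -> Iterator[str]:
    """Yield each queried domain at most once, skipping whitelisted names."""
    seen = set() if seen is None else seen
    for domain in queries:
        # Normalize case and drop the trailing root dot so duplicates collapse.
        normalized = domain.lower().rstrip(".")
        if normalized in seen or normalized in whitelist:
            continue
        seen.add(normalized)
        yield normalized


queries = ["Example.com", "example.com.", "darkhope.net", "darkhope.net"]
print(list(domains_to_classify(queries, whitelist={"example.com"})))
```

Only the domains that survive this filter need a prediction. This sketch matches whitelist entries exactly; matching subdomains of whitelisted names would need extra logic.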

 

We use the previously discussed neural network and many other tools in our Fusion Center to help protect our clients’ infrastructures. If you would like to learn more about how neural networks work and how we can use them to detect DGAs, read our new whitepaper on the topic.

We go into detail about how DGAs work, what neural network architectures we can employ, and how these architectures perform when detecting these domains. If you’re in the mood for something less technical, learn more about the tools, techniques, and philosophies that set our Fusion Center apart from a regular SOC.

 
