How can we prevent malware from communicating with a command-and-control (C&C) server? You may think of using a CTI (Cyber Threat Intelligence) feed with a network blacklisting appliance. You may also think about blocking certain protocols or even using a Next-Generation Firewall to perform traffic inspection. But malware creators can be quite clever. They write malware that communicates over HTTP(S), hiding among all the other legitimate web requests in your organization. After all, the best place to hide a tree is in a forest. On its own, this still doesn’t get around a CTI blacklist, but there’s more to the story.
Instead of using a single domain that can easily be marked as malicious, malware operators began registering hundreds of domains daily as rendezvous points with their malware. But they didn’t simply hard-code these domains into their malware, as they could easily be extracted from the malware and blacklisted. Instead, they began using domain generation algorithms (DGAs) to create a potentially unlimited number of domains, in such a way that the domains generated by the C&C server would be mostly identical to the domains generated by the malware. As such, the malware would simply communicate through a new domain each time and the previous domains would be discarded. This makes it infeasible to maintain a CTI feed, as each malware family or even instance would be generating up to tens of thousands of domains each day, all of which you would need to collect, store, and blacklist.
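To make the rendezvous idea concrete, here is a minimal sketch of a date-seeded generator. It is not the algorithm of any real malware family: the shared seed string, the use of SHA-256, and the function name `generate_domains` are all assumptions chosen for illustration. The point is only that two parties holding the same seed and the same date derive the same list without ever exchanging it.

```python
import hashlib
from datetime import date
from typing import List


def generate_domains(seed: str, day: date, count: int = 5, tld: str = ".com") -> List[str]:
    """Derive `count` pseudo-random domains from a shared seed and a date.

    Hypothetical illustration only; real DGAs use their own schemes.
    """
    domains = []
    for index in range(count):
        # Both sides hash the same inputs, so both get the same digest.
        material = f"{seed}:{day.isoformat()}:{index}".encode()
        digest = hashlib.sha256(material).digest()
        # Map the first 16 bytes onto lowercase letters to form the label.
        label = "".join(chr(ord("a") + byte % 26) for byte in digest[:16])
        domains.append(label + tld)
    return domains


# The malware and the operator compute the list independently.
malware_view = generate_domains("example-seed", date(2024, 1, 1))
server_view = generate_domains("example-seed", date(2024, 1, 1))
next_day = generate_domains("example-seed", date(2024, 1, 2))
```

Here `malware_view` and `server_view` are identical, while `next_day` is a fresh list, which is why yesterday's blacklist entries are of little use today.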
So, do we just give up and accept that there’s nothing we can do after a malware infection has occurred? Of course not! Let’s take a look at one of these domains to see if we can get any clues about how to detect them. Take a look at the below domain. Looks like gibberish, right?
ovyvwnkjserklcrjwwhcpucyurwjaelg.com
A CryptoLocker domain.
That’s because it is. It is simply a random sequence of characters. Now, we don’t need to go into too much detail, but there is a mathematical way we can quantify the randomness of a string. So, we could simply flag domains that seem to be completely random, right? Well, this is exactly what we tried to do in the early stages of developing our detection capabilities. But when testing with a real-world dataset, we couldn’t get accuracies above 65%. This wasn’t a problem with how we were quantifying randomness, but rather that malware creators had thought of this as well. We could have gotten away with it if it weren’t for these meddling malware developers.
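One common way to put a number on the randomness of a string is Shannon entropy, measured in bits per character. We are assuming that measure here purely for illustration; the threshold of 3.5 and the helper names below are hypothetical, not the values or code behind the results in this post.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of `text` in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_random(domain: str, threshold: float = 3.5) -> bool:
    """Flag a domain whose first label exceeds a (hypothetical) entropy threshold."""
    label = domain.split(".")[0]
    return shannon_entropy(label) > threshold


random_looking = looks_random("ovyvwnkjserklcrjwwhcpucyurwjaelg.com")
word_based = looks_random("journeyready.net")
```

With this threshold the CryptoLocker domain above is flagged, but a word-based domain such as `journeyready.net` is not, which hints at the weakness of relying on randomness alone.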
Below you can see some domains also generated by a DGA. They seem completely normal, right? This is because malware developers began using dictionaries to generate their domains. They simply combine random words together, often resulting in completely inconspicuous domains.
journeyready.net wouldinstead.net sickhurry.net darkhope.net cloudthirteen.net dutybegan.net christianaashleigh.net
Example Suppobox domains.
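A dictionary-based generator can be sketched in a few lines. This is not the actual Suppobox algorithm: the word list is simply taken from the example domains above, and the integer seed and the function name `dictionary_domains` are assumptions made for the example.

```python
import random
from typing import List

# Words lifted from the example domains above; a real DGA ships a larger list.
WORDS = ["journey", "ready", "would", "instead", "sick", "hurry",
         "dark", "hope", "cloud", "thirteen", "duty", "began"]


def dictionary_domains(seed: int, count: int = 5, tld: str = ".net") -> List[str]:
    """Combine two distinct dictionary words per domain, driven by a seeded PRNG."""
    rng = random.Random(seed)  # same seed on both sides gives the same sequence
    domains = []
    for _ in range(count):
        first, second = rng.sample(WORDS, 2)
        domains.append(first + second + tld)
    return domains


batch = dictionary_domains(seed=42)
```

Because every label is built from ordinary words, its character distribution resembles that of a legitimate domain, so a pure randomness check has very little to go on.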
Therefore, we need to come up with something more advanced. There are many approaches you could take, but the one we chose was the use of neural networks. Neural networks are made up of a number of interconnected artificial neurons, modeled on biological neural networks such as those found in animal brains. This sounds quite esoteric, but at its core, it’s really quite simple. Think of a neural network as a system that can infer which aspects of a domain name are important when determining whether or not it was algorithmically generated. It can do all this without us having to dictate exactly what sort of details it should be looking for. These details can become highly intricate and nearly impossible for a human to program. All we have to do is set up a good neural network architecture, collect a good dataset, and mark each domain in the dataset as either algorithmically generated or legitimate. The neural network will do most of the heavy lifting when it is trained. Sounds easy, but there are quite a few quirks when deciding what architecture to use, and the subsequent optimizations required can be quite complex. Depending on who you ask, creating a good dataset is the most difficult, the most time-consuming, and also the most important part.
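Before a network can learn anything, the labeled domains have to be turned into numbers. The sketch below shows one common way to do that: map each character to an integer and pad every domain to a fixed length. The alphabet, the maximum length of 64, and the function names are assumptions for illustration; this post does not specify the input representation our model actually uses.

```python
import string
from typing import List, Tuple

# Hypothetical alphabet: index 0 is reserved for padding.
ALPHABET = string.ascii_lowercase + string.digits + "-."
CHAR_TO_INDEX = {char: i + 1 for i, char in enumerate(ALPHABET)}
UNKNOWN_INDEX = len(ALPHABET) + 1  # any character outside the alphabet
MAX_LENGTH = 64  # hypothetical fixed input length


def encode_domain(domain: str, max_length: int = MAX_LENGTH) -> List[int]:
    """Encode a domain as a fixed-length list of integer character indices."""
    indices = [CHAR_TO_INDEX.get(c, UNKNOWN_INDEX) for c in domain.lower()[:max_length]]
    return indices + [0] * (max_length - len(indices))  # right-pad with zeros


def build_dataset(samples: List[Tuple[str, int]]) -> Tuple[List[List[int]], List[int]]:
    """Split (domain, label) pairs into features and labels; 1 = DGA, 0 = legitimate."""
    features = [encode_domain(domain) for domain, _ in samples]
    labels = [label for _, label in samples]
    return features, labels


features, labels = build_dataset([
    ("ovyvwnkjserklcrjwwhcpucyurwjaelg.com", 1),
    ("darkhope.net", 1),
    ("example.com", 0),
])
```

The resulting `features` and `labels` are the kind of input a character-level model can be trained on; which architecture consumes them is a separate decision.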
When setting up our dataset, we collected many large datasets containing tens of gigabytes of domain names, both algorithmically generated and legitimate. We then crafted a neural network that we trained using these datasets to achieve up to 98% accuracy. Below you can see the accuracy we were able to achieve for 92 malware families in our validation dataset:
DGA Family | Accuracy | DGA Family | Accuracy | DGA Family | Accuracy | DGA Family | Accuracy |
bamital | 100.00% | pandabanker | 99.99% | feodo | 100.00% | suppobox | 99.74% |
banjori | 99.97% | pitou | 65.49% | fobber | 98.70% | sutra | 99.31% |
bedep | 99.40% | proslikefan | 93.81% | gameover | 99.92% | symmi | 87.69% |
beebone | 100.00% | pushdotid | 95.98% | gameover_p2p | 99.99% | szribi | 94.54% |
blackhole | 100.00% | pushdo | 90.12% | gozi | 95.86% | tempedrevetdd | 96.23% |
bobax | 98.00% | pykspa2s | 99.06% | goznym | 91.76% | tempedreve | 96.08% |
ccleaner | 100.00% | pykspa2 | 99.34% | gspy | 100.00% | tinba | 99.44% |
chinad | 99.79% | pykspa | 97.47% | hesperbot | 94.38% | tinynuke | 99.63% |
chir | 100.00% | qadars | 99.68% | infy | 99.84% | tofsee | 98.40% |
conficker | 97.10% | qakbot | 99.45% | locky | 94.11% | torpig | 89.89% |
corebot | 99.64% | qhost | 60.87% | madmax | 99.74% | tsifiri | 100.00% |
cryptolocker | 99.43% | qsnatch | 42.93% | makloader | 100.00% | ud2 | 100.00% |
darkshell | 87.76% | ramdo | 99.98% | matsnu | 74.42% | ud3 | 95.00% |
diamondfox | 76.96% | ramnit | 97.67% | mirai | 95.71% | ud4 | 91.00% |
dircrypt | 97.83% | ranbyus | 99.75% | modpack | 86.88% | urlzone | 98.67% |
dmsniff | 91.00% | randomloader | 100.00% | monerominer | 99.99% | vawtrak | 94.85% |
dnsbenchmark | 100.00% | redyms | 100.00% | murofetweekly | 99.99% | vidrotid | 98.33% |
dnschanger | 97.20% | rovnix | 99.83% | murofet | 99.79% | vidro | 97.40% |
dyre | 99.92% | shifu | 97.90% | mydoom | 93.65% | virut | 97.69% |
ebury | 99.95% | simda | 97.49% | necurs | 97.39% | volatilecedar | 94.18% |
ekforward | 99.73% | sisron | 100.00% | nymaim2 | 67.74% | wd | 100.00% |
emotet | 99.88% | sphinx | 99.73% | nymaim | 91.32% | xshellghost | 100.00% |
omexo | 100.00% | padcrypt | 99.33% | oderoor | 97.92% | xxhex | 100.00% |
For comparison, below are the accuracies produced by a method that merely looks at the randomness of the domain name. As you can see, it performs a lot worse, especially with DGAs that use dictionaries. It also struggles with very short domains, where there is not enough information to make a good prediction and the accuracy begins to devolve to a random guess or even worse.
DGA Family | Accuracy | DGA Family | Accuracy | DGA Family | Accuracy | DGA Family | Accuracy |
bamital | 97.40% | pandabanker | 38.69% | feodo | 89.58% | suppobox | 13.37% |
banjori | 77.43% | pitou | 0.01% | fobber | 56.20% | sutra | 57.33% |
bedep | 78.26% | proslikefan | 5.96% | gameover | 99.99% | symmi | 43.93% |
beebone | 40.95% | pushdotid | 15.42% | gameover_p2p | 99.55% | szribi | 4.90% |
blackhole | 80.02% | pushdo | 10.13% | gozi | 58.97% | tempedrevetdd | 9.20% |
bobax | 47.67% | pykspa2s | 25.92% | goznym | 15.11% | tempedreve | 20.10% |
ccleaner | 19.23% | pykspa2 | 26.37% | gspy | 63.27% | tinba | 33.36% |
chinad | 96.40% | pykspa | 18.66% | hesperbot | 43.82% | tinynuke | 99.23% |
chir | 51.00% | qadars | 78.16% | infy | 10.35% | tofsee | 0.00% |
conficker | 9.83% | qakbot | 72.14% | locky | 39.63% | torpig | 6.94% |
corebot | 95.16% | qhost | 26.09% | madmax | 37.89% | tsifiri | 0.00% |
cryptolocker | 63.56% | qsnatch | 0.14% | makloader | 100.00% | ud2 | 93.93% |
darkshell | 0.00% | ramdo | 16.18% | matsnu | 46.87% | ud3 | 88.33% |
diamondfox | 2.92% | ramnit | 54.79% | mirai | 67.14% | ud4 | 4.00% |
dircrypt | 56.96% | ranbyus | 70.03% | modpack | 9.38% | urlzone | 64.79% |
dmsniff | 4.00% | randomloader | 20.00% | monerominer | 78.20% | vawtrak | 10.19% |
dnsbenchmark | 100.00% | redyms | 67.65% | murofetweekly | 100.00% | vidrotid | 32.67% |
dnschanger | 18.65% | rovnix | 97.87% | murofet | 68.08% | vidro | 37.65% |
dyre | 98.60% | shifu | 7.16% | mydoom | 0.64% | virut | 0.00% |
ebury | 85.20% | simda | 3.12% | necurs | 53.97% | volatilecedar | 65.86% |
ekforward | 0.00% | sisron | 11.82% | nymaim2 | 36.40% | wd | 99.79% |
emotet | 77.75% | sphinx | 80.71% | nymaim | 12.63% | xshellghost | 54.00% |
omexo | 100.00% | padcrypt | 3.77% | oderoor | 13.92% | xxhex | 0.00% |
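As a reading aid for the two tables, here is a small sketch of how a per-family accuracy figure can be computed. We assume here that "accuracy" for a DGA family means the share of that family's domains the detector correctly flags as algorithmically generated. The function name and the toy input are hypothetical and are not taken from our validation data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def per_family_accuracy(results: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Return the percentage of correctly flagged domains per family.

    `results` holds (family, flagged_as_dga) pairs for domains that are
    all known to be DGA-generated.
    """
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for family, flagged in results:
        totals[family] += 1
        if flagged:
            hits[family] += 1
    return {family: 100.0 * hits[family] / totals[family] for family in totals}


# Toy input with made-up family names, purely to show the shape of the data.
toy_results = [
    ("family_a", True), ("family_a", True), ("family_a", True), ("family_a", False),
    ("family_b", True), ("family_b", False),
]
accuracy = per_family_accuracy(toy_results)
```

Reporting per family rather than as a single average matters because one large, easy family can otherwise hide poor detection of a small, difficult one.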
We often associate machine learning with many graphics cards or even tensor processing units, and you may assume that our detection method would consume a lot of resources to make predictions. However, this is not really the case. We tested the throughput of our implementation and summarized the results below. As you can see, no special hardware is required to run these detection methods. Keep in mind that the throughput refers to unique domains. In a real-world scenario, with deduplication and a whitelist, you will struggle to saturate even a single vCPU.
vCPU | Minimum RAM | Throughput |
1 | 2 GB | ~28,000 domains/minute |
2 | 3 GB | ~62,000 domains/minute |
4 | 4 GB | ~108,000 domains/minute |
8 | 4 GB | ~188,000 domains/minute |
16 | 7 GB | ~300,000 domains/minute |
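To illustrate why a single vCPU goes a long way, here is a minimal sketch of the deduplication and whitelist step that can sit in front of the classifier. The function name and the exact-match whitelist are assumptions for the example; a production filter would more likely match on the registered domain. At roughly 28,000 unique domains per minute, one vCPU handles about 470 previously unseen domains every second, and only domains that survive this filter count toward that figure.

```python
from typing import Iterable, Iterator, Set


def unique_unlisted(domains: Iterable[str], whitelist: Set[str]) -> Iterator[str]:
    """Yield each domain once, skipping anything on the whitelist."""
    seen: Set[str] = set()
    for domain in domains:
        # Normalize case, surrounding whitespace, and a trailing root dot.
        normalized = domain.strip().lower().rstrip(".")
        if not normalized or normalized in seen or normalized in whitelist:
            continue
        seen.add(normalized)
        yield normalized


observed = ["Example.com", "example.com.", "darkhope.net", "darkhope.net", ""]
to_classify = list(unique_unlisted(observed, whitelist={"example.com"}))
```

In this toy run, five observed queries shrink to a single domain that actually needs a prediction.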
We use the previously discussed neural network and many other tools in our Fusion Center to help protect our clients’ infrastructures. If you would like to learn more about how neural networks work and how we can use them to detect DGAs, read our new whitepaper on the topic.
We go into detail about how DGAs work, what neural network architectures we can employ, and how these architectures perform when detecting these domains. If you’re in the mood for something less technical, learn more about the tools, techniques, and philosophies that set our Fusion Center apart from a regular SOC.