NETWORK THREAT HUNTING WITH NETFLOW
Threat hunting is a well-established practice in cybersecurity. It covers human-driven analysis and searching through datasets (networks, endpoints, security solutions, etc.) in order to detect malicious activity that may have evaded existing IDPS or other automated detections.
The hunter's job is to generate hypotheses and act like a detective with an analytical mindset. It is a proactive activity, not a reactive one, based on factors like CTI, friendly intelligence and personal experience ("Where have I seen this before?"). The goal is to hunt down the adversary's Tactics, Techniques and Procedures (TTPs).
NetFlow is just one of the datasets that can be useful, especially if we are working in sensitive infrastructures where we don't have the mandate to work with pcaps. We may not even want to know what the pcaps contain, for example in scenarios with heavy user data involvement. In these cases we can hunt on NetFlow, which is a stripped-down version of the packet capture, its "metadata". It is practical and effective for our mission because it carries the following information:
- Source IP address
- Destination IP address
- IP protocol
- Source port
- Destination port
- IP Type of Service
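As a rough illustration of how these fields turn packets into flows, here is a minimal Python sketch. The `FlowKey` type, the `aggregate` helper and the sample addresses are hypothetical names and values made up for this example; real exporters also track timestamps, interfaces and more.

```python
from collections import defaultdict
from typing import NamedTuple


class FlowKey(NamedTuple):
    """The six fields listed above, used as the identity of a flow."""
    src_ip: str
    dst_ip: str
    protocol: int   # IP protocol number, e.g. 6 = TCP, 17 = UDP
    src_port: int
    dst_port: int
    tos: int        # IP Type of Service byte


def aggregate(packets):
    """Collapse (FlowKey, size_in_bytes) observations into per-flow counters."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for key, size in packets:
        flows[key]["packets"] += 1
        flows[key]["bytes"] += size
    return dict(flows)


# Hypothetical observations: three packets that belong to two flows.
web = FlowKey("10.0.0.5", "203.0.113.10", 6, 51000, 443, 0)
dns = FlowKey("10.0.0.5", "10.0.0.1", 17, 53000, 53, 0)
flows = aggregate([(web, 1500), (web, 400), (dns, 80)])

for key, counters in flows.items():
    print(key.dst_ip, key.dst_port, counters)
```

Packets that share all six values collapse into a single record with counters, and the payload is never stored.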
Another reason to use it is the small footprint of flow datasets: queries run faster than on pcaps (and faster queries mean faster automated analysis), and flows need far less storage (100 GB of pcap can be converted to ~380 MB of NetFlow data).
So our job is not just to investigate these hypotheses, but also to generate new ones based on the results.
Let's start with a classic one: historical analysis. Our SIEM system gets a CTI feed, which is great, but what if an adversary is already lurking in our environment and only appears in the intelligence database later? If we are careful and have a good policy of keeping historical data for 60 days, we can compare the stored source and destination IPs with the updated CTI database. It is not very accurate, because the attacker's IP could also have disappeared from the database by then, but it is a chance to generate a hypothesis. You know, you don't even have a chance to win the lottery if you never buy a ticket. Historical analysis is much more efficient for executables (hash comparison), but that is not our topic now, so let's continue with NetFlow.
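A retrospective match like this could be sketched in Python as follows. This is only an illustration under assumptions: the flow records are plain dictionaries, the CTI feed is a set of IP strings, and the `retro_hunt` helper and all sample values are hypothetical.

```python
from datetime import datetime, timedelta


def retro_hunt(flows, indicators, now, retention_days=60):
    """Return stored flows whose source or destination IP is in the CTI set."""
    cutoff = now - timedelta(days=retention_days)
    hits = []
    for flow in flows:
        if flow["start"] < cutoff:
            continue  # older than the retention window, would already be purged
        matched = {flow["src_ip"], flow["dst_ip"]} & indicators
        if matched:
            hits.append((flow, sorted(matched)))
    return hits


# Hypothetical stored flows and a freshly updated indicator set.
now = datetime(2024, 3, 1)
stored_flows = [
    {"start": datetime(2024, 2, 10), "src_ip": "10.0.0.5", "dst_ip": "198.51.100.7"},
    {"start": datetime(2024, 2, 20), "src_ip": "10.0.0.8", "dst_ip": "192.0.2.44"},
    {"start": datetime(2023, 11, 1), "src_ip": "10.0.0.9", "dst_ip": "198.51.100.7"},
]
indicators = {"198.51.100.7"}

hits = retro_hunt(stored_flows, indicators, now)
for flow, matched in hits:
    print(flow["start"].date(), flow["src_ip"], "->", flow["dst_ip"], matched)
```

Each hit is only a starting point for a hypothesis: the internal host in the record is where the hunt continues.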
To demonstrate the power of network flow data, it is best to walk through each phase of the "kill chain" attack model and check what flows can tell us about an intrusion.
Reconnaissance: Active reconnaissance is pretty easy to find in network flow data. We would see the same external IP addresses returning and poking at internet-facing hosts many times. We would definitely see port scans, host sweeps, etc. This detection capability applies both inside and outside our organizational perimeter.
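A simple way to surface port scans and host sweeps from flow records is to count distinct destination ports and hosts per source. The sketch below assumes flows reduced to (source IP, destination IP, destination port) tuples; the `find_scanners` helper and its thresholds of 20 are arbitrary illustration values, not tuned recommendations.

```python
from collections import defaultdict


def find_scanners(flows, port_threshold=20, host_threshold=20):
    """Flag sources touching many ports on one host, or one port on many hosts."""
    ports = defaultdict(set)  # (src, dst) -> distinct destination ports
    hosts = defaultdict(set)  # (src, dst_port) -> distinct destination hosts
    for src, dst, dst_port in flows:
        ports[(src, dst)].add(dst_port)
        hosts[(src, dst_port)].add(dst)

    findings = []
    for (src, dst), seen in ports.items():
        if len(seen) >= port_threshold:
            findings.append(("port_scan", src, dst, len(seen)))
    for (src, dst_port), seen in hosts.items():
        if len(seen) >= host_threshold:
            findings.append(("host_sweep", src, dst_port, len(seen)))
    return findings


# Hypothetical flows: one port scanner, one SSH sweeper, one normal web client.
flows = [("203.0.113.9", "192.0.2.10", port) for port in range(1, 101)]
flows += [("203.0.113.50", f"192.0.2.{i}", 22) for i in range(1, 31)]
flows += [("198.51.100.3", "192.0.2.10", 443)] * 50

findings = find_scanners(flows)
for finding in findings:
    print(finding)
```

The busy web client generates the most flows after the port scanner, but it only ever touches one port on one host, so it is not flagged.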
Weaponization: This stage cannot be detected, unless we act like a secret service and have network flow data from the adversary's network.
Delivery: If it's a remote attack, we can see it in the network flow, and there are some options for detection. For example, we might see someone hitting a web server many times, or large packets on an email protocol being sent to everyone on the network.
Exploitation: This is usually detected via host-based analytics, so we cannot see it in the network flow. However, if a host has been successfully exploited and the attackers have cleared the logs so that we can't do forensics, we at least have a chance to use NetFlow to get a picture of the lateral movement.
Installation: It could be interesting if traffic comes from a known malicious IP, or from an IP that has just scanned our network, or if we see large packets on a protocol that shouldn't have them, like DNS or SMTP. CTI is also essential here (IPs, DGA, hashes, etc.).
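Flow records usually carry packet and byte counters, so the average packet size per flow is cheap to compute. The following Python sketch flags flows whose average is above a per-protocol limit. The `oversized_flows` helper, the record layout and the 300-byte limit for DNS over UDP are assumptions for illustration only; a sensible limit depends on the environment.

```python
def oversized_flows(flows, limits):
    """Return flows whose average packet size exceeds the limit for their service."""
    suspicious = []
    for flow in flows:
        limit = limits.get((flow["protocol"], flow["dst_port"]))
        if limit is None or flow["packets"] == 0:
            continue  # no expectation defined for this service
        average = flow["bytes"] / flow["packets"]
        if average > limit:
            suspicious.append((flow["src_ip"], flow["dst_ip"], round(average)))
    return suspicious


# Assumed limit: average bytes per packet for DNS over UDP (protocol 17, port 53).
LIMITS = {(17, 53): 300}

# Hypothetical flow records.
flows = [
    {"src_ip": "10.0.0.5", "dst_ip": "10.0.0.1", "protocol": 17,
     "dst_port": 53, "packets": 10, "bytes": 800},
    {"src_ip": "10.0.0.7", "dst_ip": "203.0.113.20", "protocol": 17,
     "dst_port": 53, "packets": 40, "bytes": 36000},
    {"src_ip": "10.0.0.7", "dst_ip": "203.0.113.30", "protocol": 6,
     "dst_port": 443, "packets": 100, "bytes": 140000},
]

suspicious = oversized_flows(flows, LIMITS)
print(suspicious)
```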
Command & Control: Again, this applies to a remote attack. We could see beaconing, which is easy to detect in network flow, whether we look at a visual traffic graph or run streaming analytics on the flow data. If we are looking for inbound external connections from a C2 host, we should check for RDP connections that aren't supposed to be there. An external connection starts from the outside and points to an internal host, which is normal for servers but not for endpoints. We can also see covert channels such as DNS and HTTP/HTTPS tunnelling. These channels could lead to a high-profile data breach, but malware can also use them to fetch updates (or to reach customer service; some malware has better support than Microsoft). We could also detect peer-to-peer traffic or instruction pulling, stealthy HTTP POSTs, suspect domain activity, Tor activity and so on.
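One common way to look for beaconing in flow data is to measure how regular the gaps between connections are for each source and destination pair. The sketch below uses the coefficient of variation of the inter-arrival times. The `beacon_score` and `find_beacons` helpers, the 0.1 threshold and the sample timestamps (in seconds) are assumptions for illustration; real implants often add jitter, so the threshold needs tuning.

```python
from statistics import mean, pstdev


def beacon_score(timestamps):
    """Coefficient of variation of inter-arrival times; lower means more regular."""
    if len(timestamps) < 4:
        return None  # too few connections to judge regularity
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    average = mean(gaps)
    if average == 0:
        return None
    return pstdev(gaps) / average


def find_beacons(connections, max_cv=0.1):
    """Group (src, dst, timestamp) records per pair and keep the regular ones."""
    by_pair = {}
    for src, dst, timestamp in connections:
        by_pair.setdefault((src, dst), []).append(timestamp)
    beacons = {}
    for pair, stamps in by_pair.items():
        score = beacon_score(stamps)
        if score is not None and score <= max_cv:
            beacons[pair] = score
    return beacons


# Hypothetical data: a check-in every ~300 seconds with small jitter,
# and irregular browsing traffic from the same host.
jitters = [0, 2, -1, 3, 0, -2, 1, 0]
beacon = [("10.0.0.5", "203.0.113.66", 300 * i + j) for i, j in enumerate(jitters)]
browsing = [("10.0.0.5", "198.51.100.20", t)
            for t in [0, 4, 9, 300, 310, 2000, 2003, 5000]]

beacons = find_beacons(beacon + browsing)
print(beacons)
```

The same score can be computed over a sliding window if the flows arrive as a stream rather than from storage.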
Actions on objectives: There are some silver bullets. For example, data exfiltration is easy to spot in NetFlow: it's a simple spike on the traffic line. In the case of lateral movement there is horizontal network flow between hosts and endpoints in the environment, not just the usual vertical host-endpoint communication, for example the web server scanning our domain controller. It could also be interesting if our servers generate Skype traffic, etc.
Regarding tools, there is a big selection. You could use YAF for DPI and p0f OS fingerprinting, SiLK, or iSiLK with a GUI to support your learning process, or Argus, which can read native NetFlow data. The ultimate one is Bro (now called Zeek), where weird.log could be interesting. If you like integrated, almost ready-to-use tools, you can also take a look at SELKS, Security Onion or other distros.
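The exfiltration spike can also be found without a graph by comparing each host's outbound volume with its own history. This is a minimal sketch assuming daily outbound byte totals per host are already summed from the flows; the `outbound_spikes` helper, the 3-sigma threshold and the sample numbers are illustrative assumptions.

```python
from statistics import mean, stdev


def outbound_spikes(daily_bytes, min_sigma=3.0):
    """daily_bytes maps host -> outbound totals, oldest first; the last is under review."""
    spikes = {}
    for host, series in daily_bytes.items():
        history, today = series[:-1], series[-1]
        if len(history) < 2:
            continue  # not enough history for a baseline
        baseline, spread = mean(history), stdev(history)
        if spread == 0:
            continue  # perfectly flat history, a z-score is undefined
        sigma = (today - baseline) / spread
        if sigma >= min_sigma:
            spikes[host] = round(sigma, 1)
    return spikes


# Hypothetical daily outbound totals in megabytes.
daily_mb = {
    "10.0.0.5": [100, 120, 110, 90, 105, 5000],
    "10.0.0.8": [200, 210, 190, 205, 195, 215],
}

spikes = outbound_spikes(daily_mb)
print(spikes)
```

A per-host baseline matters here: 5000 MB in a day may be normal for a backup server and alarming for a workstation.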