Splunk and AI, Part 2 – Threat Hunting on Domain Controllers Using Deep Learning

This is a continuation post on developing an AI using Splunk. The first post of this series can be found here.

This post assumes that the domain controllers are already logging to Splunk. In order to create an AI to perform automated threat hunting on domain controllers, we will follow five steps:

Understand the problem space
Gather the appropriate data for the AI
Create and train the AI using the data
Test the AI
Future considerations

Understand the Problem Space

Any kind of threat hunting requires the appropriate logs. You can’t hunt what you can’t see. In my case, I want to see any anomalies that occur on the command line of my domain controller. My hypothesis is that if an attacker compromises a domain controller, they will most likely have some kind of command line access to it. Therefore, I need logs of what happens on my domain controller’s command line. I can get them by enabling command line logging on my domain controller via GPO, which adds the full command line to process creation events (Security Event ID 4688). The steps can be found on Microsoft’s site here.

For my fellow penetration testers and red teamers, I realize that a lot of attacks don’t occur directly on the command line (PowerShell, C#, direct syscalls, etc.), but this is just a starting point, so bear with me.

Gather the Appropriate Data for the AI

Now that the domain controller is creating logs for anything executed on the command line, I can see in Splunk which command line logs it is producing.

I will focus on the following four fields for each command line log: the account domain, the account name, the command line itself, and the creator process name. Here are the reasons why I’m focusing on each of these four fields:

Account_Domain – if a command is executed by a user from a domain that does not normally show up on the domain controller, then this is an anomaly and I want to know about it

Account_Name – if a command is executed by a user that does not normally log into the domain controller to run commands, then this is an anomaly and I want to know about it

Process_Command_Line – if a command runs that does not normally run on my domain controller, then this is an anomaly and I want to know about it

Creator_Process_Name – if a process spawns a child process that it does not normally spawn on my domain controller, then this is an anomaly and I want to know about it
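The four fields above are text, while a neural network needs numbers. This post does not show how the fields were encoded, so the following is only a minimal sketch of one common option, the hashing trick, in plain Python. The bucket count, the helper names, and the sample event values are all assumptions made up for illustration.

```python
import hashlib

FIELDS = ["Account_Domain", "Account_Name",
          "Process_Command_Line", "Creator_Process_Name"]
BUCKETS_PER_FIELD = 16  # hypothetical size, chosen only to keep the demo small


def _bucket(token, buckets):
    """Map a token to a stable bucket index (SHA-256 used only as a stable hash)."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % buckets


def encode_event(event):
    """Turn the four fields into one fixed-length vector of 0.0/1.0 values."""
    vector = []
    for field in FIELDS:
        slots = [0.0] * BUCKETS_PER_FIELD
        value = str(event.get(field, "")).lower()
        # Split on whitespace so each word of a command line sets its own bucket.
        for token in value.split():
            slots[_bucket(token, BUCKETS_PER_FIELD)] = 1.0
        vector.extend(slots)
    return vector


# Hypothetical event; the values are invented for this example.
sample_event = {
    "Account_Domain": "CORP",
    "Account_Name": "svc_backup",
    "Process_Command_Line": "net group \"Domain Admins\" /domain",
    "Creator_Process_Name": "C:\\Windows\\System32\\cmd.exe",
}
vector = encode_event(sample_event)
```

Every event becomes a vector of the same length (4 fields x 16 buckets here), which is what a fixed-size network input requires. Different tokens can collide in the same bucket, so a real deployment would likely use more buckets or a learned tokenizer.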

Create and Train the AI Using the Data

With the command line logs ready to go, I can now create the AI needed to perform automated threat hunting. The approach is simple: I need to build a model in TensorFlow that learns what “normal” command line activity looks like for the domain controller, so that it can tell me what falls outside this range of “normal” and by how much. To create this, I’m going to use deep learning. Specifically, I’m going to create a neural network that implements what’s known as an autoencoder.

An autoencoder learns to compress its input and then reconstruct it, so it becomes good at reconstructing the most frequently observed characteristics of the data it was trained on. I will train my autoencoder with a week’s worth of command line logs from my domain controller. Once it’s trained, it will know the “normal” characteristics of a command line log from my domain controller. I can then send new command line logs from Splunk to the autoencoder, which assigns a “score” to each log; the higher the score, the more likely it is that the log is an anomaly. Keep in mind that this explanation of how the autoencoder works is very high level and does not cover the technical details of its implementation.

Autoencoder Representation, credit to https://www.assemblyai.com/blog/introduction-to-variational-autoencoders-using-keras/
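To make the reconstruct-and-score idea concrete, here is a deliberately tiny sketch. It is not the TensorFlow model used in this post: it is a linear autoencoder written in plain Python so that it runs anywhere, and the class name, layer sizes, learning rate, and toy vectors are assumptions for illustration. The score is the mean squared reconstruction error, which is a common choice for autoencoder-based anomaly detection.

```python
import random


class TinyAutoencoder:
    """Linear autoencoder: input -> smaller hidden layer -> reconstruction."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        self.enc = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                    for _ in range(n_hidden)]
        self.dec = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
                    for _ in range(n_inputs)]

    def encode(self, x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.enc]

    def reconstruct(self, x):
        h = self.encode(x)
        return [sum(w * hi for w, hi in zip(row, h)) for row in self.dec]

    def train(self, data, epochs=2000, lr=0.05):
        """Plain stochastic gradient descent on squared reconstruction error."""
        for _ in range(epochs):
            for x in data:
                h = self.encode(x)
                x_hat = [sum(w * hi for w, hi in zip(row, h)) for row in self.dec]
                err = [a - b for a, b in zip(x_hat, x)]
                # Backpropagate the error to the hidden layer before updating dec.
                grad_h = [sum(self.dec[i][j] * err[i] for i in range(len(err)))
                          for j in range(len(h))]
                for i, row in enumerate(self.dec):
                    for j in range(len(row)):
                        row[j] -= lr * err[i] * h[j]
                for j, row in enumerate(self.enc):
                    for k in range(len(row)):
                        row[k] -= lr * grad_h[j] * x[k]

    def score(self, x):
        """Mean squared reconstruction error; higher means more unusual."""
        x_hat = self.reconstruct(x)
        return sum((a - b) ** 2 for a, b in zip(x_hat, x)) / len(x)


# Toy "normal" activity: two patterns that recur in the training data.
normal_events = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
model = TinyAutoencoder(n_inputs=4, n_hidden=2)
model.train(normal_events)

normal_scores = [model.score(x) for x in normal_events]
anomaly_score = model.score([1.0, 1.0, 0.0, 0.0])  # pattern never seen in training
```

After training, the two familiar patterns reconstruct almost perfectly, while the unseen pattern cannot be rebuilt from what the network learned and so gets a clearly higher score. A real model would add nonlinear layers and train on far more data, but the scoring logic stays the same.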

The Splunk App for Data Science and Deep Learning (DSDL) requires that you implement the fit() and apply() methods, which are responsible for training your model and executing your model, respectively. However, training the model within the DSDL app using the Splunk Machine Learning Toolkit commands is nearly impossible because of the volume of data needed to train the autoencoder properly. I had to train the autoencoder outside of Splunk using my Nvidia GPU and then import it back into Splunk. I had to open a support ticket to understand exactly how to do this, but the process they recommended does work. Once the model was trained and imported back into DSDL, I implemented the apply() function, which executes the model and assigns a “score” to each event. In this case, the score represents how anomalous the command line log appears to be.
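The split between training elsewhere and scoring inside Splunk can be sketched roughly as follows. Only the names fit() and apply() come from this post; their parameters, the load() helper, and the JSON weight file are simplified assumptions for illustration and are not the DSDL API or the import process that Splunk support recommended. The hard-coded weights stand in for a model exported from the GPU machine.

```python
import json
import os
import tempfile


def load(path):
    """Read weights that were trained outside Splunk (hypothetical JSON layout)."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def fit(model, rows):
    """Training happened offline, so this stage only reports what was loaded."""
    return {"trained_in_splunk": False, "hidden_units": len(model["enc"])}


def apply(model, rows):
    """Return one reconstruction-error score per row of numeric features."""
    scores = []
    for x in rows:
        hidden = [sum(w * xi for w, xi in zip(row, x)) for row in model["enc"]]
        x_hat = [sum(w * hi for w, hi in zip(row, hidden)) for row in model["dec"]]
        scores.append(sum((a - b) ** 2 for a, b in zip(x_hat, x)) / len(x))
    return scores


# Stand-in for the export step on the training machine: weights that
# reproduce the first two features and ignore the third.
exported = {"enc": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
            "dec": [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]}
weights_path = os.path.join(tempfile.mkdtemp(), "autoencoder_weights.json")
with open(weights_path, "w", encoding="utf-8") as handle:
    json.dump(exported, handle)

model = load(weights_path)
info = fit(model, [])
scores = apply(model, [[1.0, 2.0, 0.0], [1.0, 2.0, 3.0]])
```

The first row matches what the stand-in weights can reproduce and scores zero, while the second row carries a value the weights cannot rebuild and scores higher. Keeping fit() trivial reflects the point above: the expensive training step lives outside Splunk and only the scoring step runs against incoming events.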

Test the AI

Finally, it’s time to test the autoencoder. We’ll perform the test by running Active Directory reconnaissance commands that are not frequently executed on the domain controller. I’ll execute some basic reconnaissance commands using the net.exe utility and dsquery as well as a basic PowerShell command on the command line for good measure. With these commands successfully executed, I can view them within Splunk:

Now I’ll run the events through the AI and have it assign the events a score. I’ll also run other events through the AI as a baseline. Notice how the reconnaissance commands and the PowerShell command score much higher than the baseline events under the anomaly_score_0 column:

I can now automate this search, and with it the threat hunting process, using Splunk’s scheduled searches and real-time searches.
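An automated search needs a cutoff that decides which scores are worth an alert. This post does not say which cutoff was used, so the sketch below assumes one simple rule: flag anything more than three standard deviations above the mean of the baseline scores. The function names, the baseline numbers, and the sample events are made up for illustration; only the anomaly_score_0 field name comes from the search output described above.

```python
import statistics


def pick_threshold(baseline_scores, k=3.0):
    """Cutoff = baseline mean plus k standard deviations (an assumed rule)."""
    mean = statistics.mean(baseline_scores)
    spread = statistics.pstdev(baseline_scores)
    return mean + k * spread


def flag_events(events, threshold, score_field="anomaly_score_0"):
    """Keep only the events whose score exceeds the cutoff."""
    return [e for e in events if float(e[score_field]) > threshold]


# Invented baseline scores standing in for a quiet period on the domain controller.
baseline = [0.010, 0.020, 0.015, 0.012, 0.018]
threshold = pick_threshold(baseline)

# Invented scored events, shaped like rows returned by the search.
events = [
    {"Process_Command_Line": "C:\\Windows\\System32\\svchost.exe -k netsvcs",
     "anomaly_score_0": "0.017"},
    {"Process_Command_Line": "net group \"Domain Admins\" /domain",
     "anomaly_score_0": "0.9"},
]
flagged = flag_events(events, threshold)
```

The same comparison could equally be written as a where clause in the scheduled search itself. Choosing k is a trade-off: a lower value catches more subtle anomalies at the cost of more false positives to triage.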

Future Considerations

New tactics, techniques, and procedures arise all the time within cybersecurity. Even with the knowledge contained within the MITRE ATT&CK framework, it’s almost impossible to defend against every kind of attack that can hit a domain controller. Sophisticated attacks can also completely bypass Endpoint Detection and Response (EDR) solutions. Installing EDR on domain controllers greatly strengthens their defenses, but it is not a perfect defense, especially when the attacks may come from legitimate users. Having an AI routinely check your logs can provide some peace of mind when it comes to defending your domain controllers. What’s nice is that this autoencoder approach can be used for anomaly detection on all kinds of logs (firewall, cloud providers, etc.).

Something to keep in mind is that as new software is installed on the domain controller and new users access it, the domain controller may start generating new logs that are not anomalies/threats but are not known by the AI. Therefore, this AI will need to be updated periodically as new events establish a new “normal” for the domain controller. This retraining of the AI should be done on some regular cadence, whether that is quarterly or annually. Knowing what is normal for your environment is critical for performing useful threat hunts.

If you’d like to see this kind of automated threat hunting implemented in your Splunk environment, feel free to contact SmithSec and schedule a consultation!