SOAR Your Own Way: Fundamentals to Security Alert Automation


Welcome to the first installment of my “SOAR Your Own Way” series, where I’ll cover everything SOAR related. A SOAR is a remarkably versatile piece of technology that can be as simple or as complicated as an organization wants it to be. But first, what is a SOAR? SOAR stands for Security Orchestration, Automation, and Response. It is a workflow automation tool that security teams use to automate Detection and Incident Response use cases. But what does this mean exactly?


Let’s take the following scenario: A user named John lives and works from his home in California for a company called RealTech. RealTech is fully remote, so travel is allowed, and employees are encouraged, though not required, to give notice. One morning, before business hours, a Suspicious Account Login security alert fires for John’s account from India. Luckily, the security team at RealTech has spent the last year configuring their SOAR. Along with the alert comes information about John’s most common login locations, his most recent account activity and location, his registered machine information, and his role information showing that he works in the finance department. Within a minute of the alert firing, the security team can see that the new login IP doesn’t match any of his common login locations, that it is inconsistent with his activity from the previous evening, and that the machine name differs from his registered machine name. At a quick glance, John also has nothing travel related on his calendar. These are all immediate red flags. On top of that, the team concludes that he likely has access to sensitive information because of his role in the finance department. Given that access, the security team member who saw the alert triggers a workflow that immediately locks down John’s account. The workflow tells John that his account has been compromised and that he will have to meet with IT to reset his password. The security team has successfully mitigated a threat in approximately 2-3 minutes with help from the SOAR. Wow!


People who have worked on a security team know exactly how much manual effort this alert would take: writing and running two to three manual queries to understand the user activity, figuring out what position the user holds in the company, working out what information they may have access to, and investigating whether the travel is legitimate. This can be a time-consuming process, taking up to 15 minutes depending on the experience of the incident response team. And in a situation like this, every second counts! So, what are the keys to building out an effective SOAR to keep your company safe? Let’s break it down.


Understanding Your Data

Understanding your data is fundamental not only to running an effective security team, but also to building out a SOAR. You need to answer the following questions: What logs am I ingesting, and why am I ingesting them? Together, these logs should tell you what is happening across the company and its users. So, it’s important to fully understand the log types, the events that occur inside those logs, why those events are of interest, and where the gaps in your knowledge are. Building on the example from above, what kind of data would we find of interest, and why? Well, as a security team we would likely be interested in the user’s successfully authenticated login locations and activity from our Identity Provider (IdP), because they tell us what activity is normal for the user and what is an anomaly.


Mastering Your Detections

Next, it is essential to understand how your detections work at a fundamental level. This helps not only in creating high fidelity detections, but also in creating an efficient triage process in your SOAR. It’s important to understand how detection rules trigger so that you know exactly what has happened when an alert shows up in your queue. It’s equally important to understand what surrounding information will be useful as context for the alert. So, for the detection in the example, it’s essential to know that the alert triggers on a successful login from a blacklisted country or from a country the user has never successfully logged in from before. This alert also does not fire very often, so with this knowledge it should be immediately apparent that the activity is anomalous and has a high potential to be malicious.
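
To make that trigger logic concrete, here is a minimal Python sketch of the rule as described above. The function name, the event fields, and the "XX" placeholder country code are all assumptions made for illustration. They are not taken from any particular SIEM or SOAR product.

```python
from dataclasses import dataclass


@dataclass
class LoginEvent:
    user: str
    country: str   # country code reported by the IdP (assumed field)
    success: bool


def is_suspicious_login(event, blocked_countries, known_countries):
    """Return True when a successful login should raise the alert.

    blocked_countries: countries the team has chosen to flag (hypothetical list).
    known_countries:   countries this user has successfully logged in from before.
    """
    if not event.success:
        return False  # the rule only considers successful logins
    if event.country in blocked_countries:
        return True
    return event.country not in known_countries


# John normally logs in from the US; the new login comes from India.
event = LoginEvent(user="john", country="IN", success=True)
print(is_suspicious_login(event, blocked_countries={"XX"}, known_countries={"US"}))
```

Knowing that the rule has exactly these two trigger paths tells you what must have happened whenever the alert lands in your queue.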


Building Playbooks

Creating verbose IR playbooks is incredibly valuable for every security team out there. It’s impossible to keep the triage steps for hundreds of detections in your head, so it’s important to document the steps to investigate. This is not only for when SHTF (if you know, you know), but also for when your team expands and new members need to learn how the system works in your organization. The main question to answer is: What information do I absolutely need to know to triage this incident? The goal is to be able to close the ticket, or escalate it to a security incident, using only the automatically collected information. Going back to the scenario above, with your in-depth knowledge of your data and detections, you now know that to triage the alert you would want the user’s common login locations, their recent account activity, and the department and role they work in.
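
One way to keep a playbook usable by both people and automation is to write down its required information as data. The sketch below is only an illustration: the PLAYBOOK structure, its key names, and the missing_context helper are hypothetical, and the sample values are made up.

```python
# Hypothetical playbook entry for the Suspicious Account Login detection.
PLAYBOOK = {
    "detection": "Suspicious Account Login",
    "required_context": [
        "common_login_locations",
        "recent_account_activity",
        "department_and_role",
    ],
}


def missing_context(playbook, collected):
    """List the playbook items that have not been collected yet."""
    return [
        item for item in playbook["required_context"]
        if not collected.get(item)
    ]


# Made-up enrichment results for John's alert.
collected = {
    "common_login_locations": ["California, US"],
    "department_and_role": "Finance",
}
print(missing_context(PLAYBOOK, collected))
```

If the list comes back empty, the ticket should have everything needed to close or escalate it.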


Creating Queries

After learning about your data, understanding your detections, and creating a blueprint for the response actions, it’s time to build the queries you'll need to find all the essential information for your IR playbooks. Since logs generally carry a lot of fluff that is not necessarily valuable, it’s important to build these queries so that they show only the data that matters right now. So, when building the sample query for a user’s most common logins, what would the most important information be? In our case it is the IP address, the machine name, the city, and the country. The IP address is the best way to make an immediate comparison, and the city and country give better context on where the user actually lives. This way, you can judge whether travel would be plausible for the user (e.g., to a bordering country).
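
Query syntax depends on your log platform, so here is the same idea sketched in plain Python over a list of login events. The field names and sample records are assumptions for illustration, and the IP addresses come from the ranges reserved for documentation.

```python
from collections import Counter


def most_common_logins(events, user, top_n=3):
    """Summarize a user's most frequent successful login sources.

    Each event is assumed to be a dict with the keys used below. Any other
    fields in the raw log are dropped on purpose.
    """
    counts = Counter(
        (e["ip"], e["machine"], e["city"], e["country"])
        for e in events
        if e["user"] == user and e["success"]
    )
    return [
        {"ip": ip, "machine": machine, "city": city, "country": country, "logins": n}
        for (ip, machine, city, country), n in counts.most_common(top_n)
    ]


# Made-up sample events; "user_agent" stands in for the extra log fluff.
home = {"user": "john", "success": True, "ip": "203.0.113.10",
        "machine": "JOHN-LAPTOP", "city": "Los Angeles", "country": "US",
        "user_agent": "example-browser"}
new = {"user": "john", "success": True, "ip": "198.51.100.7",
       "machine": "UNKNOWN-PC", "city": "Mumbai", "country": "IN",
       "user_agent": "example-browser"}

for row in most_common_logins([home, home, new], "john"):
    print(row)
```

Only the four columns from the playbook plus a count come back, which keeps the enriched alert easy to scan.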


Architecting Basic Workflows

Now that all of the basic building blocks are in place and thought out, it’s (finally) time to create some automated workflows. There are many different SOAR products out there, and they differ in the details, but the basics remain the same. First, I’d recommend building a basic “Main” workflow. The purpose of this workflow is to accept all of the raw alerts via a webhook and pass them to sub-workflows. For scalability, I’ve found it’s best to organize the sub-workflows by log type. Then, inside each sub-workflow, build out a branch for each detection. So, for our example above, we’ll create a branch in our sub-workflow that handles the Suspicious Account Login detection. Inside this branch, we’ll run all of the queries we crafted as part of the Incident Response Playbook, making sure to include only the necessary columns. To finish off the workflow, we’ll send the alert information with the added context over to the connected ticketing system where the security team receives its alerts. The new, enhanced alert will be ready to triage.
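
The routing described above can be sketched in a few lines of Python. This is not how any specific SOAR product implements it. The alert fields, the "idp" log type key, and the function names are assumptions, and the branch returns placeholder context instead of running real queries.

```python
def enrich_suspicious_login(alert):
    """Placeholder branch: a real one would run the playbook queries here."""
    return {"common_logins": [], "recent_activity": [], "role": None}


# Sub-workflows keyed by log type; each holds one branch per detection.
SUB_WORKFLOWS = {
    "idp": {"Suspicious Account Login": enrich_suspicious_login},
}


def main_workflow(alert, create_ticket):
    """Route a raw alert (for example, a parsed webhook payload) to its branch."""
    branches = SUB_WORKFLOWS.get(alert.get("log_type"), {})
    branch = branches.get(alert.get("detection"))
    context = branch(alert) if branch else {}
    ticket = {**alert, "context": context, "enriched": branch is not None}
    create_ticket(ticket)  # hand off to the ticketing system (a stub here)
    return ticket


tickets = []  # stands in for the connected ticketing system
main_workflow(
    {"log_type": "idp", "detection": "Suspicious Account Login", "user": "john"},
    tickets.append,
)
print(tickets[0]["enriched"])
```

Alerts without a matching branch still reach the ticket queue, just without added context, so nothing is silently dropped while you build out coverage.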


Automated Incident Response

After workflows have been built to enhance the alerts and add context around them, it’s time to tackle automated responses. For these workflows, you’ll need to break down the actions needed to mitigate and isolate a threat once the activity is deemed malicious. After documenting your process, you can hook up the APIs necessary to make automated incident response a reality. For the example above, the response action was to isolate and lock the account, and the security team designed it to be triggered by a button inside the ticket.
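
As a rough sketch, the button in the ticket could call a function like the one below. The client class, its lock_account method, and the notify callback are hypothetical stand-ins. A real workflow would call your IdP's and messaging tool's actual APIs, which differ by vendor.

```python
class RecordingIdpClient:
    """Stand-in for an IdP API client. It records calls instead of sending them."""

    def __init__(self):
        self.locked = []

    def lock_account(self, user):
        self.locked.append(user)


def lock_down_account(idp, notify, user):
    """Response action: lock the account, then tell the user what happens next."""
    idp.lock_account(user)
    notify(
        user,
        "Your account has been compromised and locked. "
        "Please meet with IT to reset your password.",
    )
    return {"user": user, "action": "account_locked"}


idp = RecordingIdpClient()
messages = []
result = lock_down_account(idp, lambda user, text: messages.append((user, text)), "john")
print(result["action"], idp.locked)
```

Passing the client and the notifier in as arguments keeps the response logic testable without touching a real account.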


Advanced Incident Response

Once your SOAR has a good number of workflows built out for all of your high fidelity detections, many teams look for ways to keep reducing alert fatigue. One very helpful way to do this is with a Slack bot or another form of Out-of-Band (OOB) verification. It’s incredibly important to implement this the correct way, though: with Multi-Factor Authentication (MFA). For our example scenario, the automated workflow could have sent the user a Slack message asking them to verify the travel. It could give them a 10 minute window to verify using MFA from another registered device, and if there is no response after 10 minutes, move on and still trigger an alert. Requiring MFA ensures the response is legitimate. That said, this is more of a “nice to have” and should only be tackled after the SOAR is completely built out, but it is a great feature for more advanced teams!
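
The verification window can be sketched as a simple polling loop. Everything here is an assumption for illustration: send_prompt and check_mfa_response are hypothetical callbacks standing in for your chat and MFA integrations, and the clock and sleep parameters exist only so the loop can be tested without waiting.

```python
import time


def verify_travel(send_prompt, check_mfa_response, user,
                  timeout_s=600, poll_s=5,
                  sleep=time.sleep, clock=time.monotonic):
    """Ask the user to confirm travel via MFA and wait up to timeout_s seconds.

    check_mfa_response(user) is assumed to return True (confirmed),
    False (denied), or None (no answer yet).
    """
    send_prompt(user, "Was this login you? Please confirm with MFA.")
    deadline = clock() + timeout_s
    while clock() < deadline:
        response = check_mfa_response(user)
        if response is not None:
            return "verified" if response else "denied"
        sleep(poll_s)
    return "timeout"  # no answer in the window: still raise the alert


# Simulated run where the user confirms right away, so nothing sleeps.
outcome = verify_travel(
    send_prompt=lambda user, text: None,
    check_mfa_response=lambda user: True,
    user="john",
)
print(outcome)
```

Only a "verified" outcome should suppress or downgrade the alert. A "denied" or "timeout" result goes to the security team as usual.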


More SOAR…

As you can see, the SOAR is an extremely powerful tool for Detection and Response teams, but making full use of it requires a deep understanding of your environment and processes. I’m only beginning to scratch the surface of the SOAR, and I’ll keep breaking it down in more depth as this series continues. If you’ve enjoyed this post, keep following the Cybersec Cafe for more on SOAR techniques and other cyber security topics!
