
Fighting Alert Fatigue: Solving the SOC



Anyone who has worked in a Security Operations Center (SOC) or on a Detection and Incident Response team has experienced Alert Fatigue: desensitization to security alerts due to the high volume hitting your ticketing system. It’s an ever-evolving battle, and it only gets harder the longer it gets pushed off. But why is Alert Fatigue such a dangerous “disease” for your SOC team?


Picture this: you’re working as an Analyst in the SOC at a growing FinTech company. In your environment, you have detections configured to pick up Multi-Factor Authentication (MFA) being enabled or disabled on employees’ phones. A large class of new hires is onboarding today, and a flood of alerts kicks off the work day as new colleagues enable MFA on their phones. You also notice some users disabling MFA while IT helps them troubleshoot, adding to the pile of alerts filling the queue. Now you have to spend time on the tedious process of verifying the activity with IT, verifying the new hires, and closing out the ever-growing queue of tickets. By the 15th ticket of the morning, the end is nowhere in sight, and you feel the Alert Fatigue begin to kick in.

It’s at this moment that a real threat or malicious attempt may slip through the cracks. In most cases, the alert an Analyst glances at and closes without a proper investigation really is just a false positive. But in the most serious of cases, a malicious actor may have been waiting for this exact moment. In this fictional FinTech company, a malicious actor established a foothold in your environment weeks ago after finding leaked credentials, and has been waiting for the right moment to enroll a device of their own in MFA. They see the influx of new hires and take the chance to add their own phone. All of a sudden they’re in, and the alert has gotten by you due to Alert Fatigue. The malicious actor made it past the last line of defense and is now an authenticated user in your company, with access to all the employee applications holding customer data. Seem like a bit of a stretch? Maybe… but if the potential monetary incentive is big enough, nothing is truly impossible.


So where do you even start in the fight against Alert Fatigue? Blue teaming is difficult - it’s much easier to get into a company than it is to protect one. An attacker only needs to find one opening, while the defenders need to continuously find and block all potential openings. And when engineering a SOC, it’s extremely difficult to strike a balance between having enough detections to appropriately cover the environment and not generating too much noise from low quality detections, which results in a stream of largely useless tickets. SOC teams everywhere are falling victim to Alert Fatigue faster and faster, and it’s imperative to attack it as soon as possible, before it becomes unmanageable. From my experience, I’ve identified five key areas that are important to fighting off Alert Fatigue and keeping it away for good.


High Quality Detections

The first step when building out your SOC is to focus on building high quality detections. As you’re building out detections, it’s important to understand not only what kind of activity you are trying to catch, but also the log source you are building detections around. There are many ways to achieve this, but in my experience the most helpful starting point is reading through the documentation of the log source (yes, I know this sucks - but you have to do it sometimes). After gaining a fundamental understanding of the logs, start querying the logs in your SIEM and get familiar with the schema in your environment. Lastly, once you have an understanding of the log activity, I’d suggest accessing the software you are building detections around, triggering the logs yourself, and seeing how they look in the SIEM. After doing all of this, you’ll have a better understanding of the log source and the knowledge to begin writing high quality detections.
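
To make this concrete, here’s a minimal sketch (in Python, not tied to any particular SIEM) of what the detection logic for the MFA scenario above might look like. The event fields, the IT_ADMINS allow list, and the detect_suspicious_mfa_changes function are all assumptions for illustration; real log schemas will differ.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List

# Hypothetical normalized MFA event, based on the onboarding scenario above.
@dataclass
class MfaEvent:
    timestamp: datetime
    user: str
    action: str        # e.g. "mfa_enabled" or "mfa_disabled"
    actor: str         # who performed the change (the user themselves, or an IT admin)
    source_ip: str

# Accounts allowed to toggle MFA during onboarding (assumed, not exhaustive).
IT_ADMINS = {"it-helpdesk", "it-admin"}

def detect_suspicious_mfa_changes(events: Iterable[MfaEvent]) -> List[MfaEvent]:
    """Return MFA changes that were not performed by IT and therefore warrant
    an alert. Everything else is treated as expected onboarding noise."""
    suspicious = []
    for event in events:
        if event.action == "mfa_disabled" and event.actor not in IT_ADMINS:
            suspicious.append(event)
    return suspicious

if __name__ == "__main__":
    sample = [
        MfaEvent(datetime(2024, 1, 8, 9, 2), "new.hire1", "mfa_enabled", "new.hire1", "10.0.0.12"),
        MfaEvent(datetime(2024, 1, 8, 9, 5), "new.hire2", "mfa_disabled", "it-helpdesk", "10.0.0.30"),
        MfaEvent(datetime(2024, 1, 8, 9, 7), "finance.user", "mfa_disabled", "finance.user", "203.0.113.7"),
    ]
    for alert in detect_suspicious_mfa_changes(sample):
        print(f"ALERT: {alert.user} had MFA disabled by {alert.actor} from {alert.source_ip}")
```

The point isn’t the code itself - it’s that you can only write a rule like this confidently after you know which fields the log source actually emits and what normal onboarding activity looks like in your own environment.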


Practicing a Data Driven Detection Lifecycle

A detection lifecycle is an important process for any SOC because it facilitates regular review and improvement of your detections. It’s difficult to write a perfect detection the first time around, and sometimes it takes a few alerts actually triggering to get a better feel for what is regular activity and what is anomalous and should be alerted on. But as your SOC grows, it becomes harder to keep track of how detections are performing. Querying for the detections that trigger the most and using anecdotal evidence to tweak them will only get you so far. That’s why it’s important to collect alert metrics on your tickets as you close them, and to centralize those metrics somewhere digestible, like a dashboard. This makes it much easier to let real data drive your detection lifecycle and the adjustments you make moving forward. I’ll explain how to do this in more depth in a future blog post.
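
As a rough sketch of what this could look like, the snippet below rolls hypothetical closed-ticket records up into per-detection metrics (alert volume, false positive rate, average time to close) that could feed a dashboard. The ClosedTicket fields and disposition values are assumptions, not a prescribed schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable

# Hypothetical record an analyst produces when closing a ticket.
@dataclass
class ClosedTicket:
    detection_name: str
    opened_at: datetime
    closed_at: datetime
    disposition: str   # e.g. "false_positive", "benign_true_positive", "escalated"

def summarize_detections(tickets: Iterable[ClosedTicket]) -> Dict[str, dict]:
    """Roll closed tickets up into per-detection metrics that can feed a dashboard."""
    summary = defaultdict(lambda: {"count": 0, "false_positives": 0, "total_minutes": 0.0})
    for t in tickets:
        s = summary[t.detection_name]
        s["count"] += 1
        if t.disposition == "false_positive":
            s["false_positives"] += 1
        s["total_minutes"] += (t.closed_at - t.opened_at).total_seconds() / 60
    # Derive the rates and averages that actually drive tuning decisions.
    for s in summary.values():
        s["false_positive_rate"] = s["false_positives"] / s["count"]
        s["avg_minutes_to_close"] = s["total_minutes"] / s["count"]
    return dict(summary)
```

A persistently high false positive rate on a single detection is usually the clearest signal that it needs tuning, which is exactly the kind of decision anecdotes alone won’t support.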


Centralized Exceptions

As you begin to modify and grow your library of detections, exceptions will no doubt come into play. One common and avoidable mistake I’ve seen is housing exceptions within the detections themselves and maintaining them on a detection-by-detection basis. It’s much better to create and maintain a centralized source where all exceptions live, and import the needed exceptions into each detection. As your detection library gets bigger, there will generally be some overlap between the exceptions different detections need. Having the exceptions all in one place makes them much easier to manage and decreases the chances of one-off discrepancies.
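
For illustration, here’s one simple shape this could take in Python: a single module holding every exception list, with each detection importing just what it needs. The list names and the example detection are hypothetical.

```python
# Central exception store shared by every detection (names are illustrative).
# In practice this might live in a config repo, a database table, or a SIEM lookup.
EXCEPTIONS = {
    "it_admin_accounts": {"it-helpdesk", "it-admin"},
    "vuln_scanner_ips": {"10.0.5.10", "10.0.5.11"},
    "ci_service_accounts": {"svc-jenkins", "svc-github"},
}

def is_excepted(value: str, *exception_lists: str) -> bool:
    """Return True if a value appears in any of the named centralized exception lists."""
    return any(value in EXCEPTIONS.get(name, set()) for name in exception_lists)

# A detection references the shared lists instead of hardcoding its own copies.
def detect_admin_login_from_new_ip(user: str, source_ip: str) -> bool:
    if is_excepted(user, "it_admin_accounts", "ci_service_accounts"):
        return False
    if is_excepted(source_ip, "vuln_scanner_ips"):
        return False
    return True  # would raise an alert in a real pipeline
```

When the vulnerability scanner gets a new IP, you update one list once, instead of hunting down every detection that quietly carries its own copy.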


Automations Between Systems

Another issue I see many SOC teams struggle with, and a key contributor to Alert Fatigue, is finding a single source of truth for tickets. Generally, alerts will trigger and populate a queue in your SIEM, feed an alerting channel in a system like Slack or Microsoft Teams, and also populate a ticketing tool like Jira. With that many systems, closing out the same ticket in multiple places gets exhausting. Think about the example above: if 20 alerts came in, it would take 60 actions to close them across all of the systems, plus the time spent checking that everything lines up. So it’s important to build automations between these systems so that Analysts can work from one source of truth, confident that the automations will keep the other platforms in sync.
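
As a sketch of the idea, the snippet below shows a single action closing an alert across three systems. The URLs, payloads, and IDs are placeholders - the real SIEM, Jira, and Slack APIs each have their own endpoints and authentication - so treat this as the shape of the automation rather than a working integration.

```python
import requests

# Placeholder endpoints; these are NOT real API routes for any vendor.
SIEM_CLOSE_URL = "https://siem.example.com/api/alerts/{alert_id}/close"
JIRA_CLOSE_URL = "https://jira.example.com/api/issues/{issue_id}/transition"
SLACK_UPDATE_URL = "https://slack.example.com/api/messages/{message_id}/update"

def close_everywhere(alert_id: str, issue_id: str, message_id: str, token: str) -> None:
    """Close one alert in all three systems so an analyst only acts once."""
    headers = {"Authorization": f"Bearer {token}"}
    # 1. Close the alert in the SIEM queue.
    requests.post(SIEM_CLOSE_URL.format(alert_id=alert_id), headers=headers, timeout=10)
    # 2. Transition the corresponding Jira issue to Done.
    requests.post(JIRA_CLOSE_URL.format(issue_id=issue_id),
                  json={"transition": "Done"}, headers=headers, timeout=10)
    # 3. Mark the Slack alert message as resolved.
    requests.post(SLACK_UPDATE_URL.format(message_id=message_id),
                  json={"status": "resolved"}, headers=headers, timeout=10)
```

However it’s wired up, the goal is the same: an analyst closes a ticket once, in one place, and the automation fans that decision out to every other platform.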


SOAR Automations

A Security Orchestration, Automation and Response (SOAR) tool is an integral piece of a SOC and, in my opinion, the biggest difference maker when it comes to Alert Fatigue. However, it is also generally one of the last pieces of the puzzle for many teams because of the detection maturity it requires as a precursor. The main use for a SOAR is to create automations around your detections that enrich alerts and decrease the manual toil of investigations for analysts. This is achieved by automating the queries and tasks that gather all the information an analyst would need to either close a ticket as a false positive or escalate it to an investigation. As mentioned, this not only requires a deep understanding of the detections in your environment, but also becomes an iterative process within the detection lifecycle. Just like creating a data driven lifecycle, building out a SOAR can be a complex process, but it is majorly beneficial for any security team out there. Due to its complexity and possibilities, building out a SOAR deserves a couple of separate blog posts of its own.
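
To give a feel for what a SOAR playbook automates, here’s a minimal Python sketch of an enrichment pipeline for the MFA alert from earlier: each step gathers one piece of context an analyst would otherwise pull by hand, and the collected results get attached to the ticket. The specific steps, field names, and lookups are assumptions for illustration.

```python
from typing import Callable, Dict, List

# Each enrichment step is a callable that takes the raw alert and returns findings.
EnrichmentStep = Callable[[dict], dict]

def recent_logins(alert: dict) -> dict:
    # In a real playbook this would query the SIEM for the user's recent logins.
    return {"recent_logins": f"query SIEM for logins by {alert['user']}"}

def hr_status(alert: dict) -> dict:
    # Confirm whether the user is a current employee or new hire in the HR system.
    return {"hr_status": f"look up {alert['user']} in the HR system"}

def device_inventory(alert: dict) -> dict:
    # Check whether the device enrolling MFA is a known corporate device in MDM.
    return {"known_device": f"check MDM inventory for {alert.get('device_id')}"}

PLAYBOOK: List[EnrichmentStep] = [recent_logins, hr_status, device_inventory]

def enrich_alert(alert: dict) -> Dict[str, str]:
    """Run every enrichment step and collect the context an analyst would
    otherwise have to gather by hand before closing or escalating."""
    context: Dict[str, str] = {}
    for step in PLAYBOOK:
        context.update(step(alert))
    return context

if __name__ == "__main__":
    alert = {"user": "new.hire1", "device_id": "iphone-1234"}
    print(enrich_alert(alert))
```

In a real SOAR, each step would call out to your SIEM, HR system, and MDM, and the playbook would post the collected context back onto the ticket automatically, so the analyst opens an alert that already answers the first five questions they would have asked.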


In conclusion, fighting off Alert Fatigue is a difficult and iterative process. It requires a deep understanding of your environment, a grasp of attacker motivations and actions, expertly crafted detections, scalable detection architecture, and complex automations. Even after implementing these processes, without an iterative detection lifecycle, Alert Fatigue can find its way back to haunt a security team faster than you would think. But with the right tools, a determined and knowledgeable team, and the right processes in place, Alert Fatigue stands no chance, and the C-Suite can rest easy knowing their world class SOC is defending the company.


