Next Gen Blue Team

2 November 2018

Building a successful blue team (also known as defensive cyber) starts with thinking differently about the problem set and the traditional solutions. This is because traditional solutions to the problem of information security defense fail, provably, over and over again. To be successful in the defensive security arena, teams must be willing to break with tradition, envision new ways to succeed, and incorporate the lessons of other computer science disciplines, especially development and DevOps.

There are a number of well-known frameworks for security, from the CSC Top 20 to the ACSC Essential 8 to more comprehensive guidelines like NIST 800-53. All of these are well-meaning documents, but the reality is that the overwhelming majority of security teams lack the capability to implement even the first or second most highly recommended security controls. These are the obvious ones, like understanding the applications and systems on a network and keeping them patched and up to date.

The "conventional wisdom" of traditional security team guidelines, controls, and composition serves only to ensure that efforts to secure environments following these strategies fail (no need to even cite evidence here, just search the news for "computer security breach" and you'll find all you can read). Operational realities often collide with and oppose these traditional strategies, and neuter them before they're even implemented.

Given this landscape, what should teams do to succeed? The environment of modern computer security requires new and lateral approaches. A successful security strategy must inherit from other successful computer science disciplines and leverage the computational power that suffuses every organization to its fullest extent.

Four Step Cycle

The four steps for success of the Next Gen Blue Team are:

  1. Documentation
  2. Automation
  3. Elevation
  4. Metrics

One can think of these steps as a cycle. New processes should be documented, then automated, then the team should move on to new processes or challenges (elevate) and ensure that metrics can be collected to demonstrate success.

Following this cycle will allow teams to improve, to develop data to demonstrate improvement, as well as gaps, and ultimately to develop a completely transparent security process. Transparency engenders trust and frees security teams from FUD cycles and hype, and is the true hallmark of a mature, and successful, security team.

Documentation

Documented process translates directly into operational maturity. Having a defined process that is clearly communicated helps advertise services, onboard new team members, ensure quality and consistency, and prevent errors and omissions. Documentation must be living, however, and updated regularly to reflect operational realities, lessons learned, and continuous improvement.

Far too often security is considered a dark art. This means that new practitioners are required to study in an apprenticeship to gain experience before they can be effective. It also means that more senior security staff can sometimes hoard their knowledge as a means of retaining their relevance. The reality is, however, that security isn't hard. Not only is it fairly straightforward, it is also always changing. Locking secrets up in human vaults makes teams dependent on specific individuals, translates poorly into uniform training and knowledge sharing, and relies on anecdotes and FUD for decision making. If a question is asked more than once and has to be answered by an individual rather than found in documentation, then that question, and the answer, needs to be recorded, documented, and distributed. This allows teams to level set, apply consistency, and avoid single points of failure.

Writing down security information forces consideration, by the author and the reader. Refining documentation is painful but invaluable. By crafting careful documentation of processes, outcomes, resources, events, and even minutiae like meeting minutes, the security team can examine improvement over time, onboard new members more rapidly, ensure that knowledge silos and single points of failure in the security team staff do not exist, and overall elevate the maturity of the security practice.

Documentation isn't easy. It takes time when there are competing demands. It doesn't necessarily show immediate returns. It takes communication skills (which can be rare) rather than technical acumen. It takes a deep understanding of the subject being documented, and it can be painful to organize, revise, and keep track of. These pain points are far outweighed by the benefits, but unless you've seen the outcomes of good documentation this is often hard to believe, and it can be a tricky sell to an already overburdened security team.

Documentation shouldn't merely seek to capture processes or rules, but also the living status of the security team. Holding Scrum-like Sprint Retrospectives, either on a set schedule or after a major project (or incident), should produce documentation that leads to continuous improvement. Incident after action reports should include suggestions for improvements as well. These documents should be aggregated and reviewed periodically to ensure that common, or easy, changes are implemented. Having this documentation leads to defensible, data- and intelligence-driven decision making and continuous improvement (see more below under Elevation).

Automation

Once process and practice are documented they can be automated. The first step to automation is defining the algorithm that will be used to manipulate data. Documentation serves as the cornerstone of this effort, and as soon as the documentation is complete, blue teams should turn their attention to automation.

Automation means using your computer systems to do the boring, repetitive work. This is what computers were expressly designed to do. Automation ensures consistency and saves your human resources for higher order problems. Any time security practitioners find themselves doing the same thing more than once they should take a page from the playbook of Larry Wall (creator of the Perl programming language), who said "The three chief virtues of a programmer are: Laziness, Impatience and Hubris." Any repeatable task should be automated. If the task can be decomposed into an algorithm then a computer can do the work more effectively than a human can. Be lazy, make the machine do the work!

Automation makes your security practice repeatable and consistent. It should remove the most burdensome work of a practitioner, whether it be reviewing a log, generating a report, or processing a workflow by sending a follow up email. Computers can take care of all of these tasks.
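
As a minimal sketch of the kind of log review a machine can take over, the following counts failed logins per source address and reports any source at or above a threshold. The log line format, the FAILED_LOGIN pattern, and the threshold are assumptions made for illustration, not the format of any particular product.

```python
import re
from collections import Counter

# Hypothetical log format: "<timestamp> FAILED_LOGIN user=<name> src=<ip>"
FAILED_LOGIN = re.compile(r"FAILED_LOGIN user=(\S+) src=(\S+)")


def noisy_sources(log_lines, threshold=3):
    """Return {source: count} for sources with at least `threshold` failed logins."""
    counts = Counter()
    for line in log_lines:
        match = FAILED_LOGIN.search(line)
        if match:
            counts[match.group(2)] += 1
    return {src: n for src, n in counts.items() if n >= threshold}


sample = [
    "2018-11-02T10:00:01 FAILED_LOGIN user=alice src=10.0.0.5",
    "2018-11-02T10:00:03 FAILED_LOGIN user=alice src=10.0.0.5",
    "2018-11-02T10:00:04 LOGIN_OK user=bob src=10.0.0.9",
    "2018-11-02T10:00:07 FAILED_LOGIN user=root src=10.0.0.5",
    "2018-11-02T10:00:09 FAILED_LOGIN user=carol src=10.0.0.7",
]
print(noisy_sources(sample))
```

Once a review like this is written down as code, it runs the same way every time, and the analyst only has to look at the sources it flags.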

Most security systems expose Application Programming Interfaces (APIs). You should be using them! If an API exists to create a ticket, query a log source, update an event, or alert an operator, why not leverage that capability? Far too many security teams are dependent on the tools that vendors provide, and this means they can never fully realize the investment in their security products. Hire a programmer and have them automate and integrate!
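
To make the idea concrete, here is a hedged sketch of turning an alert into a ticket through an HTTP API. The endpoint path, field names, host name, and token are all hypothetical; a real ticketing system will document its own. The function only builds the request and does not send it, so the sketch runs without a network.

```python
import json
import urllib.request


def build_ticket_request(alert, base_url, token):
    """Build (but do not send) an HTTP POST that would open a ticket for an alert.

    The "/tickets" path and the payload field names are hypothetical.
    """
    payload = {
        "title": f"[{alert['severity'].upper()}] {alert['rule']}",
        "description": f"Host {alert['host']} triggered rule {alert['rule']}.",
        "source": "blue-team-automation",
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/tickets",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


alert = {"rule": "Repeated failed logins", "host": "web01", "severity": "high"}
request = build_ticket_request(alert, "https://tickets.example.internal/api/", "TOKEN")
print(request.get_method(), request.full_url)
print(json.loads(request.data)["title"])
# Sending would be urllib.request.urlopen(request), once the endpoint is real.
```

Keeping request construction separate from sending makes the integration easy to test before it is pointed at a production system.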

Not only will automation improve quality and consistency, it will also shorten reaction times to security events. Automation can allow for security at scale. While a human has a limited span of attention and a limited output, computation works extremely well with massive volumes of data and instructions. By successfully leveraging automation, small teams can service large organizations. While the traditional approach to security is to beg for more staff, smart organizations use automation to scale the efforts of their existing team, thus alleviating the need for greater head count.

Elevation

Elevation means continuous improvement, also referred to as "iteration" in the development community. Elevation means that as soon as you examine a task or workflow, document the process, and automate it, you can move on to higher order concerns. Elevation means constantly examining and re-examining how things are done and trying to find better ways to do them. Elevation means leaving the boring, routine, and mundane to focus on complex problems of scale, consistency, polymorphic threats, and tomorrow's challenges.

No security team is created in a fully formed and perfect state. Security requires learning lessons, adapting, and iterating. A critical part of a security team's success is the ability to improve based on lessons learned and emerging threats. Iteration should focus not only on process improvement, but also on staff improvement. Investing in security staff is the best way to ensure quality output, engagement, and a viable talent pipeline.

Two great strategies for continuous improvement are Root Cause Analyses (RCAs) and retrospectives. An RCA is a defined, and sometimes formal, step in incident response. It involves gathering stakeholders and having an open, honest discussion about how the security response performed, including the things that worked well, the things that didn't work well, and corrective measures that could be implemented to prevent future security events of a similar type. All of these outputs of the RCA should be captured, tracked, and hopefully implemented. Sometimes a corrective recommendation may be too expensive, infeasible, or may not seem justified, but by tracking RCA output it becomes possible to spot trends and to justify difficult changes if they're supported over time.
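
One way to spot such trends, sketched under the assumption that each RCA is stored as a simple record with a list of recommendation tags (the records and tag names below are invented for illustration):

```python
from collections import Counter

# Hypothetical RCA records: each lists the corrective recommendations it produced.
rca_records = [
    {"incident": "2018-03 phishing", "recommendations": ["mfa", "email-filtering"]},
    {"incident": "2018-06 credential reuse", "recommendations": ["mfa"]},
    {"incident": "2018-09 malware", "recommendations": ["endpoint-patching", "mfa"]},
]


def recurring_recommendations(records, min_count=2):
    """Return (recommendation, count) pairs seen in at least `min_count` RCAs."""
    counts = Counter()
    for record in records:
        # Use a set so one RCA cannot count the same recommendation twice.
        counts.update(set(record["recommendations"]))
    return [(rec, n) for rec, n in counts.most_common() if n >= min_count]


print(recurring_recommendations(rca_records))
```

A recommendation that keeps resurfacing across unrelated incidents is much easier to justify than one raised once in a single meeting.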

A second great mechanism to support continuous improvement is a simple retrospective. This is a team-based exercise where participants gather at a predefined point in a cycle, which can be a day of the week or the end of a project, to discuss impressions of the cycle in much the same way as an RCA. This involves discussing what worked and what didn't, and proposing ideas for future improvements and changes. These outputs should also be captured, and ideally suggestions can be implemented as quickly as possible.

By tracking changes and improvements over time it becomes possible to review improvement at a quarterly or bi-annual review and demonstrate an evolution in the security team practice. A disciplined approach to elevation will lead to evidence based presentations of maturity over time. It also prevents a team from becoming stale by institutionalizing adaptation and change into the structure of the team itself.

Elevation also requires "shifting left," which is a common term in software development. Shifting left means applying testing as early in the lifecycle as possible. In general shift left means applying appropriate solutions to problems in as close a proximity to the problem lifecycle as possible. If you draw out a problem lifecycle from left to right, the solution should be introduced as close to the origin (left) as feasible.

Shifting left in security means enabling security solutions as close to their origin as you can. For instance, if a user forgets their password and has to call the help desk, who have to escalate the request to the identity and access management team, who have to request authorization from the security team, there is a long left to right lifecycle to the process. Shifting left means enabling the help desk to solve the user's problem immediately, by resetting their password without having to go through extra steps. Shifting left means empowering users with self-service solutions whenever possible. Shifting left means unburdening the security team of mundane issues that can be handled by other groups (including end users themselves) so that the security team can elevate their focus to harder problems!

Metrics

If you can't measure it, you can't improve upon it. Gathering, normalizing, identifying, and presenting data around a security team is vitally important to show success. Even if the security team isn't stopping any breaches, metrics that demonstrate improvement can still be used to justify investment. Metrics allow a team to identify their weak points, to measure effectiveness of investments, and to outline gaps that need to be addressed.

Metrics should be used to bolster the success stories of a team as well as to examine process to find room for improvement. Metrics should be collected at first to establish a baseline, and then to measure the success, or failure, of changes and improvements. For instance, if a new email security appliance is deployed, metrics should be available to show how that deployment affected the security team's time and effort dealing with malicious email (one would hope in a declining trend). Metrics even allow a security team to engage in testing and experimentation, such as A/B testing, to determine how well various changes and strategies work in the environment. Metrics should also be used to justify investment in security software, appliances, and human capital. Metrics can also be used to highlight gaps, or areas where a team's approach or capabilities are falling short. For example, metrics that show a disproportionate amount of the security team's time spent on commodity malware clean-up can be used to justify a new approach to anti-malware.
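
A baseline comparison of this kind can be very small. The sketch below assumes the team records weekly analyst hours spent on malicious email; the numbers are invented for illustration and are not measurements from any real deployment.

```python
from statistics import mean


def percent_change(baseline, after):
    """Percent change in the mean from the baseline period to the later period."""
    base_avg = mean(baseline)
    if base_avg == 0:
        raise ValueError("baseline mean is zero; percent change is undefined")
    return (mean(after) - base_avg) * 100 / base_avg


# Hypothetical weekly analyst hours spent on malicious email.
hours_before = [20, 22, 18, 20]  # baseline weeks
hours_after = [12, 10, 14, 12]   # weeks after the appliance was deployed

print(f"{percent_change(hours_before, hours_after):.1f}%")
```

A negative result is the declining trend one would hope for; without the baseline weeks there would be nothing to compare against.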

Metrics should be shared beyond the team with care. The danger of raw metrics is that they can be misinterpreted and can tell a story that differs dramatically from the one you might wish to tell. For instance, metrics showing that there are an extraordinary number of open security investigations might be presented with the intent of demonstrating the valiant and dogged efforts of the security team and justifying further resource allocation, but in the wrong light this data might be interpreted as showing that the team is ineffective in resolving issues, or worse! It is important to distinguish between internal and external metrics and to ensure that metrics shared beyond the team are contextualized appropriately, explained carefully, and targeted to their audience.

Whenever possible, metrics should be standardized. For instance, security teams should track security events and categorize these events using a standard taxonomy. The VERIS framework provides a perfect structure for this taxonomy. By contrast, creating a purpose-built security event classification schema means that metrics can't translate beyond the team, could be inconsistent, and may change over time. Retrofitting metrics when essential quantifiers change is inordinately difficult and can lead to dirty data sets.
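
As a simplified sketch of enforcing a fixed taxonomy, the following restricts event labels to the seven top-level threat action categories that VERIS defines. This is only a small slice of the framework, and the classify helper and sample labels are assumptions for illustration rather than part of VERIS itself.

```python
from collections import Counter
from enum import Enum


class Action(Enum):
    """Top-level VERIS threat action categories (a small slice of the framework)."""
    MALWARE = "malware"
    HACKING = "hacking"
    SOCIAL = "social"
    MISUSE = "misuse"
    PHYSICAL = "physical"
    ERROR = "error"
    ENVIRONMENTAL = "environmental"


def classify(label):
    """Map a free-text label onto the fixed taxonomy, rejecting anything else."""
    return Action(label.strip().lower())  # raises ValueError for unknown labels


events = ["Malware", "social", "malware ", "hacking"]
print(Counter(classify(e).value for e in events))
```

Rejecting unknown labels at entry time is what keeps the data set clean; an ad hoc label like "virus" fails loudly instead of quietly creating a new category.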

Metrics should be structured as well. You should strive to think of your metrics as data fitting into a database (and ideally you are databasing your metrics). If metrics vary you can end up with data that can't be effectively queried because certain attributes might not exist for some data in the set. For instance, if you track incidents over time be sure to track the start and stop time of incidents. This way you can track how long incidents take to resolve, or the average lifespan of an incident. If you don't have the start and end time for every incident, however, you end up with some incidents that don't fit into that query boundary and your result quality will suffer.
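
The incident example can be sketched as follows, assuming a minimal record with an identifier, a start time, and an optional end time (the field names and sample incidents are hypothetical). Incomplete records are reported separately rather than silently dropped, so the gap in the data stays visible.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Incident:
    ident: str
    opened: datetime
    closed: Optional[datetime] = None  # None means the end time was never recorded


def mean_time_to_resolve(incidents):
    """Return (mean duration of complete records, ids of incomplete records)."""
    durations = [i.closed - i.opened for i in incidents if i.closed is not None]
    incomplete = [i.ident for i in incidents if i.closed is None]
    if not durations:
        return None, incomplete
    return sum(durations, timedelta()) / len(durations), incomplete


incidents = [
    Incident("INC-1", datetime(2018, 10, 1, 9, 0), datetime(2018, 10, 1, 13, 0)),
    Incident("INC-2", datetime(2018, 10, 3, 9, 0), datetime(2018, 10, 3, 11, 0)),
    Incident("INC-3", datetime(2018, 10, 5, 9, 0)),  # no end time recorded
]
average, incomplete = mean_time_to_resolve(incidents)
print(average, incomplete)
```

The average here is computed from only two of the three incidents, which is exactly the loss of result quality that missing attributes cause.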

Visualization is the holy grail for metrics. Nothing beats a great chart that you can show to leadership. Visualization also allows for easy human evaluation of data sets. It allows observers to ask inductive questions such as "why did we have a spike in this data at this time?" or "what led to the decrease in this type of incident last month?" or even "when we invested in this new defense did it decrease the number of related security events?"

Be sure to gather metrics on everything you can, from how many processes were documented, to how many documents were improved, to which staff resolve security events the fastest. Using metrics can guide security decision making in an intelligent, defensible way.

Conclusion

By following this four-step process, teams can liberate themselves from the industry devotion to "traditional approaches" and begin to tackle, and overcome, the security challenges that matter to them. Each organization is different, and using prescriptive frameworks that ignore differentiators and individuality is a recipe for failure. A team that follows the document-automate-elevate-metrics cycle can not only ensure scale and effectiveness, it can also become transparent to leadership, stakeholders, and itself.

Developing transparency by following this methodology leads to maturity and trust. When organizational leadership asks how a specific security process is carried out, the team should be able to point to documentation. When leadership asks how the team has improved, there should be artifacts from the elevation process to point to. When leadership asks what challenges the team is facing, or has overcome, the automation and metrics should be made available. When new team members join they should be able to reference all of these things to get a quick lay of the land and come up to speed on the team as rapidly as possible. When team members move on to other opportunities there should be no gap in continuity. By following the document-automate-elevate-metrics process all these outcomes become possible.