Introduction to Incident Response
The purpose of this tutorial is to provide a basic introduction to incident response. This document is by no means comprehensive; it is intended as a starting point and provides a framework for approaching a broad spectrum of security incidents. Throughout this document, the term 'security incident' is used without a concrete definition. This is intentional: leaving the term open keeps the tutorial applicable to as wide a range of situations as possible.
Incident Response Basics
Take a Deep Breath
The first step in any incident response is an extremely simple but often overlooked one: stop and take careful measure of the situation. All too often incident responders rush into a problem with suspects and solutions already in mind and begin cleaning the system or starting analysis. Although incident response is exciting, it is important to remain calm and measured. Rushing will only botch a response. Regardless of the severity of an incident or the time sensitivity of a resolution, think carefully and act deliberately at every stage of response.
Apply Occam's Razor
One rule of incident response that I have found helpful in approaching any security event is Occam's Razor. In a nutshell, this principle states that the simplest explanation is usually the best; another interpretation is that one should make as few assumptions as possible. In general, when forming a hypothesis or attempting to determine the cause of a certain effect, it is easier (and generally more accurate) to consider the obvious explanations first. It is far more likely that a simple misconfiguration is to blame for an incident than a Russian hacker breaking into a laptop via wireless and installing a keystroke logger.
Has an Incident Really Occurred?
Once you have taken a deep breath and are prepared to begin a response in a controlled, logical manner, the first and most obvious question to ask is: "Has an incident really occurred?" Far too often suspicious behavior is attributed to a malicious attacker when in fact no attack has happened. Of course, sometimes it is obvious that a machine has suffered a breach of security (for instance, it is spewing out spam and contacting botnet command-and-control (C&C) servers). In many cases, however, users will report suspicious behavior that *could* be attributable to a security incident but could just as easily have many other causes. A report of an incident does not by itself make the incident legitimate, so take the time to examine the grounds on which the report rests. Sometimes you may find that no incident has taken place, and sometimes you may find that although an incident has taken place, it is beyond your jurisdiction to investigate or remediate (for instance, a physical theft has occurred or human suspects must be questioned; both jobs are best left to law enforcement). Be sure to acknowledge when an investigation reaches beyond your ability, as a responder, to pursue.
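One quick, concrete check for the botnet case above is whether a suspect machine's outbound connections go to addresses already known to be malicious. The Python sketch below assumes you have a list of observed destination IPs (for example, pulled from firewall logs) and a list of known-bad addresses or CIDR ranges from a threat feed; both inputs and the function name `match_indicators` are hypothetical. A match strengthens the case that an incident has occurred, while an empty result proves nothing on its own.

```python
import ipaddress


def match_indicators(destinations, indicators):
    """Return the destinations that fall inside any known-bad address or network.

    destinations: IP address strings seen leaving the suspect machine (hypothetical input).
    indicators: known-bad addresses or CIDR ranges, e.g. from a threat feed (hypothetical input).
    """
    # strict=False accepts ranges written with host bits set, such as 203.0.113.9/24.
    networks = [ipaddress.ip_network(i, strict=False) for i in indicators]
    hits = []
    for dest in destinations:
        addr = ipaddress.ip_address(dest)
        # Membership is simply False across IPv4/IPv6, so mixed lists are safe.
        if any(addr in net for net in networks):
            hits.append(dest)
    return hits
```

For example, `match_indicators(["198.51.100.7", "203.0.113.44"], ["203.0.113.0/24"])` returns `["203.0.113.44"]`.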
Once you're sure that an incident has occurred, you can begin your investigation. It is critically important that every step of the investigation, from the initial incident report to the after-action reflections, be recorded. Documentation is the invaluable payoff of any incident response: good documentation helps with developing remediation, sharing lessons learned, and reviewing performance to improve future responses.
It is incredibly valuable to have a convenient place for documentation, one that is accessible to everyone involved in the investigation and easy to use. If the documentation mechanism isn't easy to use, investigators won't be as diligent in recording their work, so choose a mechanism that is simple, approachable, and liked by everyone in the investigation. Documentation schemes vary widely, from commercial tools, wikis, project management software, and forms to simply writing things down in a notebook.
Whatever form you choose, document consistently, and return to your documentation when the incident is concluded to write up a final report. Document as much as you can and don't assume that any detail will be recalled from memory when you review the documentation later. You are likely to forget many details of an investigation, even while it is in progress, so write every detail down. Nobody has ever complained that documentation was too voluminous.
Begin your documentation with an initial summary of the situation. List any key contacts, the people involved in the investigation, and the reason for the investigation. Write down the factors that led to the investigation and any other details you have before the investigation even begins.
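If your team has no documentation tool yet, even a small structured record beats scattered notes. The following is a minimal Python sketch, not a recommended product: the `IncidentLog` class and its fields are hypothetical, chosen to mirror the initial summary described above (reason, key contacts, investigators) plus an append-only list of timestamped notes that can be serialized to JSON.

```python
import json
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone


@dataclass
class IncidentLog:
    """Hypothetical incident record: an initial summary plus timestamped notes."""
    incident_id: str
    reason: str
    contacts: list[str] = field(default_factory=list)
    investigators: list[str] = field(default_factory=list)
    entries: list[dict] = field(default_factory=list)

    def note(self, author: str, text: str) -> dict:
        """Append a UTC-timestamped entry; this class offers no way to edit or delete one."""
        entry = {
            "time": datetime.now(timezone.utc).isoformat(),
            "author": author,
            "text": text,
        }
        self.entries.append(entry)
        return entry

    def to_json(self) -> str:
        """Serialize the whole record so it can be stored wherever the team keeps notes."""
        return json.dumps(asdict(self), indent=2)
```

Calling `note()` after each action and saving the output of `to_json()` gives you a timestamped trail to draw on when writing the final report.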
Recreate the Digital Crime Scene
Once you have established that an incident has occurred and have a mechanism for documentation, the first step in the investigation is to recreate the digital crime scene, if at all possible. Gather logs, bash history files, IDS reports, and any other evidence that could be used to retrace the offender's trail. Make copies of all this material and store them in a safe place. This evidence will be used to reconstruct the actions an attacker took.
It may be necessary to acquire forensically sound copies of evidence. A forensic copy preserves the original state of the victim machine, so you can test on the copy without disturbing the original in case you need to return to it (for instance, if you accidentally delete something you need from the copy). Making full forensic images is resource intensive and beyond the scope of this article. Often I find it is enough to use tar to archive the relevant files; keep in mind that a tar archive is a file-level copy, not a bit-for-bit forensic image. Even if you don't require forensically sound copies of evidence, it is important to make copies so that you have backups of relevant data to scrutinize and work with. It is usually more convenient to pull copies of log files and move them to a separate machine where they are easier to process. Be sure to work on a system that you know isn't compromised and that an attacker does not have access to; your investigation will be useless if you are using trojaned tools.
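As a minimal sketch of the tar approach, Python's standard `tarfile` module can bundle selected evidence files into a compressed archive. The function name `archive_evidence` is hypothetical. As noted above, this is a file-level copy rather than a forensic image, and reading files on a live system may still update their access times.

```python
import tarfile
from pathlib import Path


def archive_evidence(paths, archive_path):
    """Copy evidence files (or whole directories) into a gzip-compressed tar archive.

    tar headers record each file's permissions, owner and modification time,
    so that metadata travels with the copy. Returns the paths actually archived.
    """
    added = []
    with tarfile.open(archive_path, "w:gz") as tar:
        for p in paths:
            if not Path(p).exists():
                # Note the gap rather than aborting; missing evidence is itself worth documenting.
                print(f"warning: {p} not found, skipped")
                continue
            # Directories are added recursively; tarfile strips the leading "/" from member names.
            tar.add(p)
            added.append(p)
    return added
```

Write the archive to media you trust (a separate disk or the analysis machine), not back onto the system under investigation.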
Cryptography can be useful for this process. By computing cryptographic hashes of your copies and digitally signing them, you can later confirm that their contents haven't been altered. A signed copy of evidence may come in handy down the road if an investigation begins to involve law enforcement or if you ever have to return to backups. GPG (GnuPG), the open source implementation of the OpenPGP standard, is especially useful for this task.
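A common way to put this into practice is to record a SHA-256 hash of every evidence file in a manifest and then sign the manifest with GPG (for example `gpg --detach-sign manifest.sha256`, later checked with `gpg --verify manifest.sha256.sig manifest.sha256`). The Python sketch below covers the hashing half; the function names and the manifest filename are illustrative assumptions. The manifest uses the same `<digest>  <path>` line layout as the `sha256sum` utility, so `sha256sum -c` can check it too.

```python
import hashlib


def sha256_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large logs need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def write_manifest(paths, manifest_path):
    """Write one '<hex digest>  <path>' line per file (sha256sum layout)."""
    with open(manifest_path, "w") as out:
        for p in paths:
            out.write(f"{sha256_file(p)}  {p}\n")


def verify_manifest(manifest_path):
    """Re-hash every listed file and return the paths whose contents changed."""
    changed = []
    with open(manifest_path) as fh:
        for line in fh:
            digest, path = line.rstrip("\n").split("  ", 1)
            try:
                current = sha256_file(path)
            except OSError:
                current = None  # a missing or unreadable file counts as changed
            if current != digest:
                changed.append(path)
    return changed
```

Hash the evidence as soon as it is copied, so that every later check compares against the earliest possible state.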
Armed with log files and other digital evidence, try to reconstruct the chain of events leading up to the incident, as well as those during and following it. Try to gain the clearest possible picture of what happened. The picture will be limited by the available evidence, but be as precise as possible. Can you single out specific IP addresses involved in the attack? If you can separate relevant evidence from peripheral evidence, do so. You'll be much more apt to draw accurate conclusions from small subsets of relevant data than from all the logs you can get your hands on.
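Narrowing a large log down to the relevant subset is often a matter of simple text processing. This Python sketch assumes plain-text, syslog-style lines such as OpenSSH authentication messages; the helper names are hypothetical, and the IPv4 pattern is deliberately loose (it will also match invalid strings like `999.1.1.1`). `count_ips` shows which addresses dominate a given kind of event, and `lines_for_ip` pulls out every line mentioning one address, in its original order.

```python
import re
from collections import Counter

# Loose IPv4 pattern: good enough for pulling candidate addresses out of log text.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def count_ips(lines, keyword=None):
    """Count IPv4 addresses, optionally only on lines containing keyword."""
    counts = Counter()
    for line in lines:
        if keyword is None or keyword in line:
            counts.update(IPV4.findall(line))
    return counts


def lines_for_ip(lines, ip):
    """Return only the lines that mention ip, preserving their original order."""
    # Word boundaries stop 203.0.113.5 from matching inside 203.0.113.50.
    pattern = re.compile(r"\b" + re.escape(ip) + r"\b")
    return [line for line in lines if pattern.search(line)]
```

With an authentication log read into a list of lines, `count_ips(lines, "Failed password")` may point to a single noisy address, and `lines_for_ip` then gives you a short, ordered trail to read closely.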
Once you have gathered all the digital evidence you can, the real meat of the investigation begins. First, review the evidence with as objective a mind as you can. What conclusions does the evidence suggest? Look for the simplest explanations first. Once you have developed a hypothesis based on the evidence, document it (so you don't repeat work later).
Once you have a hypothesis, go back to the evidence and see whether it supports or contradicts the hypothesis. Comparing your hypothesis to the available evidence may reveal weaknesses in the hypothesis or illuminate aspects of the evidence that weren't considered before. Take careful stock of any evidence the hypothesis fails to explain and any part of the hypothesis the evidence does not support. This may lead you to revise your hypothesis or seek out other evidence.
You may find that your hypothesis is quickly proven wrong, needs to be changed, or leads you to develop a new one. Document each hypothesis, any changes to it, and the reasons for them. This is critical because you don't want to take a circuitous route back to the same hypothesis, having forgotten that you already tested and disproved it.
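A plain list in your notes is usually enough, but the idea of never re-testing a disproven hypothesis can be made concrete. The Python sketch below is hypothetical (the `HypothesisLog` class and its status names do not come from any tool): each hypothesis keeps a history of status changes with reasons, and proposing one that was already disproven prints the recorded reason instead of starting over. Duplicate detection here only normalizes case and whitespace, so genuinely reworded hypotheses will not be caught.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    OPEN = "open"
    SUPPORTED = "supported"
    DISPROVEN = "disproven"


@dataclass
class Hypothesis:
    statement: str
    status: Status = Status.OPEN
    # Reasons for each status change, oldest first.
    history: list[str] = field(default_factory=list)


class HypothesisLog:
    """Hypothetical helper that records every hypothesis and why its status changed."""

    def __init__(self):
        self._items = {}

    @staticmethod
    def _key(statement):
        # Normalize case and whitespace so trivially re-typed duplicates collide.
        return " ".join(statement.lower().split())

    def propose(self, statement):
        """Add a hypothesis, or return the existing entry if it was already recorded."""
        key = self._key(statement)
        if key in self._items:
            existing = self._items[key]
            if existing.status is Status.DISPROVEN:
                print(f"note: already disproven: {existing.history[-1]}")
            return existing
        hypothesis = Hypothesis(statement)
        self._items[key] = hypothesis
        return hypothesis

    def update(self, statement, status, reason):
        """Change a hypothesis's status and record the evidence behind the change."""
        hypothesis = self._items[self._key(statement)]
        hypothesis.status = status
        hypothesis.history.append(f"{status.value}: {reason}")
```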
Once you hit upon the most likely hypothesis for the cause of the compromise it is time to turn your attention to the effects of the incident.
Determine the Extent
Determining the extent of an incident is a tricky process. You must carefully measure how far systems or information were exposed to the incident. Has an attacker compromised a root account? Did the cracked account share a password with other accounts? Did an attacker simply use one server as a jumping-off point to attack another? Determining the extent of an incident may force you to launch new incident responses. Most importantly, this determination will help guide your remediation.
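One way to reason about the jumping-off-point question is to treat observed connections as a graph and ask which hosts are reachable from the ones known to be compromised. This Python sketch assumes you have already extracted `(source, destination)` host pairs from firewall or authentication logs for the period after the compromise; the input format, host names, and function name are hypothetical. The result is an over-approximation of the possible extent: a connection shows only that a host could have been reached, so each flagged host needs its own examination.

```python
from collections import deque


def possible_extent(connections, compromised):
    """Return every host reachable from the compromised ones via observed connections.

    connections: iterable of (source_host, destination_host) pairs (hypothetical format).
    compromised: hosts already known to be compromised.
    """
    # Build a directed adjacency map: source -> set of destinations it contacted.
    graph = {}
    for src, dst in connections:
        graph.setdefault(src, set()).add(dst)

    # Breadth-first search outward from the known-compromised hosts.
    seen = set(compromised)
    queue = deque(compromised)
    while queue:
        host = queue.popleft()
        for nxt in graph.get(host, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

Because the edges are directed, a host that only connected *to* a compromised machine (such as an administrator's laptop) is not flagged; whether that is acceptable depends on the incident.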
Remediation is the process of repair that you should begin once you're relatively certain what has happened and how extensive the incident was. Remediation should include several steps. The most obvious is returning a system to normal operational health. Often overlooked, however, is instituting safeguards to prevent future incidents of the same type. It is extremely helpful to develop a signature of an attack that you can use in future incident response, but it is more helpful still to deploy a solution that prevents that sort of attack from ever causing an incident again.
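As a minimal illustration of an attack signature, the Python sketch below matches log lines against regular expressions. Both patterns are hypothetical stand-ins for whatever distinctive traces your own incident left (here, root SSH password failures and requests for PHP files under an `/uploads/` directory). Signatures like these only recognize a repeat of a known pattern, which is why preventing the attack remains the better outcome.

```python
import re

# Hypothetical signatures: each name maps to a pattern observed during this incident.
SIGNATURES = {
    "ssh-root-bruteforce": re.compile(r"Failed password for root from (\S+)"),
    "webshell-request": re.compile(r'"(?:GET|POST) /uploads/[^" ]+\.php'),
}


def scan(lines, signatures=SIGNATURES):
    """Yield (line_number, signature_name, line) for every signature that matches."""
    for n, line in enumerate(lines, start=1):
        for name, pattern in signatures.items():
            if pattern.search(line):
                yield n, name, line
```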
Be sure to work inclusively with all concerned parties when deploying remediation. Make stakeholders aware of any changes that will be implemented and help them understand the cause of the incident.
When deploying a remediation measure, be sure that it is actually effective. Sometimes a system may need to be completely rebuilt in order to be sure that any damage is fixed, that the attacker cannot re-enter the system, and that future attacks of the same type will fail. This is an extreme response, but it is the most reliable one. Be wary of taking a minimalist road when repairing systems affected by an incident. The worst-case scenario is to spend time and resources repairing a system only to find yourself mired in the same incident a short time later.
If an incident recurs after remediation, take stock of the situation and begin your incident response anew. Try to determine which factors were overlooked or weren't remediated, allowing the attack to recur. Try not to be frustrated: almost any remediation provides some value to a system, so your efforts are never completely in vain.
An after-action review is one of the most commonly overlooked steps in incident response. It is easy to get caught up in the excitement of the detective work, feel satisfied after the long haul of deploying a solution, and then simply return to work as usual. Deploying remediation, however, does not mean your work on an incident is done.
Gather everyone involved in the incident and review the documentation that you have produced. Try to find places where you thought your response was particularly effective, and where it was lacking. Document your findings. Your response will become better over time if you take a moment to focus on 'lessons learned' from each incident.
Once you've completed your after-action review, write up a final report. At this point your documentation is very likely to be haphazard or incomplete. Take the time to fill in gaps, clean up the documentation, and craft your report so that it is approachable and useful. You may want to share your report with others, or you may come back to it later if a similar incident occurs. Documentation of this sort is invaluable to an organization, so write a good final report and file it where others (including yourself a year or two from now) can find it; the report is no good if it becomes lost.
Once you have written your final report, then and only then is it time to return to work as normal. If you complete all of these steps, you can ensure that over time you'll become a more effective responder and provide value to others, even if you don't reach the best, or even the correct, solution. At the very least there will be a record of the steps already taken and the solutions already tried. This saves time and effort and provides value regardless of the incident's outcome. It can also provide important knowledge if you turn to outside assistance or decide to share your experiences with others.