Vulnerability Management is a space that encompasses a wide variety of products, from vulnerability scanners to patch management products and configuration management suites. When I think of vulnerability management I’m talking about the whole lifecycle of a vulnerability, from the time it is discovered by a security researcher until I remediate it on my affected systems, and everything that happens in between. Realistically, as an internal security person this process starts with the advisory from your product vendor, be it Microsoft, Adobe, HP or whoever, or perhaps with a finding from a security assessment. The problem is that vulnerability management vendors have lied to you, and their marketing has proliferated a number of issues that lead to absolute failure in this space. So let’s take a look at that lifecycle, examine the phases, and see how failure creeps in and what we can do to streamline this process.
Before we can begin this process we have to start with a basic understanding of the organization’s risk tolerance and create a baseline. We identify what we have and why it’s important. I’m talking about inventory. You probably track your capital assets because Finance or Material Control or Asset Management or whatever your organization calls it requires you to. That’s not terribly helpful, because that inventory doesn’t typically capture what we need, nor should it, since it’s driven by a completely different process with different objectives. As information security workers we are not terribly concerned with standard physical loss scenarios or software piracy issues for purchased COTS applications unless they also mean a loss of our own intellectual property, a breach of information we are entrusted to protect, and so on. Those scenarios are problematic to be sure, but not our primary concern. Here are some basic things we need to capture for our security-based inventory (a rough sketch of such a record follows the list):
- What is the asset? – We need to capture hardware, software, data and some unique identifier. This is not limited to capital assets.
- What is the function of the asset? – We need to understand what it does so we can begin to map out how compromises impact other parts of our organization, whether that is a direct business impact or security degradation for other systems.
- Where does it live? – We need to understand how it fits into our overall architecture, what it’s connected to and what kinds of trust relationships exist. Without this information the next piece of information is hard to determine.
- Why is it important to the organization? – If we don’t know why it’s important, it’s going to be really hard to assign value to the asset and prioritize resources. This is really important because as we move into and past the vulnerability scanning phase we have to be able to prioritize where to fix problems first. The use of quantifiable metrics here can be hugely valuable, but most vulnerability scanners do not have the granularity to be of much help. nCircle is the only one I’ve seen that does, but the complexity of an nCircle install, the licensing hell that is IP360 and CCM and whatever other recently purchased cruft they are trying to shoehorn in, along with the extremely fragmented codebase of their product, makes this a pretty poor strategic choice for most organizations.
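To make that concrete, here is a minimal sketch of what a security-focused inventory record might capture. The field names and values are purely illustrative and not tied to any particular product:

```python
from dataclasses import dataclass, field

@dataclass
class Asset:
    """One entry in a security-focused inventory (not a capital-asset ledger)."""
    identifier: str              # unique ID: hostname, serial, or data-set name
    kind: str                    # hardware, software, or data
    function: str                # what it does for the business
    location: str                # network segment / architectural placement
    connected_to: list = field(default_factory=list)   # trust relationships
    business_value: int = 1      # 1 (low) to 10 (critical), drives prioritization

# Example: a payroll database server and the segments that can reach it
payroll_db = Asset(
    identifier="srv-payroll-01",
    kind="hardware",
    function="Hosts payroll database",
    location="datacenter VLAN 12",
    connected_to=["hr-desktop-segment", "backup-server"],
    business_value=9,
)
```

Even a record this simple answers the four questions above, and the business_value field is what everything downstream will key off of.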
Now that we have a good understanding of the assets we are trying to protect, we should be in a standard operational mode. I’m not going to go into which specific controls you should have in place here, as that is completely subjective and outside the scope of this article. You may be starting a vulnerability management program from scratch or you may have an existing program already in place. The basic process is the same, though it is far more daunting when you are first starting out. The problem is that none of this is being captured in our “Vulnerability Management” tool; it probably lives in an inventory database, an Excel spreadsheet, or possibly a GRC tool that integrates with your vuln scanner. If you are fortunate enough to have a product like RedSeal or Skybox you have a much more complete picture here that really works with your vulnerability management lifecycle, but these products are not cheap to procure or manage.
Identification of Vulnerabilities
This may be accomplished by reviewing a security advisory from a vendor and then assessing which systems are vulnerable. More frequently, though, we use automated tools like Nessus, NeXpose, Qualys and others to scan our systems and tell us where the problems are. We hire penetration testers to test our systems and pray that we get a useful report at the end of the engagement that can help us through this process. We may be using passive vulnerability detection methods or any number of other ways to discover there is a problem. I’m not going to delve too deeply into issues with false positives, as that’s deserving of its own post, but I’ve seen a lot of organizations spin their wheels on “vulnerabilities” that wound up being completely irrelevant due to contextual data paths, or were falsely reported altogether. The issue is that all of this is not enough. We have to actually fix stuff.
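As one example of pulling findings out of a scanner so they can feed the rest of the lifecycle, here is a minimal sketch that walks a Nessus v2 (.nessus) export and collects findings per host. The file name and severity threshold are assumptions for illustration; a Qualys or NeXpose export would need different parsing:

```python
import xml.etree.ElementTree as ET

def load_findings(path, min_severity=2):
    """Collect findings from a Nessus v2 export, keyed by host.

    Severity in a .nessus file runs 0 (info) to 4 (critical); anything
    below min_severity is ignored here.
    """
    findings = {}
    tree = ET.parse(path)
    for host in tree.getroot().iter("ReportHost"):
        hostname = host.get("name")
        for item in host.iter("ReportItem"):
            severity = int(item.get("severity", 0))
            if severity < min_severity:
                continue
            findings.setdefault(hostname, []).append({
                "plugin": item.get("pluginName"),
                "severity": severity,
                "port": item.get("port"),
            })
    return findings

# Usage (hypothetical file name):
# findings = load_findings("monthly_scan.nessus")
```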
Classification of Vulnerabilities
Before we know where to start, we have to classify the severity of the vulnerability. The typical formula is some calculation on a sliding scale of impact and likelihood as we try to build a risk matrix for the vulnerability.
The problem is this is really only useful for showing pretty pictures to management. It has to be mapped to contextual risk for the environment, and that is really hard to do. If we go back to our initial inventory process we should have arrived at some sort of valuation for the assets. Many scanners such as NeXpose and nCircle allow you to assign a value to the asset and factor that value into CVSS risk scoring to get a weighted risk value. This is actually quite useful if you take the time to set it up, but it can be difficult and time consuming across an enterprise, and this is where the process starts to break down. Again, tools like RedSeal really start painting a clearer picture here, as you can map out data paths and create “what if” scenarios that demonstrate how the vulnerability landscape changes with a single patch or IPsec rule. But why do I have to buy an expensive tool on top of my “vulnerability management” solution? Because it’s a marketing misnomer and complete fail.
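A stripped-down illustration of that kind of weighting (not any vendor’s actual formula) might look like the following, where the CVSS base score is scaled by the asset value we assigned during inventory:

```python
def weighted_risk(cvss_base, asset_value, max_value=10):
    """Scale a CVSS base score (0-10) by asset value (1-max_value).

    A 7.5 on a throwaway lab box should not outrank a 5.0 on the payroll
    database, and this weighting is one crude way to express that.
    """
    return cvss_base * (asset_value / max_value)

# A critical server with a medium finding outranks a low-value desktop
# with a high finding:
print(weighted_risk(5.0, 9))   # 4.5 on the payroll database
print(weighted_risk(7.5, 2))   # 1.5 on a kiosk desktop
```

The exact math matters less than the fact that the ranking now reflects what you actually care about losing.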
Remediation / Mitigation of Vulnerabilities
This is where the rubber really meets the road. Until now it’s just been pretty reports, and actually getting fixes made can be really tough, especially if your vulnerability management team is independent from the operational groups. On top of this, how do you know when something has been fixed so you can move on to the next item? If you are using canned remediation reports you probably have a huge PDF telling you it’s going to take 523 hours to fix server 1 of your 300 servers. Not going to happen.
Ideally the process flow should look something like: scan, prioritize against asset value, hand the work to operations, remediate, and verify with a re-scan.
The problem is it tends to look more like this:
Ops doesn’t care what you want them to do. They are there to keep the lights on and they are probably understaffed, putting out fires and falling further behind on the projects that management has tasked them with. You as a security person are an obstacle. They’d really like to get on with their work and will make every excuse possible to NOT fix the security holes in that server.
- “But you said it’s not an OWASP Top 10!”
- “Your fix will break my server”
- “We need to test the change first” (but you have no change management processes for that function)
- “I’ll get to it next maintenance window” (but they never do)
- “You are wrong. There’s nothing wrong with the server”
- They complain to management that you are the reason projects are missing their deadlines, hoping to get you off their backs
- *SNARL!!!* “Get out of my way!”
So now you spend most of your time setting up meetings with management, explaining why this needs to be patched and why operational staff need to re-prioritize workloads. But there are over 200,000 vulnerabilities in your environment. How feasible is it to do this that many times, or even 2,000 times? It’s no wonder our environments are so broken. Security is not driven from the top down, and only the squeaky wheels get fixed, so you spend all your time trying to create noise where it’s needed instead of helping your teams resolve issues. Hence the alcoholism in this industry. Even if your operations group is also performing scanning, you likely run into similar time constraints and the issue remains the same. Broken systems just waiting to get pwned.
How do we fix this?
Our tools are failing us because they do not consider the full life-cycle of vulnerability management. Our vulnerability scanning vendors’ marketing efforts have succeeded at a couple of things:
1. They sell lots of licenses and make tons of money
2. They show us just how fucked we are
We have to address this in a few key areas:
- We need a way to prioritize the vulnerabilities. Asset valuation and contextual awareness of pivot paths and trust relationships are critical. Very few organizations properly map these data flows or understand how a low-value compromised system such as a desktop can lead to far greater impacts on core production servers. I find the OSSTMM focus on threat surface metrics and the associated RAV scoring hugely beneficial here.
- We need a way to escalate from vulnerability detection to work actually being performed. A linkage between vulnerability scanning products and work order / service desk applications seems the most obvious way to accomplish this (a rough sketch of what that could look like follows this list). While some scanning vendors have made a feeble attempt to accommodate this, if you are not running Remedy or prepared to start diving into APIs and customization mechanisms, your options are extremely limited. Identifying the impact in the request will go a long way toward easing the transition from work requested to work performed. Obviously there are still cultural battles that may need to be fought here, as tools will only take us so far.
- Once work is performed, it would be helpful to reference that back in the vulnerability scanning tools. A completed work order should automatically trigger a re-scan, or better yet, a re-scan should be performed before the work order is considered complete. There should be historical records of remediation done on servers, and hours tracked, to better understand the costs associated with managing them. A legacy server that is not being replaced due to cost but has to be constantly re-configured due to security failures may be justification enough to upgrade or eliminate the asset altogether. These work metrics are also useful in quantifying the costs of security incidents.
- Lastly, all of this information needs to be referenced in whatever serves as a Computer Security Incident Tracking database. When analysts are working an incident, it is important they understand the security state of the assets involved in the incident. Scans should be data sources for your SIEM tools to provide additional contextual correlation for detected events.
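To show what the scanner-to-service-desk escalation and the close-out re-scan could look like in practice, here is a hedged sketch against a hypothetical REST ticketing API. The endpoint, field names, and the rescan_host callable are all invented for illustration; a real integration would target your service desk’s actual API and your scanner’s scheduling interface:

```python
import requests

# Hypothetical ticketing endpoint -- replace with your service desk's real API
TICKET_API = "https://servicedesk.example.com/api/tickets"

def open_remediation_ticket(host, finding, risk_score):
    """Escalate a scan finding into a work order, carrying the impact with it."""
    payload = {
        "summary": f"{finding['plugin']} on {host}",
        "priority": "high" if risk_score >= 7 else "normal",
        "description": f"Weighted risk {risk_score}; affected port {finding['port']}.",
    }
    resp = requests.post(TICKET_API, json=payload, timeout=30)
    resp.raise_for_status()
    return resp.json()["id"]

def close_ticket_with_verification(ticket_id, host, rescan_host):
    """Only mark the work order complete once a re-scan confirms the fix."""
    if rescan_host(host):   # callable that re-runs the scan, returns True if still vulnerable
        raise RuntimeError(f"{host} still vulnerable; ticket {ticket_id} stays open")
    resp = requests.patch(f"{TICKET_API}/{ticket_id}", json={"status": "closed"}, timeout=30)
    resp.raise_for_status()
```

The point is not the specific code but the loop it closes: the finding carries its priority into the work order, and the work order cannot close until the scanner agrees the problem is gone.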
Vulnerability Management is hard work, and if all you are doing is sending monthly scans to your server team and hoping they address the findings, you will be in for a world of hurt. It requires coordination and, most importantly, a shift in mindset that prioritizes these activities as if they were service interruptions. Because the day will come when you will wish they were.