How to introduce Auto Remediations into your cloud strategy?


“How do we get started with auto-remediation in the cloud?”

This is a question I get asked a lot. Considering everything that could go wrong without a well-thought-out remediation strategy, it's a great question!

My usual response is “It depends on how mature your organization is in the cloud.” I tell our customers the same thing.

Auto-remediation is not something you start on day one. As an enterprise starting your cloud security journey, you should first establish a baseline for your cloud infrastructure against industry best practices. Assess the infrastructure, identify threats, and correct risks and best-practice violations before embarking on a cloud remediation strategy.

Your engineers and DevOps or DevSecOps teams should be familiar with the nuances of the key resources in the cloud (mainly Compute, Storage, IAM, and Networks) and with the processes to follow when fixing issues identified for those resources.

Getting Started with Auto Remediations

The impulse is to try out Auto Remediation on one of the major cloud threats, like someone opening up your firewall to the internet. In reality, you wouldn’t want an automated rule going ahead and changing firewall rules in your production environment on its first outing.

Automated fixes for threats like that are the ideal state as far as incident response is concerned, but you should get there in phases, along a path that the entire IT Security team in your organization is comfortable treading.

Some good examples

Tagging, or the lack of tags on newly created cloud resources, is a great place to introduce remediations. A tagging strategy is a must-have for any organization because automation, data classification, segmentation, and more depend on it. It is also one of the main areas organizations struggle with, especially when it comes to enforcing tagging guidelines on resources as they are created.

You could start with an Auto Remediation that identifies untagged assets in your cloud and enforces your organizational tagging policies, for example by adding an ORG tag or an OWNER tag.
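
As a rough illustration, a tagging remediation of this kind could be sketched in Python as below. This is a minimal sketch under stated assumptions, not a definitive implementation: the `Resource` class, the default tag values, and the `apply_tags` callback are all hypothetical, and in a real setup `apply_tags` would wrap your cloud provider's tagging API.

```python
from dataclasses import dataclass, field

# Hypothetical policy: tags every resource must carry, with fallback values.
REQUIRED_TAGS = {"ORG": "unassigned", "OWNER": "unknown"}


@dataclass
class Resource:
    """Simplified stand-in for a cloud asset returned by an inventory scan."""
    resource_id: str
    tags: dict = field(default_factory=dict)


def missing_tags(resource, required=REQUIRED_TAGS):
    """Return the required tags the resource lacks, with their default values."""
    return {key: value for key, value in required.items() if key not in resource.tags}


def remediate_tags(resources, apply_tags, dry_run=True):
    """Report resources with missing tags and, unless dry_run, apply defaults.

    apply_tags is a caller-supplied function taking (resource_id, tags);
    it is an assumption standing in for a real tagging API call.
    """
    report = {}
    for resource in resources:
        gaps = missing_tags(resource)
        if not gaps:
            continue
        report[resource.resource_id] = gaps
        if not dry_run:
            apply_tags(resource.resource_id, gaps)
            resource.tags.update(gaps)
    return report
```

Running with `dry_run=True` first gives the team a report of what would change before any automation touches the environment, which fits the phased approach described above.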

Most remediation scenarios are closely tied to business use cases, but some generic examples are:

Compute

  • Stop developers from spinning up infrastructure in unapproved regions
      • This is a good security and operational use case: detecting server launches in regions that the organization has blocked.
      • As part of automated remediation, you could email the developer who spun up the server instance, citing a company policy violation, and shut down the server either immediately or after a time interval.
  • Terminate instances that are exposed to the public internet
      • This is a good way to reduce your attack surface, but tread with caution. Remediations like this are extremely useful if you have a locked-down production environment and do not want unauthorized changes happening on the platform.
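
The unapproved-region case could be sketched as a small decision function that turns a launch event into an ordered list of actions. This is an illustrative sketch only: the `LaunchEvent` fields, the approved-region list, and the action names are assumptions, and the actual notification and shutdown calls are left to whatever tooling you use.

```python
from dataclasses import dataclass

# Hypothetical allow-list; replace with your organization's approved regions.
APPROVED_REGIONS = {"us-east-1", "eu-west-1"}


@dataclass
class LaunchEvent:
    """Simplified stand-in for an instance-launch event from an audit log."""
    instance_id: str
    region: str
    launched_by: str


def plan_region_remediation(event, approved=APPROVED_REGIONS, grace_minutes=30):
    """Return the ordered actions to take for a launch event.

    An empty list means the launch complies with policy. Otherwise the
    launcher is notified first and the instance is scheduled to stop after
    grace_minutes (0 means stop immediately).
    """
    if event.region in approved:
        return []
    return [
        ("notify", event.launched_by),
        ("stop_after_minutes", grace_minutes),
    ]
```

Keeping the decision separate from the execution makes it easy to log the planned actions, or run them in report-only mode, before letting the rule act on its own.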

Storage

  • Encrypt storage volumes that contain sensitive information
  • Block public access to storage buckets

Identity and Access Governance

  • Remove unused permissions from IAM Users
  • Enable server access logging on buckets that have a sensitive tag attached.
  • Remove users from a Group
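
For the unused-permissions item, one possible approach is to compare what a user has been granted against when each permission was last used. The sketch below assumes you can already obtain both pieces of data from your cloud provider; the function name, the input shapes, and the 90-day threshold are all illustrative assumptions rather than a recommended policy.

```python
from datetime import datetime, timedelta, timezone


def unused_permissions(granted, last_used, now, max_idle_days=90):
    """Return granted permissions not used within max_idle_days, sorted by name.

    granted:   set of permission names attached to the user
    last_used: dict mapping permission name to the datetime of its last use;
               a permission missing from the dict is treated as never used
    """
    cutoff = now - timedelta(days=max_idle_days)
    return sorted(
        perm
        for perm in granted
        if last_used.get(perm) is None or last_used[perm] < cutoff
    )


# Example inputs with made-up usage data.
now = datetime(2024, 1, 1, tzinfo=timezone.utc)
granted = {"s3:GetObject", "s3:PutObject", "ec2:TerminateInstances"}
last_used = {
    "s3:GetObject": now - timedelta(days=5),
    "s3:PutObject": now - timedelta(days=200),
}
candidates = unused_permissions(granted, last_used, now)
```

The result is a list of candidates for removal. Whether to revoke them automatically or route them to a reviewer first depends on how far along the maturity curve your team is.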

Using Auto-Remediations as part of a Workflow

As your team gets accustomed to using automation, and to tracking the changes that automation makes to your environment, you can take the next step and start applying it to more sensitive and critical changes in your cloud environment.

Using Auto Remediations with a workflow solution helps you address both the security governance and the operational side of incident management.

For example, your Auto Remediation titled “Terminate instances that are exposed to the Public Internet” could be part of a workflow where the following steps are executed in sequence:

  • Email a list of administrators about the incident
  • Shut down the instance
  • Create a ServiceNow ticket to keep track of this incident.
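
The three steps above could be wired together with a small sequential runner like the one sketched here. Everything in it is a hypothetical stand-in: the step functions only record what they would do, and real versions would call your mail system, your cloud provider's API, and your ticketing system. Stopping at the first failed step is one possible design choice, not the only reasonable one.

```python
def run_workflow(incident, steps):
    """Run (name, action) steps in order and return an audit trail.

    Execution stops at the first step that raises, so later steps never
    run against an incident in an unknown state.
    """
    audit = []
    for name, action in steps:
        try:
            action(incident)
            audit.append((name, "ok"))
        except Exception as exc:
            audit.append((name, f"failed: {exc}"))
            break
    return audit


# Placeholder steps: each one just records the action it would perform.
def email_admins(incident):
    incident["actions"].append("emailed admins")


def shut_down_instance(incident):
    incident["actions"].append(f"stopped {incident['instance_id']}")


def create_ticket(incident):
    incident["actions"].append("ticket created")


STEPS = [
    ("email_admins", email_admins),
    ("shut_down_instance", shut_down_instance),
    ("create_ticket", create_ticket),
]
```

Calling `run_workflow({"instance_id": "i-123", "actions": []}, STEPS)` returns one audit entry per step, which gives the SOC team a record of what the automation did and where it stopped.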

Automations like the example cited above can be lifesavers in many situations, reducing your MTTR (Mean Time To Respond) and making the SOC team more efficient.

This is also a core area where Playbooks from C3M can help you. Playbooks is our Cloud SOAR Module (Cloud Security Orchestration Automation & Response) and helps customers with real time incident response for cloud related issues. To learn more about C3M Playbooks, read our earlier blog here.
