Exploit Resolves Incident

bletchley punk @alicegoldfuss: ask the Doc about a cloud backdoor he found during a datacenter outage. one of my fave incidents.

Nicholas Valler: Ha! That was my first major incident at New Relic.

I was hired as a forward SRE for the alerts team. As part of my onboarding, I was poking around the systems the team used to host the nascent alert product.

I found a security hole and was in the process of figuring out how to report it to security when the network control plane was accidentally sheared, leaving all our systems inaccessible over SSH.

Mind you, this was my first incident ever. I was watching Alice masterfully lead the response, but I had zero idea what was going on and was too new to really contribute.

After a couple of hours, we were getting to a critical point where the Kafka topics were nearing the end of their capacity. Folks were desperately trying to find a way to extend the data retention.

Then I remembered the security hole. Turns out, we used a hand-rolled NAT gateway in AWS and left port 22 open to the world by default.

I mentioned it to Dana and Alice very timidly: “I think I may know a way into our systems…” I was pretty nervous because I figured security would be peeved.

Alice gave the go-ahead, and in a few minutes we had developers logged in and extending our Kafka retention.