SRE (Site Reliability Engineer) Skills - AWS, Terraform, Kafka, Docker, Kubernetes

Minimum 5 years' experience
United Kingdom

AWS, Terraform, Kafka, Docker, Kubernetes and a software dev background

The Streaming SRE (Site Reliability Engineer) role is part of the bank’s Data & Analytics Service function. It helps deliver the Streaming as a Service platform, which multiple teams within the bank use for critical, real-time workloads.

Key responsibilities of a Streaming SRE include:

• Design, develop, test and implement the Streaming as a Service offering to support our business customers throughout the software development life cycle.

• Debug production issues across services and levels of the stack.

• Help plan the growth of the infrastructure, improving system resilience, performance and stability.

• Ensure consistent technology usage across the programme by continuously reviewing existing toolsets and code and suggesting re-use of components.

• Ensure systems meet their SLAs and performance targets.

Skills needed:

• Software engineering background

• Hands-on experience designing, building, delivering and operating production-grade software at scale, with an appreciation for the complex and emergent behaviours inherent to distributed systems

• Production experience with Kafka in high-scale distributed systems

• Strong, experience-informed opinions on subjects such as continuous delivery, distributed architectures and systems, test strategies, everything-as-code, containerisation, orchestration, cloud services and incident response

• Comfortable holding in-depth technical discussions, troubleshooting and debugging systems, and reading and writing code

Job posted by: Rahul Pandey