AWS, Terraform, Kafka, Docker, Kubernetes and a software dev background
The Streaming SRE (Site Reliability Engineer) role is part of the bank’s Data & Analytics Service function. It helps deliver the Streaming as a Service platform, which multiple teams within the bank use for critical, real-time workloads.
As a Streaming SRE, your key responsibilities include:
• Design, develop, test and implement the Streaming as a Service offering to support our business customers throughout the software development life cycle.
• Debug production issues across services and levels of the stack.
• Help plan the growth of the infrastructure, improving system resilience, performance and stability.
• Ensure consistent technology usage across the programme by continuously reviewing existing toolsets and code and suggesting re-use of components.
• Ensure the system meets its SLAs and performance targets.
Skills needed:
• Software engineering background
• Hands-on experience designing, building, delivering and operating production-grade software at scale, and an appreciation for the complex and emergent behaviours inherent in distributed systems
• Production experience with Kafka in high-scale distributed systems
• Strong opinions, informed by experience, on subjects such as continuous delivery, distributed architectures and systems, test strategies, everything-as-code, containerisation, orchestration, cloud services and incident response
• Comfortable with in-depth technical discussions, troubleshooting and debugging systems, and reading and writing code