- Salary: $250k - $500k
- Locations: Chicago
- Job Type: Full Time
- Job Category: Infrastructure
Description
We are working with a leading technology-driven trading firm to hire a Team Lead SRE to drive reliability, scalability, and performance across mission-critical systems. This role is ideal for engineers coming from Big Tech environments who are passionate about building resilient infrastructure and leading high-performing teams in a low-latency, high-availability setting.
Key Responsibilities
- Lead and mentor a team of Site Reliability Engineers responsible for the uptime, performance, and scalability of production systems.
- Define and implement SRE best practices, including SLIs, SLOs, error budgets, and incident management frameworks.
- Own production reliability across trading and research platforms, ensuring systems operate with minimal latency and maximum availability.
- Partner with software engineering, infrastructure, and trading teams to improve system design, observability, and operational excellence.
- Drive automation initiatives to reduce toil, improve deployment pipelines, and enhance system self-healing capabilities.
- Lead incident response, postmortems, and continuous improvement efforts across the platform.
Required Skills
- Proven experience in a Site Reliability Engineering or Production Engineering role, ideally within a large-scale or Big Tech environment.
- Strong programming skills in languages such as Python, Go, or Java.
- Deep understanding of distributed systems, networking, and systems architecture.
- Experience with observability tooling (e.g., Prometheus, Grafana, OpenTelemetry) and incident management practices.
- Hands-on experience with cloud platforms (AWS, GCP, or Azure) and container orchestration (Kubernetes).
- Track record of leading teams or mentoring engineers in high-performance environments.
- Strong troubleshooting skills and the ability to operate effectively under pressure.
Preferred Qualifications
- Background in low-latency systems and high-performance environments.
- Experience with CI/CD systems, infrastructure as code, and large-scale production environments.
- Familiarity with Linux internals, networking protocols, and performance tuning.
- Prior experience managing on-call rotations and improving operational maturity.