Site Reliability Engineer

Salary: $300k - $600k

Locations: New York
Job Type: Full Time
Job Category: Infrastructure

Description

Join a core engineering group as Lead Site Reliability Engineer, designing and scaling the Linux platforms that underpin ML/AI-driven trading. You will architect and own reliability for massive simulation, HPC, and production workloads, keeping trading systems highly reliable and fast. This is a hands-on leadership role that weighs technical depth, strategic decision-making, and platform SRE excellence equally.

Key Responsibilities

Lead SRE practices for Linux platforms powering low-latency, high-throughput trading workloads.
Architect, optimize, and tune Linux for performance, resilience, and minimal latency.
Drive incident response, root cause analysis, and continuous reliability improvement across production systems.
Oversee system automation and reproducibility—build, deploy, and fleet-manage bare-metal Linux and containerized stacks.
Manage and enhance Kubernetes clusters, network configuration, and large-scale orchestration.
Set observability standards; expand monitoring, alerting, and performance metrics across platforms.
Analyze networking, kernel-level performance, and distributed systems—solving core challenges in a multi-petabyte, multi-cluster environment.
Build Python tools for automation, reliability engineering, and performance analysis.
Design highly distributed systems.

Preferred Qualifications

The ideal candidate comes from a top-tier tech environment (FAANG, elite trading, hyperscale infra). They have experience building technology 0→1, owning systems end-to-end, and working close to the metal. They will operate across everything from bare-metal Linux to modern build and observability stacks.

Core skills: deep Linux expertise, Python scripting, DevOps, and Kubernetes.

Apply Today

Thank you for your interest in this opportunity. Please complete the form below and upload any relevant documents. A member of our team will review your application and be in touch soon.
