Systems Engineer (HPC)

We are seeking a highly skilled HPC Systems Engineer to design, deploy, and maintain high-performance computing (HPC) clusters supporting advanced research and computational workloads. You will work closely with researchers, developers, and IT teams to ensure optimal performance, reliability, and scalability of HPC infrastructure.

Key Responsibilities

  • Design, deploy, and manage HPC clusters, including hardware, operating systems (primarily Linux), storage, and high-speed interconnects.
  • Perform system administration tasks: installation, configuration, maintenance, upgrades, and troubleshooting of HPC systems and associated software.
  • Monitor system performance, conduct benchmarking, and implement performance tuning and optimization for both hardware and applications.
  • Develop automation scripts and tools (e.g., Python, Bash) for system management, monitoring, and deployment.
  • Administer and optimize batch scheduling and queuing systems (e.g., SLURM, PBS).
  • Collaborate with researchers and end users to understand computational requirements, provide technical support, and optimize workflows.
  • Ensure system security, data integrity, and compliance with organizational policies and best practices.
  • Document system configurations, procedures, and troubleshooting guides for internal and external stakeholders.
  • Evaluate and integrate new technologies to enhance HPC capabilities and efficiency.
  • Provide tiered user support and training, addressing technical issues and guiding users in effective HPC utilization.

Qualifications

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
  • Proven experience in HPC systems administration and parallel computing environments.
  • Advanced knowledge of Linux system administration and troubleshooting.
  • Experience with HPC job schedulers (SLURM, PBS) and parallel/distributed file systems (e.g., Lustre, GPFS).
  • Proficiency in scripting languages (Python, Bash) for automation and system management.
  • Understanding of HPC networking, storage, and interconnect technologies (e.g., InfiniBand).
  • Strong analytical, problem-solving, and communication skills.
  • Ability to work collaboratively within multidisciplinary teams and with external vendors.

Preferred Qualifications

  • Experience supporting large-scale production HPC environments.
  • Familiarity with performance modeling, benchmarking, and application tuning.
  • Knowledge of cloud-based HPC solutions and virtualization technologies.
  • Exposure to scientific computing applications in fields such as bioinformatics, computational chemistry, or engineering simulations.

Benefits

  • Opportunity to work with cutting-edge HPC technologies and contribute to impactful research.
  • Collaborative and innovative team environment.
  • Competitive compensation and professional development opportunities.

Apply now for a chat about your suitability for this role!

Job Type: Full-time
Job Location: Chicago, London

Apply for this position

Allowed Type(s): .pdf, .doc, .docx