Head of Platform/AI Cluster Management - System Integrator (San Francisco) Job at Hamilton Barnes Associates Limited, San Francisco, CA

  • Hamilton Barnes Associates Limited
  • San Francisco, CA

Job Description

Ready to lead innovation at the intersection of platforms and artificial intelligence?

Join a pioneering technology company driving advancements in cloud, AI, and data-driven solutions across global markets. The organization is recognized for fostering innovation, scalability, and collaboration through cutting-edge platforms that empower enterprises to evolve intelligently.

The team is hiring a Head of Platform/AI Cluster Management to oversee the strategic development, integration, and optimization of AI and platform initiatives. The role will focus on leading cross-functional teams, enhancing performance and scalability, and aligning technology strategy with long-term business goals.

Shape the future of intelligent platforms and transformative innovation. Apply now!

Responsibilities

  • Own the scheduler/runtime layer (Slurm, Kubernetes, Ray), including multi-tenancy, quotas, and GPU/host fleet management.
  • Lead cluster operations across images, CI/CD, repair/health, performance/telemetry, and incident response.
  • Deliver platform services that ensure workload SLOs and reliable runtime execution.
  • Define and implement namespace/tenancy design, node health automation, golden images, admission controls, on-call runbooks, and go-live gates.
  • Collaborate closely with infra, SRE, and network teams to optimize workload placement and cluster efficiency.
  • Provide hands-on expertise in NCCL behaviours, placement strategies, and congestion signal management.

Requirements

  • Deep expertise in cluster management, scheduling, and runtime environments for large-scale compute.
  • Hands-on background with Slurm, Kubernetes, Ray, or similar orchestration platforms.
  • Strong understanding of NCCL performance tuning, workload isolation, and congestion management.
  • Experience scaling multi-tenant, GPU-heavy clusters with strict SLOs.
  • Ability to thrive in a startup environment with full ownership over platform and cluster strategy.

Salary

  • $500,000 gross per year (Negotiable)

Job Tags

Full time
