Senior Solutions Architect, HPC and AI - NVIDIA (Zurich office)
NVIDIA (Zurich office) · Zürich (ZH)
- Location: Zürich
- Contract: full-time
- Posted: 4 days ago
- Salary: CHF 101'500 - 154'000
Role overview
We are seeking a Senior Solutions Architect with strong hands-on experience in deploying, debugging, and optimizing training and inference workloads on large-scale GPU clusters.
As we support customers and partners across Europe in training models on groundbreaking GPU infrastructure, we are looking for someone who enjoys solving complex challenges at the intersection of High Performance Computing and AI.
Similarly, inference is growing in complexity: the explosion of MoE models and disaggregated execution is making inference a true HPC workload.
Application process
- Practical experience identifying and resolving bottlenecks in large-scale training workloads or parallel applications.
- Hands-on experience in profiling and debugging large parallel applications.
- Solid understanding of CPU and GPU architectures, CUDA, parallel filesystems, and high-speed interconnects.
- Experience working with large compute clusters, with an understanding of their internal scheduling and resource management mechanisms (e.g. SLURM or cloud-based clusters).
- Proficient knowledge of training pipelines and frameworks, encompassing their internal operations and performance attributes.
Ways To Stand Out From The Crowd:
- Experience in debugging training pipelines running on thousands of GPUs in production environments.
- Hands-on experience with performance profiling and optimization using tools like Nsight Systems and Nsight Compute, and a good understanding of NCCL, MPI, and low-level communication libraries.
Additional details
- If you can demonstrate hands-on experience, we would love to hear from you.
What You’ll Be Doing:
- Contributing to Europe’s Sovereign AI initiative by helping customers implement advanced resiliency features within AI training pipelines.
What We Need To See:
- Proficient knowledge of training pipelines and frameworks, encompassing their internal operations and performance attributes.