AI Infrastructure Engineer — Swissquote
- Location: Gland
- Contract: Full-time
- Posted: 27 days ago
Role overview
You will join the IT Department’s IT Platform Operations team, whose role is to operate the layer between raw infrastructure and the bank’s corporate-facing services: the application-tier middleware fabric, the Kubernetes control plane, and the user-facing surface of the bank’s Sovereign AI Platform.
The ideal candidate has deep expertise in operating Kubernetes-native platforms at scale and will lead the integration of open-source AI tooling within a regulated corporate environment, while ensuring that large language model (LLM) inference scales.
Your expertise will help your team deliver the platform on which the bank provides governed access to internal and external AI capabilities — distributed inference, agentic workflows, notebooks, and chatbots — built on top of the GPU and serving substrate provided by the Systems & Storage teams.
With your team, you will work closely with IT Architects, Observability & Performance Analysts, the Cybersecurity function and the Systems teams to plan and execute the department’s long-term objective: a sovereign AI capability that runs under the bank’s own governance and is EU AI Act- and DORA-ready by design. That governance covers data sovereignty, content safety, prompt-injection defenses, agentic-workflow audit, and cost control on external API spend.
- Design, deploy and operate distributed LLM inference (llm-d) on Kubernetes, sizing for throughput, tail latency and GPU utilisation against the serving substrate provided by IT Systems Services (ITSS).
- Operate and harden the user-facing AI surface: the Open WebUI cross-department chatbot, JupyterHub notebooks for data scientists, and the agent catalog (agentregistry).
- Build and operate Agentgateway as the governed routing layer to external providers (Anthropic Claude API, OpenAI GPT API), enforcing traffic policy, rate limiting, cost controls and audit logging.
- Implement content-safety, prompt-injection defense and agentic-workflow audit controls, plus the agent-identity model required for EU AI Act and DORA compliance.
- Operate the Kubernetes control plane — etcd, API server, scheduler and controller-manager — with HA sizing and surge-upgrade discipline; contribute to multi-cluster management for the meshed cross-cluster pattern.
- Define SLOs and instrument the platform for performance and availability; lead incident response across the AI platform and control-plane critical path.
- Develop and maintain architecture documentation and operational runbooks, and participate in the 24×7 on-call rotation.
Minimum Qualifications
- Excellent interpersonal skills, capable of working with multi-functional technical and business teams, along with different levels of management to influence decision making.
- Comfortable with Infrastructure as Code and governed automation tooling (Ansible / AAP, Terraform, etc.); familiarity with event streaming (Apache Kafka) and observability stacks.