AI Research Engineer (Model Compression & Quantization) — Tether Operations
CHF 73'500 - 111'500
Tether Operations · Zürich, Zürich (ZH)
- Location: Zürich
- Contract: remote
- Posted: 31 days ago
- Salary: CHF 73'500 - 111'500
Role overview
Join Tether and Shape the Future of Digital Finance
At Tether, we’re not just building products, we’re pioneering a global financial revolution.
Our cutting-edge solutions empower businesses—from exchanges and wallets to payment processors and ATMs—to seamlessly integrate reserve-backed tokens across blockchains.
By harnessing the power of blockchain technology, Tether enables you to store, send, and receive digital tokens instantly, securely, and globally, all at a fraction of the cost.
Main responsibilities
- Your responsibilities include building robust compression pipelines, establishing performance and fidelity metrics, and addressing bottlenecks in production inference.
- The ultimate goal is to deliver scalable, low-memory, low-latency AI systems on edge devices (i.e., smartphones) that maintain high fidelity and tangible real-world value.
Application process
- You will apply and advance compression techniques such as quantization, knowledge distillation, and pruning to streamline complex multimodal architectures that integrate text, images, and audio.
- Apply low-bit quantization to reduce model size and inference latency for generative AI models (LLMs, VLMs, multimodal) while maintaining accuracy and output quality.
- Leverage knowledge distillation to transfer capabilities from larger teacher models to smaller student models, enabling efficient multimodal reasoning across text, image, and audio inputs.
- Implement pruning techniques to remove redundant parameters and attention heads, reducing computational overhead without sacrificing task performance.
- Analyze trade-offs between model efficiency (size, latency, memory) and accuracy across quantization, distillation, and pruning methods; propose improvements based on empirical findings.
- Research and apply mixed-precision quantization and other advanced compression strategies (e.g., adaptive pruning schedules, distillation with intermediate feature matching) to optimize the accuracy–performance balance.
- Stay current with the latest research in model compression, including emerging techniques for multimodal and generative architectures.
- Document methodologies, experiments, and results clearly to support reproducibility, internal collaboration, and stakeholder communication.
Additional details
- A degree in Computer Science or a related field.
- Apply only through our official channels.