Why Join Us?
We are leading a new era in cloud computing to support the global AI economy. Our mission is to empower customers to solve real-world challenges and transform industries—without massive infrastructure costs or the need to build large in-house AI teams. Here, you’ll work at the forefront of AI cloud infrastructure alongside some of the most experienced and innovative engineers in the industry.
Where You’ll Work
Headquartered in Amsterdam and listed on Nasdaq, the company operates globally with R&D hubs across Europe, North America, and Israel. Our team of over 800 professionals includes more than 400 highly skilled engineers specializing in hardware, software, and AI research.
About the Role
You will join the AI R&D team, which focuses on applied research and the development of AI-powered products. Some of our recent publications include:
Investigating how test-time guided search can create more capable agents.
Scaling task data collection to accelerate reinforcement learning for software engineering agents.
Improving the efficiency of large language model training on agentic trajectories.
A flagship product you will help advance is a platform for the inference and fine-tuning of AI models.
We are currently seeking senior and staff-level ML engineers to optimize training and inference performance in large-scale, multi-GPU, multi-node environments. This role requires expertise in distributed systems and high-performance computing.
Your Responsibilities
Architect and implement distributed training and inference pipelines leveraging techniques such as data, tensor, context, expert (MoE), and pipeline parallelism.
Implement inference optimization techniques, including speculative decoding and its extensions (Medusa, EAGLE), CUDA graphs, and compile-based optimization.
Develop custom CUDA/Triton kernels for performance-critical neural network layers (a minimal sketch follows this list).
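For flavor, here is a minimal sketch of the kind of kernel work this involves: a fused elementwise add written with the open-source Triton library. The kernel and helper names are illustrative, not taken from our codebase.

import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one contiguous block of elements.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard the ragged tail block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out

Real kernels in this role would fuse attention or MLP layers rather than a toy add, but the structure (program IDs, block offsets, masked loads and stores) carries over.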
What We’re Looking For
Strong grounding in machine learning theory.
Deep knowledge of the performance aspects of large neural network training and inference (parallelism, offloading, custom kernels, hardware acceleration, dynamic batching).
Expertise in at least one of the following:
Developing efficient GPU kernels in CUDA and/or Triton.
Training large models across multiple nodes with advanced parallelism (see the sketch after this list).
Optimizing inference through techniques such as speculative decoding, paged attention, and continuous batching.
Strong software engineering skills, primarily in Python.
Experience with modern deep learning frameworks (JAX, PyTorch).
Proficiency in contemporary engineering practices (CI/CD, version control, unit testing).
Excellent communication skills and ability to work independently.
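To make the multi-node expertise above concrete, here is a minimal, hedged sketch of data-parallel training with PyTorch DistributedDataParallel over the NCCL backend; the model, data, and hyperparameters are placeholders, not a description of our stack.

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK, WORLD_SIZE, LOCAL_RANK, and the rendezvous env vars.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(1024, 1024).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    x = torch.randn(32, 1024, device=f"cuda:{local_rank}")
    loss = model(x).pow(2).mean()
    loss.backward()  # gradients are all-reduced across ranks during backward
    opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()

Launched with, e.g., torchrun --nproc_per_node=8 train.py on each node, every rank holds a full model replica; tensor, pipeline, and expert parallelism partition the model itself and compose with this setup.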
Nice to Have
Familiarity with modern LLM inference frameworks (vLLM, SGLang, TensorRT-LLM, Dynamo).
Knowledge of key concepts in the LLM domain (MHA, RoPE, ZeRO/FSDP, Flash Attention, quantization).
Bachelor’s degree in Computer Science, AI, Data Science, or a related field (Master’s or PhD preferred).
Experience delivering products in dynamic, startup-like environments.
Background in engineering complex distributed systems or high-load services.
Open-source contributions demonstrating engineering expertise.
Excellent command of English.
What We Offer
Competitive salary and a comprehensive benefits package.
Opportunities for professional growth within a rapidly scaling organization.
Hybrid working arrangements.
A dynamic, collaborative environment that values initiative and innovation.