Vacancy: ML Engineer (Large Language Model)

Locations: London, Prague, Amsterdam | Remote or Hybrid | Full-time

We are an AI R&D team focused on cutting-edge applied research and building AI-driven products. Our recent work includes:

  • Leveraging test-time guided search for enhanced agent performance

  • Scaling task data collection for reinforcement learning in software engineering agents

  • Optimizing LLM training efficiency using agentic trajectories

One of our flagship projects is an AI inference and fine-tuning platform that supports scalable, fast, and cost-effective deployment of AI models.

We are looking for Senior and Staff-level Machine Learning Engineers with deep expertise in high-performance computing and distributed systems to build and optimize robust, scalable training and inference pipelines for large AI models.


Key Responsibilities:

  • Design and implement large-scale training and inference pipelines using data, tensor, context, expert, and pipeline parallelism

  • Optimize inference performance using advanced techniques like speculative decoding (Medusa, EAGLE, etc.), CUDA Graphs, and compile-based methods

  • Build custom CUDA/Triton kernels for performance-critical operations

  • Collaborate closely with researchers and infrastructure teams to ensure scalability and performance of AI products


Requirements:

  • Strong theoretical background in Machine Learning

  • Deep understanding of training/inference performance optimization for large neural networks (parallelism, attention, batching, offloading, etc.)

  • Expertise in at least one of the following:

    • Writing high-performance custom CUDA/Triton GPU kernels

    • Distributed training and parallelism at scale

    • Inference optimization (paged attention, continuous batching, speculative decoding, etc.)

  • Solid software engineering skills (Python-centric stack)

  • Experience with modern ML frameworks (PyTorch, JAX)

  • Familiarity with CI/CD, version control, and testing best practices

  • Strong communication skills and ability to work independently


Nice to Have:

  • Experience with modern LLM inference stacks (vLLM, SGLang, TensorRT-LLM, Dynamo)

  • Understanding of core LLM concepts (FlashAttention, MoE, RoPE, ZeRO, quantization, etc.)

  • Master’s/PhD in CS, AI, Data Science, or related field

  • Previous experience delivering production-grade products in startup-like environments

  • Experience with complex engineering systems or distributed data pipelines

  • Contributions to open-source projects

Join the Znoydzem community.
