Join our constellation

M13 and our portfolio of tech startups are building the future. Join us in this purposeful work.

Makora (Formerly Mako)

Posted on Mar 10, 2026

GPU Kernel Engineer

Summary

Our R&D team is seeking expert-level GPU kernel engineers to help build the world’s best LLMs and agents for GPU kernel generation.
The goal is simple: design an AI agent that writes and optimizes kernels the same way you do. You will collaborate with the training team to define robust evaluation, validation, and reward models used to train LLMs in the art of GPU kernel engineering. You will also contribute to the AI agent architecture itself, defining the workflows that enable an LLM to discover and implement high-performance GPU kernels.
This job is based in either Gdansk or New York City. Remote work will be considered for exceptional candidates.

About Makora

Makora is a venture-backed AI lab building tools to automate algorithm discovery and GPU performance engineering. There are two core components:
MakoraGenerate writes GPU kernels in CUDA, HIP, and Triton using LLMs.
MakoraOptimize automatically selects and swaps GPU kernels while tuning inference-engine hyperparameters (vLLM, SGLang, etc.) to optimize performance.
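Makora's actual selection machinery is not described in this posting; as a rough, hypothetical sketch, the kernel-selection pattern behind tools like MakoraOptimize boils down to benchmarking interchangeable candidate implementations on the target workload and swapping in the fastest (the toy "kernels" below are plain Python stand-ins, not real GPU code):

```python
import time

def autotune(candidates, args, warmup=3, iters=20):
    """Time each candidate on the given inputs and return the fastest.

    Hypothetical sketch only: a real system would also validate each
    candidate's numerical output against a reference before swapping it in.
    """
    best_fn, best_time = None, float("inf")
    for fn in candidates:
        for _ in range(warmup):        # warm caches / trigger any JIT
            fn(*args)
        start = time.perf_counter()
        for _ in range(iters):
            fn(*args)
        elapsed = (time.perf_counter() - start) / iters
        if elapsed < best_time:
            best_fn, best_time = fn, elapsed
    return best_fn, best_time

# Two toy "kernels" computing the same dot product.
def dot_naive(a, b):
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total

def dot_builtin(a, b):
    return sum(x * y for x, y in zip(a, b))

if __name__ == "__main__":
    a = [1.0] * 10_000
    b = [2.0] * 10_000
    fn, t = autotune([dot_naive, dot_builtin], (a, b))
    print(f"fastest: {fn.__name__} at {t * 1e6:.1f} us/iter")
```

In a production setting the candidates would be compiled CUDA/HIP/Triton kernels and the timing loop would use GPU events rather than wall-clock time, but the select-and-swap structure is the same.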

Responsibilities

Explore and analyze performance bottlenecks in ML training and inference.
Develop and optimize high-performance computing kernels in Triton, CUDA, and/or ROCm.
Implement programming solutions in C/C++ and Python.
Deep dive into GPU performance optimizations to maximize efficiency and speed.
Collaborate with the team to extend and improve existing machine learning compilers or frameworks such as MLIR, PyTorch, TensorFlow, ONNX Runtime, and TensorRT (optional but beneficial).

Qualifications

Bachelor’s, Master’s, or PhD in Computer Science, Electrical Engineering, or a related field.
Strong programming skills in C/C++ and Python.
Deep understanding and experience in GPU performance optimizations.
Proven experience with kernel optimizations on CUDA, ROCm, or other accelerators.
General experience with the training and deployment of ML models.
Experience with distributed systems development or distributed ML workloads.

Bonus Points

Experience with innovative OSS projects like FlashAttention, mlc-llm, vLLM, and SGLang.
Experience with machine learning compilers or frameworks such as TVM, MLIR, PyTorch, TensorFlow, ONNX Runtime, and TensorRT.

Our Benefits

Competitive salary
Incredibly generous equity grants
Comprehensive health insurance coverage for you and your family
Remote work option for exceptional candidates
Generous vacation and paid time off policy
Modern and comfortable work environment with state-of-the-art equipment and facilities

To Apply