Job Description
- Propose AI solutions aligned with customer requirements and goals.
- Build and lead a team, enhancing overall technical capabilities and performance.
- Optimize Large Language Models (LLMs) & AI models, including:
  - Efficient training of LLMs (DeepSpeed, FSDP, LoRA)
  - Deploying models with Kubernetes, Ray, Triton Inference Server
  - Optimizing model inference speed with ONNX, TensorRT, GGUF, vLLM
  - Implementing Retrieval-Augmented Generation (RAG) pipelines
  - Applying AI distillation and quantization techniques
- Work with HPC infrastructure and distributed AI computing.
- Monitor system health using standard tools (htop, tcpdump, iostat, netstat).
- Troubleshoot AI system performance bottlenecks.
Job Requirements
- Bachelor’s or Master’s degree in AI, Computer Science, Machine Learning or a related field.
- 3+ years of experience in LLM development and optimization.
- Hands-on experience with distributed AI training and HPC for AI workloads.
- Expertise in GPU acceleration (CUDA, TensorRT, vLLM).
- Deep understanding of LLM architectures (GPT, Llama, Falcon, T5, Mistral).
- Experience in cloud AI deployment (Kubernetes, OpenStack, Ray, Triton).
- Strong ability to troubleshoot system errors and optimize AI workloads.
- Professional English communication skills (speaking, reading, and writing).
How to Apply
After application screening, the next step is a telephone interview with a member of our HR team. If successful, the final stage is a face-to-face interview at our office.