Autonomous Decisioning Research & Engineering
MLOps Engineer
La Jolla, CA / Remote (US) · Full-time
About this role
You will build and operate the platform that takes our models from research notebooks to reliable, observable production services. You will define the patterns researchers and engineers use to ship, run, and roll back models with confidence.
What you will do
- Design and operate model serving, batch inference, and feature pipelines.
- Build CI/CD for models: versioning, packaging, canary rollouts, and automated rollback.
- Stand up monitoring for accuracy, latency, drift, and cost.
- Maintain reproducible training environments and experiment tracking.
- Partner with security and SRE on incident response for production model failures.
What we are looking for
- 4+ years building production ML or data infrastructure.
- Strong Python; comfort with at least one of Go, Rust, or TypeScript.
- Hands-on with Kubernetes, Docker, Terraform, and a major cloud (AWS, GCP, or Azure).
- Deep understanding of model serving (Triton, BentoML, KServe, vLLM, or similar).
- Track record of carrying production on-call for an ML or data system.
Nice to have
- Experience with feature stores, vector databases, and data versioning tools (DVC, lakeFS).
- Background in cost optimization for GPU inference workloads.