Senior Machine Learning Engineer
Engineering
Boston, MA
Full-time
$220,000 - $320,000
Team: ML Systems
About the Role
Regenerative AI builds self-adaptive AI systems that continuously monitor, regulate, and improve their own behavior during real-world operation. We are looking for a Senior Machine Learning Engineer to join our ML Systems team in Boston. You will design, build, and operate production ML systems with built-in feedback loops, runtime monitoring, and automated recalibration. This is a hands-on engineering role focused on system-level robustness, reliability, and continuous adaptation in deployment.
What You'll Do
- Design and implement end-to-end ML pipelines: data ingestion, training, evaluation, and serving
- Build and maintain self-adaptive ML systems with runtime monitoring and drift detection
- Develop automated recalibration workflows that respond to performance degradation
- Create observability infrastructure for model health, latency, and prediction quality
- Own incident response and reliability for ML services in production
- Collaborate with platform engineers to integrate ML workloads into CI/CD pipelines
- Establish and enforce governance standards for model versioning, rollback, and auditability
- Optimize training and inference infrastructure for efficiency and cost
- Contribute to internal tooling that accelerates ML development velocity
Qualifications
- 5+ years of experience building and deploying ML systems in production environments
- Strong proficiency in Python and modern ML frameworks (PyTorch, TensorFlow, or JAX)
- Hands-on experience with distributed training and GPU infrastructure
- Deep understanding of MLOps practices: CI/CD for ML, model versioning, feature stores
- Experience with monitoring and observability tools (Prometheus, Grafana, Datadog, or similar)
- Solid understanding of data pipelines and streaming systems (Kafka, Spark, Airflow)
- Track record of owning and operating ML services with high availability requirements
- Strong software engineering fundamentals and code quality standards
Nice to Have
- Experience with online learning, adaptive models, or feedback-driven systems
- Background in model monitoring, drift detection, or anomaly detection
- Familiarity with control systems concepts or adaptive control
- Experience with Kubernetes, Docker, and cloud-native ML infrastructure
- Knowledge of reliability engineering practices (SLOs, error budgets, on-call)
- Experience building internal ML platforms or developer tools
Benefits & Perks
- Competitive salary and meaningful equity
- Comprehensive health, dental, and vision insurance
- Flexible PTO and remote-friendly work arrangements
- Annual learning and development budget ($5,000)
- Home office setup allowance
- 401(k) with company match
- Modern tech stack: Python, PyTorch, Docker, Kubernetes, cloud (AWS/GCP)