Senior Machine Learning Engineer

Engineering

Boston, MA

Full-time

$220,000 - $320,000

Team: ML Systems

About the Role

Regenerative AI builds self-adaptive AI systems that continuously monitor, regulate, and improve their own behavior during real-world operation. We are looking for a Senior Machine Learning Engineer to join our ML Systems team in Boston. You will design, build, and operate production ML systems with built-in feedback loops, runtime monitoring, and automated recalibration. This is a hands-on engineering role focused on system-level robustness, reliability, and continuous adaptation in deployment.

What You'll Do

Design and implement end-to-end ML pipelines: data ingestion, training, evaluation, and serving
Build and maintain self-adaptive ML systems with runtime monitoring and drift detection
Develop automated recalibration workflows that respond to performance degradation
Create observability infrastructure for model health, latency, and prediction quality
Own incident response and reliability for ML services in production
Collaborate with platform engineers to integrate ML workloads into CI/CD pipelines
Establish and enforce governance standards for model versioning, rollback, and auditability
Optimize training and inference infrastructure for efficiency and cost
Contribute to internal tooling that accelerates ML development velocity

Qualifications

5+ years of experience building and deploying ML systems in production environments
Strong proficiency in Python and modern ML frameworks (PyTorch, TensorFlow, or JAX)
Hands-on experience with distributed training and GPU infrastructure
Deep understanding of MLOps practices: CI/CD for ML, model versioning, feature stores
Experience with monitoring and observability tools (Prometheus, Grafana, Datadog, or similar)
Solid understanding of data pipelines, workflow orchestration, and streaming systems (Kafka, Spark, Airflow)
Track record of owning and operating ML services with high availability requirements
Strong software engineering fundamentals and code quality standards

Nice to Have

Experience with online learning, adaptive models, or feedback-driven systems
Background in model monitoring, drift detection, or anomaly detection
Familiarity with control systems concepts or adaptive control
Experience with Kubernetes, Docker, and cloud-native ML infrastructure
Knowledge of reliability engineering practices (SLOs, error budgets, on-call)
Experience building internal ML platforms or developer tools

Benefits & Perks

Competitive salary and meaningful equity
Comprehensive health, dental, and vision insurance
Flexible PTO and remote-friendly work arrangements
Annual learning and development budget ($5,000)
Home office setup allowance
401(k) with company match
Modern tech stack: Python, PyTorch, Docker, Kubernetes, cloud (AWS/GCP)

Ready to Join Us?

We're excited to learn more about you. Apply now and take the next step in your career with Regenerative AI.