[Week 1-1] Introduction to Machine Learning Engineering in Production

(coursera) Machine Learning Data Lifecycle in Production - Collecting, Labeling, Validating Data


Seunghan Lee

Deep Learning, Data Science, Statistics

( reference : Machine Learning Data Lifecycle in Production )

Introduction to Machine Learning Engineering in Production

[1] Overview

ML engineering for PRODUCTION
Production ML = (1) + (2)
- (1) ML development
- (2) software development
Challenges in production ML

Traditional ML vs Production ML

Main difference :

production ML requires much more than just modeling code!!
data is NOT STATIC in production ML!!

[ Traditional ML ]

[ Production ML ]

Manage the entire life cycle of data

labeling
- is it properly labeled?
feature space coverage
- does the training data cover the same feature space as the data seen in production?
minimal dimensionality
- reduce the dimensionality of the features to optimize performance
maximum predictive data
- does the data have predictive information?
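
The first two checks above (labeling and feature space coverage) can be sketched in a few lines. This is only an illustration under simple assumptions: rows are plain dictionaries, the label lives under a hypothetical `label` key, and the function names `feature_names`, `feature_space_report`, and `unlabeled_rows` are made up for this example, not taken from the course or from any library.

```python
def feature_names(rows, label_key="label"):
    """Collect every feature name that appears in the rows (label excluded)."""
    names = set()
    for row in rows:
        names.update(row.keys())
    names.discard(label_key)
    return names


def feature_space_report(train_rows, serving_rows, label_key="label"):
    """Compare the feature names seen in training with those seen at serving time."""
    train = feature_names(train_rows, label_key)
    serving = feature_names(serving_rows, label_key)
    return {
        "missing_in_serving": sorted(train - serving),
        "unexpected_in_serving": sorted(serving - train),
    }


def unlabeled_rows(rows, label_key="label"):
    """Return the indices of rows whose label is absent or None."""
    return [i for i, row in enumerate(rows) if row.get(label_key) is None]


# Hypothetical toy data, only to exercise the checks.
train_rows = [
    {"age": 31, "income": 52000, "label": 1},
    {"age": 45, "income": 61000, "label": None},
]
serving_rows = [{"age": 29, "zip": "12345"}]

report = feature_space_report(train_rows, serving_rows)
missing_labels = unlabeled_rows(train_rows)
```

Here `report` flags `income` as missing at serving time and `zip` as a feature the model never saw, and `missing_labels` points at the second training row. A real pipeline would also compare value ranges and distributions, not only feature names.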

Production ML system

continuously monitor the model performance, ingest new data, retrain when needed, and redeploy to maintain / improve the performance
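
The monitor / retrain decision can be sketched as a simple threshold rule. This is a minimal sketch and not the method described in the course: the metric (accuracy), the `tolerance` value of 0.05, and the function names are all assumptions chosen for illustration.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the ground-truth labels."""
    if not labels:
        raise ValueError("labels must not be empty")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def needs_retraining(baseline_accuracy, recent_accuracy, tolerance=0.05):
    """Flag retraining when recent accuracy drops more than `tolerance` below baseline."""
    return (baseline_accuracy - recent_accuracy) > tolerance


# Hypothetical numbers: the model scored 0.90 at deployment time.
baseline = 0.90
recent = accuracy(predictions=[1, 0, 1, 1], labels=[1, 0, 0, 1])
retrain = needs_retraining(baseline, recent)
```

With these toy values the recent accuracy is 0.75, which is more than 0.05 below the baseline, so `retrain` is `True`. In practice labels for recent data often arrive late, so a production system may also have to watch the input data itself for change.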

Challenges in production grade ML

have to build an INTEGRATED ML system
need to CONTINUOUSLY operate it in production
handle CONTINUOUSLY CHANGING DATA
optimize compute resource costs

[2] ML Pipelines

Outline

ML Pipelines
DAG (Directed Acyclic Graphs) & Pipeline Orchestration Frameworks
TFX ( TensorFlow Extended )

ML pipeline

DAG ( Directed Acyclic Graphs )

directed graphs with NO cycles
ML pipeline workflows : usually DAGs
- sequencing of tasks
- have relationships/dependencies with each other
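
To make the DAG idea concrete, the sketch below writes a pipeline as a mapping from each task to the tasks it depends on, then derives a valid execution order. It uses `graphlib` from the Python standard library (Python 3.9+). The task names (`ingest`, `validate`, and so on) and the helpers `execution_order` and `is_dag` are hypothetical, picked only for this example.

```python
from graphlib import CycleError, TopologicalSorter

# Each key is a task; its value is the set of tasks that must finish first.
pipeline = {
    "validate": {"ingest"},
    "transform": {"validate"},
    "train": {"transform"},
    "evaluate": {"train"},
    "deploy": {"evaluate"},
}


def execution_order(graph):
    """Return the tasks in an order that respects every dependency."""
    return list(TopologicalSorter(graph).static_order())


def is_dag(graph):
    """True if the graph has no cycles, i.e. a valid ordering exists."""
    try:
        execution_order(graph)
        return True
    except CycleError:
        return False


order = execution_order(pipeline)
cyclic = {"a": {"b"}, "b": {"a"}}  # a needs b and b needs a: not a DAG
```

Because this pipeline is a single chain, `order` runs from `ingest` to `deploy`. The `cyclic` graph has no valid order, which is why pipeline workflows are kept acyclic.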

Pipeline Orchestration Frameworks

GOAL : schedule components in ML pipelines
help automate the pipeline
ex) Airflow, Argo, Celery, Luigi, Kubeflow
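
What "scheduling components" means can be sketched with a toy runner that executes each task only after its dependencies have finished and hands it their outputs. This is not how Airflow, Argo, Celery, Luigi, or Kubeflow are implemented or used; it is a single-process illustration, and `run_pipeline`, the task names, and the data are all made up. Real frameworks add triggers, retries, logging, and distributed execution on top of this idea.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, dependencies):
    """Run each task after its dependencies, passing it their results."""
    results = {}
    for name in TopologicalSorter(dependencies).static_order():
        # Gather the outputs of the upstream tasks this task depends on.
        upstream = {dep: results[dep] for dep in dependencies.get(name, ())}
        results[name] = tasks[name](upstream)
    return results


# Hypothetical components: each one is a function of its upstream results.
tasks = {
    "ingest": lambda upstream: [3, 1, 2],
    "transform": lambda upstream: sorted(upstream["ingest"]),
    "summarize": lambda upstream: {
        "n": len(upstream["transform"]),
        "max": upstream["transform"][-1],
    },
}
dependencies = {
    "ingest": set(),
    "transform": {"ingest"},
    "summarize": {"transform"},
}

results = run_pipeline(tasks, dependencies)
```

After the run, `results` holds the output of every component, so `results["summarize"]` reports three items with a maximum of 3.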

TFX ( TensorFlow Extended )

end-to-end platform for deploying production ML pipelines
