( reference : Machine Learning Data Lifecycle in Production )
Introduction to Machine Learning Engineering in Production
[1] Overview
- ML engineering for PRODUCTION
- Production ML = (1) + (2)
- (1) ML development
- (2) software development
- Challenges in production ML
Traditional ML vs Production ML
Main difference :
- production ML requires much more than just modeling code!!
- data is NOT STATIC in production ML!!
[ Traditional ML ]
[ Production ML ]
Manage the entire life cycle of data
- labeling
- is it properly labeled?
- feature space coverage
- do the training data and the serving data always cover the same feature space?
- minimal dimensionality
- reduce the dimensionality of the features to optimize performance
- maximum predictive data
- does the data have predictive information?
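The labeling and feature space coverage questions above can be turned into simple automated checks. The following is only a minimal sketch, assuming each example is a plain Python dict and that labels live under a hypothetical `label` key; the function names are illustrative and do not come from any library.

```python
def feature_names(rows, label_key="label"):
    """Collect every feature name that appears in the rows (label excluded)."""
    names = set()
    for row in rows:
        names.update(key for key in row if key != label_key)
    return names


def feature_space_report(train_rows, serving_rows, label_key="label"):
    """Compare the feature names seen at training time and at serving time."""
    train = feature_names(train_rows, label_key)
    serving = feature_names(serving_rows, label_key)
    return {
        "missing_in_serving": sorted(train - serving),
        "new_in_serving": sorted(serving - train),
    }


def unlabeled_fraction(rows, label_key="label"):
    """Fraction of rows whose label is absent or None."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(label_key) is None)
    return missing / len(rows)


# Hypothetical toy data, for illustration only.
train_rows = [
    {"age": 30, "income": 10, "label": 1},
    {"age": 40, "income": 20, "label": None},
]
serving_rows = [{"age": 35, "zip": "123"}]

report = feature_space_report(train_rows, serving_rows)
```

A non-empty `missing_in_serving` or `new_in_serving` list is a hint that the two datasets do not share the same feature space; it compares feature names only, not value ranges.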
Production ML system
- continuously monitor the model performance, ingest new data, retrain when needed, and redeploy to maintain / improve the performance
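The monitor / retrain / redeploy cycle can be sketched as one decision step. This is an illustrative sketch, not a prescribed design: the accuracy-drop rule, the `tolerance` value, and the `retrain` / `deploy` callables are all assumptions made for the example.

```python
def needs_retraining(recent_accuracies, baseline, tolerance=0.05):
    """True when mean recent accuracy fell more than `tolerance` below baseline."""
    if not recent_accuracies:
        return False
    mean_recent = sum(recent_accuracies) / len(recent_accuracies)
    return (baseline - mean_recent) > tolerance


def production_step(recent_accuracies, baseline, retrain, deploy, tolerance=0.05):
    """Run one monitoring step; retrain and redeploy only if performance dropped."""
    if needs_retraining(recent_accuracies, baseline, tolerance):
        model = retrain()   # hypothetical callable that trains on newly ingested data
        deploy(model)       # hypothetical callable that pushes the model to serving
        return True
    return False


# Usage with stand-in callables.
deployed = []
triggered = production_step(
    recent_accuracies=[0.70],
    baseline=0.90,
    retrain=lambda: "model-v2",
    deploy=deployed.append,
)
```

In a real system this step would run on a schedule or be triggered by monitoring alerts, and the trigger could be based on data drift rather than accuracy alone.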
Challenges in production grade ML
- have to build an INTEGRATED ML system
- need to CONTINUOUSLY operate it in production
- handle CONTINUOUSLY CHANGING DATA
- optimize compute resource costs
[2] ML Pipelines
Outline
- ML Pipelines
- DAG (Directed Acyclic Graphs) & Pipeline Orchestration Frameworks
- TFX ( TensorFlow Extended )
ML pipeline
DAG ( Directed Acyclic Graphs )
- directed graphs with NO cycles
- ML pipeline workflows : usually DAGs
- tasks are sequenced
- tasks have relationships/dependencies with each other
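A pipeline DAG can be written as a mapping from each task to the tasks it depends on, and a topological sort then yields a valid execution order. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+); the task names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Each key is a task; each value is the set of tasks it depends on.
pipeline = {
    "ingest": set(),
    "validate": {"ingest"},
    "transform": {"validate"},
    "tune": {"transform"},
    "train": {"transform", "tune"},
    "evaluate": {"train"},
    "deploy": {"evaluate"},
}


def execution_order(dag):
    """Return one valid order in which every task follows its dependencies."""
    return list(TopologicalSorter(dag).static_order())


def has_cycle(dag):
    """True when the graph has a cycle, i.e. it is not a DAG."""
    try:
        execution_order(dag)
    except CycleError:
        return True
    return False


order = execution_order(pipeline)
```

Because the graph has no cycles, at least one such order always exists; a graph where `train` depended on `deploy` as well would be rejected by `has_cycle`.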
Pipeline Orchestration Frameworks
- GOAL : schedule components in ML pipelines
- automate the pipeline
- ex) Airflow, Argo, Celery, Luigi, Kubeflow
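To make the scheduling goal concrete, here is a toy, single-process runner that executes each component only after its upstream components have finished. It is a sketch of the core idea only and does not reflect the API of any of the frameworks listed above; `run_pipeline` and the task names are hypothetical.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, dependencies):
    """Run each task once, after all of its upstream tasks have finished.

    tasks:        maps a task name to a callable taking a dict of upstream results
    dependencies: maps a task name to the set of task names it depends on
    """
    results = {}
    for name in TopologicalSorter(dependencies).static_order():
        upstream = {dep: results[dep] for dep in dependencies.get(name, ())}
        results[name] = tasks[name](upstream)
    return results


# Hypothetical three-step pipeline.
dependencies = {"clean": {"load"}, "count": {"clean"}}
tasks = {
    "load": lambda upstream: [3, 1, 2],
    "clean": lambda upstream: sorted(upstream["load"]),
    "count": lambda upstream: len(upstream["clean"]),
}

results = run_pipeline(tasks, dependencies)
```

Real orchestration frameworks add what this sketch leaves out, such as time-based scheduling, retries, parallel or distributed execution, and monitoring.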
TFX ( TensorFlow Extended )
- end-to-end platform for deploying production ML pipelines