( reference : Introduction to Machine Learning in Production )
Error analysis and performance auditing
[1] Error Analysis example
Error Analysis
- tells you what to do to improve the algorithm’s performance
Ex ) speech recognition
Error Analysis is an iterative process
- during error analysis, you can also add new tags as you go!
- then go back and check whether earlier examples also fit the newly added tags!
Ex ) visual inspection
finding defects in smartphones
example of tags
- specific class labels ( ex. scratch, dent … )
- image properties ( ex. blurry, dark … )
- other metadata ( ex. phone model, factory … )
Ex ) product recommendation
example of tags
- user demographics
- product features / category
Useful metrics for each tag
- what % of errors have that tag?
- of data with that tag, what % is misclassified?
- what % of all data has that tag?
- how much room for improvement is there on the data with that tag?
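The first three percentages above can be computed directly from a tagged error-analysis sheet. Below is a minimal sketch, assuming each dev-set example is stored as a dict with a set of tags and an error flag; the data layout, the tag names, and the function `tag_metrics` are hypothetical and only serve as illustration. The fourth question (room for improvement) needs an external baseline such as human-level performance, so it is not computed here.

```python
from collections import Counter

# Hypothetical error-analysis sheet: one row per dev-set example,
# with the tags assigned to it and whether the model got it wrong.
examples = [
    {"tags": {"car noise"}, "error": True},
    {"tags": {"car noise"}, "error": False},
    {"tags": {"people noise"}, "error": True},
    {"tags": {"car noise", "low bandwidth"}, "error": True},
    {"tags": set(), "error": False},
    {"tags": {"low bandwidth"}, "error": False},
    {"tags": set(), "error": False},
    {"tags": {"people noise"}, "error": False},
]


def tag_metrics(rows):
    """Return, per tag, the three percentages from the list above."""
    n_all = len(rows)
    n_errors = sum(r["error"] for r in rows)
    # How many examples carry each tag, overall and among the errors only.
    tagged = Counter(t for r in rows for t in r["tags"])
    tagged_errors = Counter(t for r in rows if r["error"] for t in r["tags"])
    out = {}
    for tag, n_tag in tagged.items():
        out[tag] = {
            "pct_of_errors_with_tag": 100 * tagged_errors[tag] / n_errors if n_errors else 0.0,
            "pct_of_tag_misclassified": 100 * tagged_errors[tag] / n_tag,
            "pct_of_data_with_tag": 100 * n_tag / n_all,
        }
    return out


metrics = tag_metrics(examples)
for tag, m in sorted(metrics.items()):
    print(tag, {k: round(v, 1) for k, v in m.items()})
```

Because one example can carry several tags, the "% of errors" values may sum to more than 100.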
[2] Prioritizing what to work on
rightmost column : potential contribution to raising average accuracy
Which category to focus on?
- 1) how much room for improvement?
- 2) how frequently does that category appear?
- 3) how easy is it to improve accuracy on that category?
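One way to combine the first two questions is to multiply the room for improvement by the share of data in each category, which estimates how much average accuracy could rise. The sketch below assumes that weighting; the numbers, the category names, and the helper `potential_gain` are hypothetical and not taken from the notes.

```python
# Hypothetical numbers for illustration only.
# accuracy: current accuracy on the category (%)
# target:   level believed to be achievable, e.g. human-level performance (%)
# share:    fraction of all data that falls in the category
categories = {
    "clean speech": {"accuracy": 94.0, "target": 95.0, "share": 0.60},
    "car noise": {"accuracy": 89.0, "target": 93.0, "share": 0.04},
    "people noise": {"accuracy": 87.0, "target": 90.0, "share": 0.30},
    "low bandwidth": {"accuracy": 70.0, "target": 70.0, "share": 0.06},
}


def potential_gain(cat):
    """Room for improvement (in points) weighted by how often the category appears."""
    return (cat["target"] - cat["accuracy"]) * cat["share"]


# Rank categories by their estimated contribution to average accuracy.
ranking = sorted(categories, key=lambda name: potential_gain(categories[name]), reverse=True)
for name in ranking:
    print(f"{name}: +{potential_gain(categories[name]):.2f} points of average accuracy")
```

The ranking ignores the third question (how easy the improvement is), so it is a starting point for discussion rather than a decision rule.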
After choosing which category to focus on…
- 1) collect more data!
- 2) data augmentation
- 3) improve label accuracy / data quality
[3] Skewed datasets
Skip
- accuracy / precision / recall / F1 score …
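Although this part is skipped in the notes, a short sketch may help recall why accuracy alone is misleading on skewed data. It uses the standard definitions of precision, recall, and F1; the dataset (2 positives out of 100) and the always-negative "model" are hypothetical.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for binary labels (1 = positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Convention used here: a metric with a zero denominator is reported as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical skewed dev set: 2 positives out of 100 examples.
y_true = [1, 1] + [0] * 98
always_negative = [0] * 100  # a "model" that never predicts the positive class

accuracy = sum(t == p for t, p in zip(y_true, always_negative)) / len(y_true)
print(accuracy)                                      # 0.98
print(precision_recall_f1(y_true, always_negative))  # (0.0, 0.0, 0.0)
```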
[4] Performance auditing
even if the model does well on accuracy / F1 score…
run a “performance audit” before pushing it to production!
\(\rightarrow\) might save you from significant post-deployment problems
Double check your system!
( accuracy, fairness/bias, etc … )
- step 1) brainstorm the ways the system might go wrong
  - performance on “subsets of data”
    - ex) gender, age, ethnicity..
  - how common are certain errors
    - ex) FP, FN
  - performance on rare cases
- step 2) establish metrics to assess performance against those issues
  - performance on slices of the data ( not on the entire dev set )
  - after establishing metrics… MLOps tools can help automate the evaluation!
    - ex) TFMA ( TensorFlow Model Analysis )
- step 3) get buy-in from the business / product owner
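Step 2 can be illustrated by computing a metric per slice and flagging slices that fall below an agreed threshold. This is a minimal plain-Python sketch, not TFMA; the record layout, the slice key `age_group`, the threshold, and the helper names are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical dev-set records: a slice key, the true label and the prediction.
records = [
    {"age_group": "18-30", "label": 1, "pred": 1},
    {"age_group": "18-30", "label": 0, "pred": 0},
    {"age_group": "18-30", "label": 1, "pred": 1},
    {"age_group": "18-30", "label": 0, "pred": 0},
    {"age_group": "31-50", "label": 1, "pred": 1},
    {"age_group": "31-50", "label": 0, "pred": 1},
    {"age_group": "31-50", "label": 1, "pred": 1},
    {"age_group": "31-50", "label": 0, "pred": 0},
    {"age_group": "51+", "label": 1, "pred": 0},
    {"age_group": "51+", "label": 0, "pred": 0},
]


def accuracy_by_slice(rows, key):
    """Accuracy computed separately for each value of the slice key."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in rows:
        total[r[key]] += 1
        correct[r[key]] += int(r["label"] == r["pred"])
    return {s: correct[s] / total[s] for s in total}


def failing_slices(slice_acc, min_accuracy):
    """Names of slices whose accuracy is below the agreed minimum."""
    return sorted(s for s, acc in slice_acc.items() if acc < min_accuracy)


overall = sum(r["label"] == r["pred"] for r in records) / len(records)
per_slice = accuracy_by_slice(records, "age_group")
print("overall:", overall)
print("per slice:", per_slice)
print("below 0.7:", failing_slices(per_slice, min_accuracy=0.7))
```

Here the overall accuracy looks acceptable while one slice is clearly worse, which is the kind of gap an audit on slices is meant to surface.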
Example ) speech recognition
- step 1) brainstorm the ways the system might go wrong
  - ex) accuracy on different genders/ethnicities
  - ex) accuracy on different devices
  - ex) prevalence of rude mis-transcriptions
    - GAN (generative adversarial network) \(\rightarrow\) gang? gun?
- step 2) establish metrics to assess performance against those issues
  - ex) mean accuracy on different genders/ethnicities
  - ex) mean accuracy on different devices
  - ex) checking for rude words in the transcripts
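The rude-word check can be sketched as counting transcripts that contain a blocked word the reference text does not contain. The block list, the example sentence pairs, and the function names below are hypothetical; a real audit would use a curated word list and real reference transcripts.

```python
import re

# Placeholder block list (assumption); taken from the GAN -> gang / gun example.
BLOCKED_WORDS = {"gun", "gang"}


def tokens(text):
    """Lower-cased set of words in the text."""
    return set(re.findall(r"[a-z']+", text.lower()))


def rude_mistranscriptions(pairs, blocked=BLOCKED_WORDS):
    """Return (reference, transcript, introduced words) for every pair where
    the transcript contains a blocked word that the reference does not."""
    flagged = []
    for reference, transcript in pairs:
        introduced = (tokens(transcript) & blocked) - tokens(reference)
        if introduced:
            flagged.append((reference, transcript, sorted(introduced)))
    return flagged


# Hypothetical (reference, model transcript) pairs.
pairs = [
    ("we trained a gan on faces", "we trained a gun on faces"),
    ("the gan paper is great", "the gang paper is great"),
    ("turn on the lights", "turn on the lights"),
    ("he joined a gang", "he joined a gang"),  # blocked word was really spoken
]

flagged = rude_mistranscriptions(pairs)
rate = len(flagged) / len(pairs)
print(f"rude mis-transcription rate: {rate:.0%}")
for reference, transcript, words in flagged:
    print(f"{reference!r} -> {transcript!r} introduced {words}")
```

Comparing against the reference avoids penalizing the model when the speaker actually said the word.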