Automated Machine Learning (AutoML) is the end-to-end process of applying machine learning in an automated way.
A full AutoML pipeline usually consists of:
- data pre-processing,
- feature engineering,
- feature extraction,
- feature selection,
- model training,
- algorithm selection,
- hyperparameter optimization.
The steps outlined above can be very time-consuming. Many ML algorithms can be applied at each step of the analysis. The difficulty of constructing an ML pipeline manually lies in the differences between the data formats, interfaces, and computational intensity of ML algorithms. Automated machine learning solutions aim to solve this problem by automatically checking different combinations of ML algorithms. The search itself is controlled by a statistical or machine learning algorithm.
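To make the idea of "checking different combinations" concrete, here is a minimal sketch that uses only the Python standard library. It is not how any particular AutoML package works: the toy data, the two preprocessing steps, the three model families, and all function names (`make_data`, `fit_standardize`, `search`, and so on) are assumptions made up for illustration. The controller here is a plain exhaustive search over a small grid; real tools typically replace it with something smarter, such as Bayesian optimization or genetic programming.

```python
import itertools
import random
import statistics


def make_data(n=200, seed=0):
    """Toy two-class data whose two features live on very different scales."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        x1 = rng.gauss(2.0 * label, 1.0)   # informative feature, small scale
        x2 = rng.gauss(0.0, 1000.0)        # pure noise, large scale
        rows.append(([x1, x2], label))
    return rows


# --- candidate preprocessing steps: fit on training data, return a transform ---
def fit_identity(X):
    return lambda x: x


def fit_standardize(X):
    cols = list(zip(*X))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]
    return lambda x: [(v - m) / s for v, m, s in zip(x, means, stds)]


# --- candidate models: fit on training data, return a predict function ---
def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def fit_majority(X, y):
    most_common = max(set(y), key=y.count)
    return lambda x: most_common


def fit_centroid(X, y):
    centroids = {}
    for label in set(y):
        points = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [statistics.fmean(c) for c in zip(*points)]
    return lambda x: min(centroids, key=lambda lab: sq_dist(x, centroids[lab]))


def fit_knn(k):
    """k is the hyperparameter; each value becomes its own candidate."""
    def fit(X, y):
        def predict(x):
            nearest = sorted(zip(X, y), key=lambda p: sq_dist(x, p[0]))[:k]
            votes = [t for _, t in nearest]
            return max(set(votes), key=votes.count)
        return predict
    return fit


PREPROCESSORS = {"identity": fit_identity, "standardize": fit_standardize}
MODELS = {
    "majority": fit_majority,
    "centroid": fit_centroid,
    "knn_k1": fit_knn(1),
    "knn_k5": fit_knn(5),
}


def evaluate(pre_name, model_name, train, valid):
    """Fit one pipeline on the training split, return validation accuracy."""
    X_train = [x for x, _ in train]
    y_train = [t for _, t in train]
    transform = PREPROCESSORS[pre_name](X_train)
    predict = MODELS[model_name]([transform(x) for x in X_train], y_train)
    hits = sum(predict(transform(x)) == t for x, t in valid)
    return hits / len(valid)


def search(train, valid):
    """Exhaustively score every (preprocessor, model) combination."""
    results = {
        (p, m): evaluate(p, m, train, valid)
        for p, m in itertools.product(PREPROCESSORS, MODELS)
    }
    best = max(results, key=results.get)
    return best, results


data = make_data()
train, valid = data[:150], data[150:]
best, results = search(train, valid)
for combo, accuracy in sorted(results.items(), key=lambda kv: -kv[1]):
    print(combo, round(accuracy, 3))
print("best pipeline:", best)
```

Every candidate shares the same fit/transform/predict shape, which is what makes automatic combination possible; much of the manual effort described above goes into bringing heterogeneous algorithms behind such a common interface. A single hold-out split is used for brevity, and cross-validation would usually be a safer way to score candidates.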
Automated machine learning in Python
Open-source Python packages available:
- auto-sklearn - Python package using Bayesian hyperparameter optimization with scikit-learn algorithms
- TPOT - Python package based on genetic programming
- auto-keras - open-source Python package for neural network architecture optimization (Neural Architecture Search)
- auto_ml - open-source Python package using Keras, XGBoost, LightGBM, and CatBoost
Automated machine learning tools outside Python:
AutoML research articles
AutoML, like machine learning in general, is still a subject of active research. Notable articles in the AutoML field:
- Thornton C, Hutter F, Hoos HH, Leyton-Brown K (2013). Auto-WEKA: Combined Selection and Hyperparameter Optimization of Classification Algorithms. KDD '13: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 847–855.
- Feurer M, Klein A, Eggensperger K, Springenberg J, Blum M, Hutter F (2015). Efficient and Robust Automated Machine Learning. Advances in Neural Information Processing Systems 28 (NIPS 2015). pp. 2962–2970.
- Olson RS, Bartley N, Urbanowicz RJ, Moore JH (2016). Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science. Proceedings of EvoBIO 2016. GECCO '16. pp. 485–492.
- Jin H, Song Q, Hu X (2018). Auto-Keras: Efficient Neural Architecture Search with Network Morphism. arXiv:1806.10282.