This is part of the Machine Learning series.
One question always pops up in any machine learning problem: Which algorithm should I use? And what do the algorithms actually do?
What does each algorithm do?
To answer this, I got to interview Channel 9 guru Seth Juarez (@sethjuarez), who happens to be as passionate about machine learning as I am:
After briefly going over a typical machine learning process, we take a closer look at the third step, i.e. building the model:
What algorithms are out there? Which one should we use? What do they do? We covered the most common algorithms in machine learning, and on top of that we used some fancy visualisations to explain how they work:
- Perceptron (in AzureML: Two-Class Averaged Perceptron)
- Kernel perceptron (closely related to Support Vector Machines); in AzureML: Two-Class Support Vector Machine and Two-Class Locally-Deep Support Vector Machine
- Decision Trees
- Neural Networks
- Deep Learning
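To get a feel for what the first algorithm on this list does, here is a minimal sketch of an averaged perceptron in plain Python. It assumes labels of +1/-1 and a small, linearly separable toy dataset. The function names `train_averaged_perceptron` and `predict` are made up for illustration, and this is not how AzureML's Two-Class Averaged Perceptron is implemented internally.

```python
def train_averaged_perceptron(samples, labels, epochs=10):
    """Train a perceptron and return the *averaged* weights and bias.

    samples: list of feature lists; labels: +1 or -1 (hypothetical toy setup).
    """
    n_features = len(samples[0])
    w = [0.0] * n_features      # current weights
    b = 0.0                     # current bias
    w_sum = [0.0] * n_features  # running sum of weights, used for averaging
    b_sum = 0.0
    count = 0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:
                # Misclassified (or on the boundary): nudge the separating
                # line towards the correct side for this sample.
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
            # Accumulate after every sample so the final model averages
            # all intermediate weight vectors, which tends to be more stable.
            w_sum = [ws + wi for ws, wi in zip(w_sum, w)]
            b_sum += b
            count += 1
    return [ws / count for ws in w_sum], b_sum / count


def predict(w, b, x):
    """Return +1 if x lies on the positive side of the hyperplane, else -1."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


if __name__ == "__main__":
    X = [[2, 2], [3, 3], [-2, -2], [-3, -1]]
    y = [1, 1, -1, -1]
    w, b = train_averaged_perceptron(X, y)
    print(predict(w, b, [1, 1]), predict(w, b, [-1, -1]))  # prints: 1 -1
```

The kernel perceptron and SVMs build on the same idea of a separating hyperplane, but replace the plain dot product with a kernel function so that they can also separate data that is not linearly separable.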
Machine Learning Cheat Sheet
One of Microsoft's data scientists, Brandon Rohrer, has written a nice three-part blog series that introduces data science without the jargon:
- What Can Data Science Do For Me? Brandon explains which prerequisites you need to get a machine learning project off to a good start.
- What Types of Questions Can Data Science Answer? Here, Brandon goes through typical questions that can be answered by the three broad algorithm families:
- Supervised learning (e.g. classification, anomaly detection, regression),
- Unsupervised learning (e.g. clustering and dimensionality reduction), and
- Reinforcement learning.
- Which Algorithm Family Can Answer My Question? Brandon gives a good overview of typical questions asked in the following areas, and of which algorithm to use for each:
- Predictive Maintenance
- Marketing
- Finance
- Operational Efficiency
- Energy Forecasting
- Internet of Things
- Text and Speech Processing
- Image Processing and Computer Vision
Furthermore, there is one really neat cheat sheet created by Microsoft's Data Science team on when to use which algorithm:
Top 10 data mining algorithms in plain English
Finally, one last resource that I highly recommend: Top 10 data mining algorithms in plain English. This article explains the 10 most influential data mining algorithms (as voted by three separate panels):
- C4.5 (decision tree)
- k-means (clustering)
- Support vector machines (next to C4.5, a classifier to try out first)
- Apriori (association rule learning --> recommendation engine)
- EM (i.e. expectation-maximization for clustering)
- PageRank (network analysis; think of the PageRank in Google's search engine)
- AdaBoost (boosting, and thus an ensemble learning algorithm; it combines multiple weak learners into one stronger classifier)
- kNN (aka k-Nearest Neighbors, a classification algorithm)
- Naive Bayes (family of classification algorithms assuming that all features are independent of each other, given the class)
- CART (aka classification and regression trees, thus usable for both classification and regression)
This list contains algorithms of various algorithm families, including association rule learning (relevant for building recommenders).
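To make association rule learning a bit more concrete, here is a minimal, unoptimised sketch of the Apriori idea in plain Python: find all itemsets that appear in at least `min_support` of the transactions, then derive rules "A → B" whose confidence, support(A ∪ B) / support(A), reaches `min_confidence`. The function names `apriori` and `association_rules` and the toy shopping baskets are made up for illustration; this is neither the original paper's nor any library's implementation.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(1 for t in transactions if itemset <= t) / n

    # Start with all single items as candidates.
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # Keep only the candidates that meet the minimum support.
        level = {c: support(c) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets into k-itemsets, and prune any candidate
        # that has an infrequent (k-1)-subset (the "Apriori property").
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in level for s in combinations(union, k - 1)
            ):
                candidates.add(union)
        current = candidates
    return frequent


def association_rules(frequent, min_confidence):
    """Yield (antecedent, consequent, confidence) triples from frequent itemsets."""
    rules = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(itemset, r):
                a = frozenset(antecedent)
                # Every subset of a frequent itemset is frequent, so frequent[a] exists.
                confidence = supp / frequent[a]
                if confidence >= min_confidence:
                    rules.append((a, itemset - a, confidence))
    return rules


if __name__ == "__main__":
    baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
    frequent = apriori(baskets, min_support=0.5)
    for antecedent, consequent, conf in association_rules(frequent, min_confidence=0.9):
        print(set(antecedent), "->", set(consequent), conf)  # prints: {'butter'} -> {'bread'} 1.0
```

In a recommender, a rule such as "butter → bread" would translate into "customers who put butter in their basket might also want bread".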