Machine Learning (11) - Machine Learning Algorithms: Explained!

This is part of the Machine Learning series.

One question that always pops up in any machine learning problem: Which algorithm should I use? And what do the algorithms actually do, anyway?

What does each algorithm do?

For this purpose, I got to interview Channel 9 guru Seth Juarez (@sethjuarez), who happens to be as passionate about machine learning as I am:

After briefly going over a typical machine learning process, we take a closer look at the third step, i.e. building the model:

What algorithms are out there? Which one should we use? What do they do? To answer these questions, we covered the most common algorithms used in machine learning problems and, on top of that, used some fancy visualisations to explain how they work:

  • Perceptron (in AzureML: Two-Class Averaged Perceptron)
  • Kernel perceptron, a close relative of Support Vector Machines (in AzureML: Two-Class Support Vector Machine and Two-Class Locally-Deep Support Vector Machine)
  • Decision Trees
  • Neural Networks
  • Deep Learning
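To give a feel for what the simplest entry in this list does, here is a minimal plain-Python sketch of an averaged perceptron. It is not AzureML's implementation; it assumes two classes labelled -1 and +1, and the function names train_averaged_perceptron and predict are hypothetical. The "averaged" part means the returned weights are the mean of the weights after every training example, which tends to be more stable than the final weights alone.

```python
def train_averaged_perceptron(X, y, epochs=10):
    """Binary perceptron with weight averaging; labels must be -1 or +1."""
    n_features = len(X[0])
    w = [0.0] * n_features      # current weights
    b = 0.0                     # current bias
    w_sum = [0.0] * n_features  # running sum of weights, used for averaging
    b_sum = 0.0
    count = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            activation = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * activation <= 0:  # misclassified (or exactly on the boundary)
                w = [wj + yi * xj for wj, xj in zip(w, xi)]
                b += yi
            # Accumulate after every example, not only after mistakes
            w_sum = [sj + wj for sj, wj in zip(w_sum, w)]
            b_sum += b
            count += 1
    return [s / count for s in w_sum], b_sum / count


def predict(w, b, x):
    """Return +1 if x lies on the positive side of the hyperplane, else -1."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else -1


# Usage on a tiny, linearly separable toy dataset (made up for illustration)
X = [[2, 1], [3, 2], [-1, -2], [-2, -1]]
y = [1, 1, -1, -1]
w, b = train_averaged_perceptron(X, y)
print([predict(w, b, x) for x in X])  # [1, 1, -1, -1]
```

A kernel perceptron replaces the dot product with a kernel function so that it can separate classes that are not linearly separable; that variant is not shown here.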

Machine Learning Cheat Sheet

One of Microsoft's data scientists, Brandon Rohrer, has written a nice three-part blog series introducing data science with no jargon:

  1. What Can Data Science Do For Me? Brandon explains which prerequisites are necessary to get a machine learning project off to a good start.
  2. What Types of Questions Can Data Science Answer? Here, Brandon goes through typical questions that can be covered by three broad algorithm families:
    • Supervised learning (e.g. classification, anomaly detection, regression),
    • Unsupervised learning (e.g. clustering and dimensionality reduction), and
    • Reinforcement learning.
  3. Which Algorithm Family Can Answer My Question? Brandon gives a good overview of typical questions asked in the following areas, and of which algorithm to use for each:
    • Predictive Maintenance
    • Marketing
    • Finance
    • Operational Efficiency
    • Energy Forecasting
    • Internet of Things
    • Text and Speech Processing
    • Image Processing and Computer Vision
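To make the supervised/unsupervised distinction above a bit more concrete, here is a hedged sketch on made-up 1-D data. With labels, a nearest-centroid classifier simply averages each class (supervised learning). Without labels, a two-cluster k-means has to discover the groups on its own (unsupervised learning). The data, the function names, and the min/max initialisation of the cluster centers are assumptions for illustration; reinforcement learning is not shown.

```python
def nearest_centroid_fit(xs, labels):
    """Supervised: labels are given, so each class center is just the class mean."""
    centroids = {}
    for label in set(labels):
        members = [x for x, l in zip(xs, labels) if l == label]
        centroids[label] = sum(members) / len(members)
    return centroids


def nearest_centroid_predict(centroids, x):
    """Assign x to the class whose center is closest."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def two_means(xs, iterations=10):
    """Unsupervised: k-means with k=2 on 1-D data, alternating assign and update steps."""
    c0, c1 = min(xs), max(xs)  # simple deterministic initialisation (an assumption)
    for _ in range(iterations):
        group0 = [x for x in xs if abs(x - c0) <= abs(x - c1)]
        group1 = [x for x in xs if abs(x - c0) > abs(x - c1)]
        if not group0 or not group1:
            break  # degenerate split; keep the current centers
        c0, c1 = sum(group0) / len(group0), sum(group1) / len(group1)
    return c0, c1


xs = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
labels = ["low", "low", "low", "high", "high", "high"]
centroids = nearest_centroid_fit(xs, labels)
print(nearest_centroid_predict(centroids, 3.0))  # low
print(two_means(xs))                              # (1.5, 8.5)
```

Both functions end up with the same two centers here, but only the supervised one knows what the groups are called; the clustering result still has to be interpreted by a human.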

Furthermore, there is a really neat cheat sheet created by Microsoft's Data Science team on when to use which algorithm:

Top 10 data mining algorithms in plain English

Finally, one last resource that I highly recommend: Top 10 data mining algorithms in plain English. This article explains the 10 most influential algorithms (voted by 3 separate panels):

  1. C4.5 (decision tree)
  2. k-means (clustering)
  3. Support vector machines (alongside C4.5, one of the first classifiers to try)
  4. Apriori (association rule learning, e.g. for recommendation engines)
  5. EM (i.e. expectation-maximization, used for clustering)
  6. PageRank (network analysis; think of the ranking behind Google's search engine)
  7. AdaBoost (boosting, and thus an ensemble learning algorithm that combines many weak learners into one stronger classifier)
  8. kNN (aka k-nearest neighbors, a classification algorithm)
  9. Naive Bayes (a family of classification algorithms assuming that all features are independent of each other given the class)
  10. CART (aka classification and regression trees, a decision tree learner for both classification and regression)
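Of the ten, kNN is probably the easiest to write down in a few lines. The sketch below is a minimal version, assuming Euclidean distance and a plain majority vote among the k nearest training points; the function name knn_predict and the toy data are hypothetical, and ties in the vote are not handled specially.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair every training point's Euclidean distance to the query with its label
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    k_labels = [label for _, label in distances[:k]]
    return Counter(k_labels).most_common(1)[0][0]


train_X = [(1, 1), (1, 2), (2, 1), (6, 6), (6, 7), (7, 6)]
train_y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(train_X, train_y, (1.5, 1.5)))  # a
print(knn_predict(train_X, train_y, (6.5, 6.5)))  # b
```

Note that kNN has no real training step: all the work happens at prediction time, which is why it gets slow on large datasets without an index structure.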

This list contains algorithms of various algorithm families, including association rule learning (relevant for building recommenders).
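For the recommender case, a minimal sketch of the Apriori idea might look like this. It assumes each transaction is a set of item names and uses support (the fraction of transactions containing an itemset) as the only threshold; the function name apriori and the shopping-basket data are hypothetical. The key trick is the prune step: a k-itemset can only be frequent if all of its (k-1)-subsets are frequent.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset whose support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item in `itemset`
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items
    singles = {frozenset([item]) for t in transactions for item in t}
    frequent = {s: support(s) for s in singles}
    frequent = {s: v for s, v in frequent.items() if v >= min_support}
    current = set(frequent)
    k = 2
    while current:
        # Join step: combine frequent (k-1)-itemsets into k-item candidates
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step (Apriori property): drop candidates with an infrequent (k-1)-subset
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        level = {c: support(c) for c in candidates}
        level = {c: v for c, v in level.items() if v >= min_support}
        frequent.update(level)
        current = set(level)
        k += 1
    return frequent


baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
freq = apriori(baskets, min_support=0.5)
# Confidence of the rule {butter} -> {bread}: support(both) / support(butter)
conf = freq[frozenset({"bread", "butter"})] / freq[frozenset({"butter"})]
print(conf)  # 1.0
```

A recommender would then suggest bread to anyone with butter in their basket, since every basket with butter in this toy data also contains bread.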
