Introduction to scikit-learn

“I literally owe my career in the data space to scikit-learn. It’s not just a framework but a school of thought regarding predictive modeling. Super well deserved, folks :)” (Maykon Schots, Brazil)

scikit-learn is:

  • Simple and efficient tools for predictive data analysis
  • Accessible to everybody, and reusable in various contexts
  • Built on NumPy, SciPy, and matplotlib
  • Open source, commercially usable - BSD license

scikit-learn is one of the most widely used Python libraries for machine learning.

from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier(random_state=0)
X = [[ 1,  2,  3],  # 2 samples, 3 features
     [11, 12, 13]]
y = [0, 1]  # classes of each sample
clf.fit(X, y)

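The few lines of code above do a lot of work: they import a random forest classifier, create it with a fixed random seed so the result is reproducible, and train it on two samples with three features each.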

scikit-learn allows you to apply a large number of ML techniques. All of these techniques can be applied through a common interface that looks much like the above code snippet.
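To make the idea of a common interface concrete, here is a minimal sketch that fits three different classifiers on the same toy data with the same fit call. The choice of LogisticRegression and KNeighborsClassifier is only for illustration and is not taken from the text above; any scikit-learn classifier could stand in.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

X = [[1, 2, 3],   # 2 samples, 3 features
     [11, 12, 13]]
y = [0, 1]        # classes of each sample

# Three different techniques (picked for illustration), one interface.
estimators = [
    RandomForestClassifier(random_state=0),
    LogisticRegression(),
    KNeighborsClassifier(n_neighbors=1),  # only 2 samples, so use 1 neighbor
]

fitted_names = []
for est in estimators:
    est.fit(X, y)  # the same call works for every estimator
    fitted_names.append(type(est).__name__)

print(fitted_names)
```

Only the constructor line changes between techniques; the rest of the code stays the same.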

The fit method generally takes two inputs:

  • The samples matrix (or design matrix) X, whose size is typically (n_samples, n_features).
  • The target values y, which are real numbers for regression tasks, or integers (or any other discrete set of values) for classification.

For unsupervised learning tasks, y does not need to be specified.
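As a small sketch of the unsupervised case, the snippet below fits a KMeans clustering model using X alone. The four toy samples and the choice of KMeans with two clusters are assumptions made for illustration only.

```python
from sklearn.cluster import KMeans

# Toy data (made up for this example): two samples near [1, 2, 3]
# and two samples near [11, 12, 13].
X = [[1, 2, 3],
     [11, 12, 13],
     [1, 2, 4],
     [11, 12, 14]]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
kmeans.fit(X)  # no y is passed: the model only sees the samples

labels = kmeans.labels_  # cluster index assigned to each sample
print(labels)
```

The cluster indices themselves are arbitrary; what matters is which samples end up sharing a label.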

Once the estimator (Random Forest in the code snippet above) is fitted, it can be used for predicting target values of new data.
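Continuing the snippet above, prediction can be sketched as follows. The new samples passed to predict are made up for illustration; they are simply chosen to lie close to the two training samples.

```python
from sklearn.ensemble import RandomForestClassifier

X = [[1, 2, 3],   # 2 samples, 3 features
     [11, 12, 13]]
y = [0, 1]        # classes of each sample

clf = RandomForestClassifier(random_state=0)
clf.fit(X, y)

# Predict classes of the training data.
train_pred = clf.predict(X)

# Predict classes of new, unseen data (illustrative values).
new_pred = clf.predict([[4, 5, 6], [14, 15, 16]])

print(train_pred, new_pred)
```

The new samples do not need to be refitted; predict reuses what the estimator learned during fit.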

Let's dive in with this notebook to develop an end-to-end ML model with scikit-learn.