Introduction to scikit-learn

“I literally owe my career in the data space to scikit-learn. It’s not just a framework but a school of thought regarding predictive modeling. Super well deserved, folks :)” Maykon Schots from Brazil

scikit-learn is:

Simple and efficient tools for predictive data analysis
Accessible to everybody, and reusable in various contexts
Built on NumPy, SciPy, and matplotlib
Open source, commercially usable - BSD license

scikit-learn is one of the most widely used Python libraries for machine learning.


from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier(random_state=0)
X = [[ 1,  2,  3],  # 2 samples, 3 features
     [11, 12, 13]]
y = [0, 1]  # classes of each sample
clf.fit(X, y)

The few lines of code above do a lot of work: they create a random forest classifier and fit it to a tiny dataset of two samples with three features each.

scikit-learn allows you to apply a large number of ML techniques. All of these techniques can be applied through a common interface that looks much like the above code snippet.
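To make the "common interface" point concrete, here is a minimal sketch that fits three different classifiers on the same toy data with the same fit call. The choice of estimators (and n_neighbors=1, needed because there are only two samples) is an assumption made for illustration, not something the course prescribes.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

X = [[1, 2, 3], [11, 12, 13]]  # same toy data as above
y = [0, 1]

# Three different techniques, one shared interface.
estimators = [
    RandomForestClassifier(random_state=0),
    LogisticRegression(),
    KNeighborsClassifier(n_neighbors=1),  # only 2 samples, so 1 neighbor
]

fitted_names = []
for est in estimators:
    est.fit(X, y)  # identical call regardless of the algorithm
    fitted_names.append(type(est).__name__)

print(fitted_names)
```

Swapping one technique for another usually means changing only the line that constructs the estimator.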

The fit method generally takes two inputs:

The samples matrix (or design matrix) X, whose size is typically (n_samples, n_features), so that samples are rows and features are columns.

The target values y, which are real numbers for regression tasks and integers (or any other discrete set of values) for classification.
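As a small sketch of these shapes, the snippet below wraps the toy data in NumPy arrays and builds one integer target for classification and one real-valued target for regression. The regression values and the use of LinearRegression are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1, 2, 3],
              [11, 12, 13]])
n_samples, n_features = X.shape  # rows are samples, columns are features

y_class = np.array([0, 1])     # discrete labels for classification
y_reg = np.array([0.5, 2.5])   # real-valued targets (hypothetical values)

# y must have one entry per sample in X.
reg = LinearRegression().fit(X, y_reg)
print(n_samples, n_features, reg.predict(X).shape)
```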

For unsupervised learning tasks, y does not need to be specified.
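A minimal sketch of the unsupervised case, assuming k-means clustering as the example technique: fit receives only X. The four-sample dataset is hypothetical, extended from the toy data so that each cluster has two members.

```python
from sklearn.cluster import KMeans

# Hypothetical data: two samples near [1, 2, 3], two near [11, 12, 13].
X = [[1, 2, 3], [11, 12, 13], [2, 3, 4], [12, 13, 14]]

km = KMeans(n_clusters=2, n_init=10, random_state=0)
km.fit(X)  # no y: the estimator only sees the samples

labels = km.labels_  # cluster index assigned to each sample
print(labels)
```

The cluster indices themselves are arbitrary; what matters is which samples end up sharing a label.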

Once the estimator (Random Forest in the code snippet above) is fitted, it can be used for predicting target values of new data.
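Continuing the snippet above, prediction is a single call on the fitted estimator. The new samples passed to predict are illustrative values, not data from the course.

```python
from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier(random_state=0)
X = [[1, 2, 3], [11, 12, 13]]
y = [0, 1]
clf.fit(X, y)

# Predict on the training data, then on unseen samples
# (which must have the same number of features, here 3).
pred_train = clf.predict(X)
pred_new = clf.predict([[4, 5, 6], [14, 15, 16]])

print(pred_train, pred_new)
```

predict returns one class label per input row, drawn from the classes seen during fit.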

Let's dive in with this notebook to develop an end-to-end ML model with scikit-learn.

Data Analytics and Machine Learning using Python - A Crash Course by Harsh Singhal
