Scikit-learn is a Python library for data mining and data analysis. It is built on top of NumPy, SciPy and Matplotlib, libraries that are widely used in the scientific research community. Scikit-learn was initially released in June 2007 and has been under active development for over 11 years. It is one of the most widely used machine learning libraries today.
There are multiple scikits, which are scientific toolboxes built around SciPy. Apart from scikit-learn, another popular one is scikit-image.
A complete list of the scikit family of libraries can be found at https://scikits.appspot.com/scikits
What can you do with scikit-learn?
Scikit-learn implements many of the most commonly used machine learning algorithms. It can be used for the following types of machine learning problems:
Problem Type | Description | Scenario | Algorithm Modules |
Classification | Identifying which category an object belongs to. | Spam detection, image recognition | SVM, nearest neighbors, random forest |
Regression | Predicting a continuous-valued attribute associated with an object. | Drug response, stock prices | SVR, ridge regression, Lasso |
Clustering | Automatically grouping similar objects into sets. | Customer segmentation, grouping experiment outcomes | k-means, spectral clustering, mean-shift |
Dimensionality reduction | Reducing the number of random variables to consider. | Visualization, increased efficiency | PCA, feature selection, non-negative matrix factorization |
Model selection | Comparing, validating and choosing parameters and models. | Improved accuracy via parameter tuning | Grid search, cross-validation, metrics |
Preprocessing | Feature extraction and normalization. | Transforming input data such as text for use with machine learning algorithms | Preprocessing, feature extraction |
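The estimators listed above share a common interface: you create an object, call `fit` on training data, then call `predict`, `transform` or `score`. The following is a minimal sketch, assuming scikit-learn is already installed, that combines three rows of the table: preprocessing (feature scaling), classification (an SVM) and model selection (grid search with cross-validation). The Iris dataset, the `SVC` classifier and the parameter grid are illustrative choices, not something the text prescribes.

```python
# Preprocessing + classification + model selection on the bundled Iris dataset.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# X is a (150, 4) array of flower measurements, y holds the 3 class labels.
X, y = load_iris(return_X_y=True)

# Hold out 25% of the samples (stratified by class) for a final evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Preprocessing step (standardize each feature) chained with an SVM classifier.
pipeline = make_pipeline(StandardScaler(), SVC())

# Model selection: try every parameter combination with 5-fold cross-validation.
# make_pipeline names each step after its lowercased class, hence the "svc__" prefix.
param_grid = {"svc__C": [0.1, 1, 10], "svc__kernel": ["linear", "rbf"]}
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X_train, y_train)

# score() evaluates the best pipeline (refit on all training data) by accuracy.
best_params = search.best_params_
test_accuracy = search.score(X_test, y_test)
print("Best parameters:", best_params)
print(f"Test accuracy: {test_accuracy:.3f}")
```

Wrapping the scaler and the classifier in one pipeline means the scaler is re-fit inside every cross-validation fold, so no information from the validation folds leaks into preprocessing.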
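The remaining rows of the table (regression, dimensionality reduction and clustering) use the same `fit` / `predict` / `transform` pattern. Here is a short sketch under the same assumption that scikit-learn is installed. The bundled diabetes and Iris datasets, `Ridge(alpha=1.0)`, two PCA components and three k-means clusters are all illustrative choices.

```python
# Regression, dimensionality reduction and clustering with one shared API.
from sklearn.cluster import KMeans
from sklearn.datasets import load_diabetes, load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Regression: ridge regression on the diabetes dataset, evaluated with
# 5-fold cross-validation (regressors are scored by R^2 by default).
X_reg, y_reg = load_diabetes(return_X_y=True)
r2_scores = cross_val_score(Ridge(alpha=1.0), X_reg, y_reg, cv=5)
print(f"Ridge mean R^2: {r2_scores.mean():.3f}")

# Dimensionality reduction: project the 4 Iris features onto 2 principal
# components. The labels are discarded because both steps here are unsupervised.
X_iris, _ = load_iris(return_X_y=True)
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_iris)
print("Reduced shape:", X_2d.shape)
print("Explained variance ratio:", pca.explained_variance_ratio_)

# Clustering: group the reduced samples into 3 clusters with k-means.
# n_init is set explicitly so behaviour is the same across library versions.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X_2d)
print("Cluster sizes:", [int((cluster_labels == k).sum()) for k in range(3)])
```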
How to use scikit-learn?
Scikit-learn is easy to install if you have a working Python installation on your computer. It depends on NumPy and SciPy, and Matplotlib is needed for its plotting features, so these libraries should be installed first.
If you already have a working installation of NumPy and SciPy, the easiest way to install scikit-learn is with pip. Otherwise, install the dependencies first and then scikit-learn itself:
pip install numpy
pip install scipy
pip install matplotlib
pip install scikit-learn
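After installation, a quick way to confirm that everything works is to import the packages and print their versions. This is a minimal check, not an official install test. `sklearn.show_versions()` is available in recent scikit-learn releases and prints a fuller report of the Python, platform and dependency versions.

```python
# Confirm that scikit-learn and its core dependencies import correctly.
import numpy
import scipy
import sklearn

print("numpy:", numpy.__version__)
print("scipy:", scipy.__version__)
print("scikit-learn:", sklearn.__version__)

# Print a detailed report of the Python, OS and dependency versions in use.
sklearn.show_versions()
```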