scikit-learn: machine learning in Python — scikit-learn 0.12.1 documentation
Easy-to-use and general-purpose machine learning in Python
scikit-learn is a Python module that integrates classic machine learning algorithms into the tightly knit scientific Python ecosystem (NumPy, SciPy, matplotlib). It aims to provide simple and efficient solutions to learning problems that are accessible to everybody and reusable in various contexts: machine learning as a versatile tool for science and engineering.
Supervised learning
Support vector machines, linear models, naive Bayes, Gaussian processes...
Unsupervised learning
Clustering, Gaussian mixture models, manifold learning, matrix factorization, covariance...
And much more
Model selection, datasets, feature extraction... See below.
License: open source and commercially usable (3-clause BSD license)
Documentation for scikit-learn version 0.12.1. For other versions and printable formats, see Documentation resources.
User Guide
1. Installing scikit-learn
1.1. Installing an official release
1.2. Third party distributions of scikit-learn
1.3. Bleeding Edge
1.4. Testing
2. Tutorials: From the bottom up with scikit-learn
2.1. An Introduction to machine learning with scikit-learn
2.2. A tutorial on statistical-learning for scientific data processing
3. Supervised learning
3.1. Generalized Linear Models
3.2. Support Vector Machines
3.3. Stochastic Gradient Descent
3.4. Nearest Neighbors
3.5. Gaussian Processes
3.6. Partial Least Squares
3.7. Naive Bayes
3.8. Decision Trees
3.9. Ensemble methods
3.10. Multiclass and multilabel algorithms
3.11. Feature selection
3.12. Semi-Supervised
3.13. Linear and Quadratic Discriminant Analysis
4. Unsupervised learning
4.1. Gaussian mixture models
4.2. Manifold learning
4.3. Clustering
4.4. Decomposing signals in components (matrix factorization problems)
4.5. Covariance estimation
4.6. Novelty and Outlier Detection
4.7. Hidden Markov Models
5. Model Selection
5.1. Cross-Validation: evaluating estimator performance
5.2. Grid Search: setting estimator parameters
5.3. Pipeline: chaining estimators
6. Dataset transformations
6.1. Preprocessing data
6.2. Feature extraction
6.3. Kernel Approximation
7. Dataset loading utilities
7.1. General dataset API
7.2. Toy datasets
7.3. Sample images
7.4. Sample generators
7.5. Datasets in svmlight / libsvm format
7.6. The Olivetti faces dataset
7.7. The 20 newsgroups text dataset
7.8. Downloading datasets from the mldata.org repository
7.9. The Labeled Faces in the Wild face recognition dataset
8. Reference
8.1. sklearn.cluster: Clustering
8.2. sklearn.covariance: Covariance Estimators
8.3. sklearn.cross_validation: Cross Validation
8.4. sklearn.datasets: Datasets
8.5. sklearn.decomposition: Matrix Decomposition
8.6. sklearn.ensemble: Ensemble Methods
8.7. sklearn.feature_extraction: Feature Extraction
8.8. sklearn.feature_selection: Feature Selection
8.9. sklearn.gaussian_process: Gaussian Processes
8.10. sklearn.grid_search: Grid Search
8.11. sklearn.hmm: Hidden Markov Models
8.12. sklearn.kernel_approximation: Kernel Approximation
8.13. sklearn.semi_supervised: Semi-Supervised Learning
8.14. sklearn.lda: Linear Discriminant Analysis
8.15. sklearn.linear_model: Generalized Linear Models
8.16. sklearn.manifold: Manifold Learning
8.17. sklearn.metrics: Metrics
8.18. sklearn.mixture: Gaussian Mixture Models
8.19. sklearn.multiclass: Multiclass and multilabel classification
8.20. sklearn.naive_bayes: Naive Bayes
8.21. sklearn.neighbors: Nearest Neighbors
8.22. sklearn.pls: Partial Least Squares
8.23. sklearn.pipeline: Pipeline
8.24. sklearn.preprocessing: Preprocessing and Normalization
8.25. sklearn.qda: Quadratic Discriminant Analysis
8.26. sklearn.svm: Support Vector Machines
8.27. sklearn.tree: Decision Trees
8.28. sklearn.utils: Utilities
Example Gallery
Examples
General examples
Examples based on real world datasets
Clustering
Covariance estimation
Dataset examples
Decomposition
Ensemble methods
Tutorial exercises
Gaussian Process for Machine Learning
Generalized Linear Models
Manifold learning
Gaussian Mixture Models
Nearest Neighbors
Semi Supervised Classification
Support Vector Machines
Decision Trees
Development
Contributing
Submitting a bug report
Retrieving the latest code
Contributing code
Other ways to contribute
Coding guidelines
APIs of scikit-learn objects
How to optimize for speed
Python, Cython or C/C++?
Profiling Python code
Memory usage profiling
Performance tips for the Cython developer
Profiling compiled extensions
Multi-core parallelism using joblib.Parallel
A sample algorithmic trick: warm restarts for cross validation
Utilities for Developers
Validation Tools
Efficient Linear Algebra & Array Operations
Efficient Routines for Sparse Matrices
Graph Routines
Backports
Testing Functions
Helper Functions
Hash Functions
Warnings and Exceptions
Developers’ Tips for Debugging
Memory errors: debugging Cython with valgrind
About us
History
People
Citing scikit-learn
Funding