Caltech 101 image database
BBCSport
Consists of 737 documents from the BBC Sport website corresponding to sports news articles in five topical areas (athletics, cricket, football, rugby, tennis) from 2004-2005.
Yale Face Database
The Yale Face Database (size 6.4MB) contains 165 grayscale images in GIF format of 15 individuals. There are 11 images per subject, one per different facial expression or configuration: center-light, w/glasses, happy, left-light, w/no glasses, normal, right-light, sad, sleepy, surprised, and wink.
Extended Yale B face database
ORL dataset
The ORL face database is composed of 400 images of size 112 x 92. There are 40 persons, with 10 images per person. The images were taken at different times, under varying lighting and with different facial expressions.
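The figures above (40 persons, 10 images each, 112 x 92 pixels) determine the shape of the data matrix most experiments work with. The following is a minimal sketch, not an official loader: the pixel array is a zero-filled placeholder, and the subject-by-subject ordering of the images is an assumption.

```python
import numpy as np

N_SUBJECTS = 40          # persons in the database
IMAGES_PER_SUBJECT = 10  # images per person
HEIGHT, WIDTH = 112, 92  # image size in pixels

# Placeholder pixel data; a real loader would read the image files instead.
images = np.zeros((N_SUBJECTS * IMAGES_PER_SUBJECT, HEIGHT, WIDTH), dtype=np.uint8)

# Assumption: images are stored subject by subject, so labels repeat in blocks of 10.
labels = np.repeat(np.arange(N_SUBJECTS), IMAGES_PER_SUBJECT)

# Flatten each image into one row of a (400, 10304) feature matrix.
X = images.reshape(len(images), -1)
```

Each row of `X` is one face image as a 10304-dimensional vector, and `labels` gives the person index for that row.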
USPS digits
The USPS handwritten digit database. We provide here a popular subset that contains 9298 16x16 handwritten digit images in total, split into 7291 training images and 2007 test images.
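The split sizes above can be sketched as a simple index-based partition. This is an illustration only: the arrays are zero-filled placeholders, and the assumption that the first 7291 rows form the training set is hypothetical, since the actual file layout depends on where the subset is downloaded from.

```python
import numpy as np

N_TRAIN, N_TEST = 7291, 2007  # split sizes stated in the description
SIDE = 16                     # images are 16x16 pixels

# Placeholder data: one flattened 256-dimensional row per image.
X = np.zeros((N_TRAIN + N_TEST, SIDE * SIDE), dtype=np.float32)
y = np.zeros(N_TRAIN + N_TEST, dtype=np.int64)

# Assumption: training rows come first, test rows follow.
X_train, X_test = X[:N_TRAIN], X[N_TRAIN:]
y_train, y_test = y[:N_TRAIN], y[N_TRAIN:]
```

The two parts together account for all 9298 images (7291 + 2007).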
UCI digits
This dataset consists of features of handwritten numerals ('0'-'9'). It is composed of 2000 data points.
Multi-View Twitter Datasets
A collection of Twitter datasets for evaluating multi-view analysis methods.
3Sources Dataset
A multi-view text corpus, constructed from news articles from three online news services (BBC, Reuters, and The Guardian).
Synthetic Multi-view Text Datasets
A set of synthetic multi-view text datasets, constructed from the single-view BBC and BBCSport corpora by splitting news articles into related segments of text.
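The idea of turning a single-view article into several views by splitting it into segments can be sketched as follows. The exact procedure used to build these datasets is not given here, so this is only one plausible scheme: the function name `split_into_views`, the paragraph-level granularity, and the random round-robin assignment are all assumptions made for illustration.

```python
import random


def split_into_views(article, n_views=2, seed=0):
    """Split an article's paragraphs into n_views disjoint text segments.

    Hypothetical helper: not the procedure used by the dataset authors.
    """
    # Treat blank-line-separated blocks as paragraphs.
    paragraphs = [p.strip() for p in article.split("\n\n") if p.strip()]

    # Shuffle with a fixed seed so the split is reproducible.
    rng = random.Random(seed)
    rng.shuffle(paragraphs)

    # Deal paragraphs round-robin so each view gets a similar share.
    views = [[] for _ in range(n_views)]
    for i, paragraph in enumerate(paragraphs):
        views[i % n_views].append(paragraph)
    return [" ".join(v) for v in views]


article = (
    "First paragraph.\n\nSecond paragraph.\n\n"
    "Third paragraph.\n\nFourth paragraph."
)
views = split_into_views(article, n_views=2)
```

Each entry of `views` is a segment of the same article, so the segments share a topic label while containing disjoint text.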
MSRC v1 dataset
The MSRC v1 dataset from Microsoft Research in Cambridge contains 240 images in 9 object classes, with coarse pixel-wise labels.
Scene-15 Dataset
The initial 8 classes were collected by Oliva and Torralba, and then 5 categories were added by Fei-Fei and Perona; finally, 2 additional categories were introduced by Lazebnik et al.