In supervised learning, we are given a data set in which the correct output for each input is already known, and we assume that there is a relationship between the input and the output.
Supervised learning problems are categorized into "regression" and "classification" problems. In a regression problem, we try to predict a continuous-valued output, meaning that we map input variables to some continuous function. In a classification problem, we instead try to predict a discrete-valued output. In other words, we map input variables into discrete categories.
Example 1:
Given data about the size of houses on the real estate market, try to predict their price. Price as a function of size is a continuous output, so this is a regression problem.
We could turn this example into a classification problem by instead predicting whether the house "sells for more or less than the asking price." Here we are classifying the houses based on price into two discrete categories.
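The two framings of the housing example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sizes, prices, and the helper names `predict_price` and `classify_sale` are made up for the sketch, and thresholding a fitted line is just one simple way to obtain a discrete output, not a method prescribed by these notes.

```python
import numpy as np

# Hypothetical toy data (not from the notes): house sizes in square feet
# and sale prices in thousands of dollars.
sizes = np.array([1000.0, 1500.0, 2000.0, 2500.0])
prices = np.array([200.0, 300.0, 400.0, 500.0])

# Regression: fit price ~ slope * size + intercept by least squares.
slope, intercept = np.polyfit(sizes, prices, deg=1)


def predict_price(size):
    """Continuous output: the estimate can be any real number."""
    return slope * size + intercept


def classify_sale(size, asking_price):
    """Discrete output: one of exactly two categories.

    Assumption for this sketch: a predicted price equal to the asking
    price falls into the "not above asking" category.
    """
    if predict_price(size) > asking_price:
        return "above asking"
    return "not above asking"
```

Calling `predict_price(1800.0)` returns a real-valued price estimate, while `classify_sale(1800.0, 350.0)` returns one of two labels. The input is the same in both cases; only the kind of output changes.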
Example 2:
(a) Regression - Given a picture of a person, we have to predict their age on the basis of that picture.
(b) Classification - Given a patient with a tumor, we have to predict whether the tumor is malignant or benign.
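As a rough sketch of the classification case in (b), the snippet below assigns a tumor to whichever class has the closest average size. Everything specific here is an assumption made for illustration: the notes do not say which features describe a tumor, so a single hypothetical feature (size in cm), invented training values, and a nearest-centroid rule are used. Real diagnostic models use many more features.

```python
import numpy as np

# Hypothetical labeled training data: tumor size in cm (assumed feature)
# paired with the known correct label for each example.
tumor_sizes = np.array([1.0, 1.5, 2.0, 4.0, 4.5, 5.0])
tumor_labels = np.array(
    ["benign", "benign", "benign", "malignant", "malignant", "malignant"]
)

# Compute one centroid (mean size) per class from the labeled examples.
centroids = {
    str(label): float(tumor_sizes[tumor_labels == label].mean())
    for label in np.unique(tumor_labels)
}


def classify_tumor(size_cm):
    """Return the class whose mean size is closest to size_cm."""
    return min(centroids, key=lambda label: abs(size_cm - centroids[label]))
```

The output of `classify_tumor` is always one of the two category names, never a number in between, which is what makes this a classification problem rather than a regression problem.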