η is called the natural parameter (also called the canonical parameter) of the distribution;
T (y) is the sufficient statistic (for the distributions we consider, it will often be the case that T (y) = y);
a(η) is the log partition function. The quantity e^(−a(η)) essentially plays the role of a normalization constant, that makes sure the distribution p(y; η) sums/integrates over y to 1.
A fixed choice of T , a and b defines a family (or set) of distributions that is parameterized by η; as we vary η, we then get different distributions within this family.
To derive a GLM for this problem, we will make the following three assumptions about the conditional distribution of y given x and about our model:
Softmax Regression
(T(y))i to denote the i-th element of the vector T (y)