Dataset: the MSRA A and B datasets are introduced in this paper.
A Conditional Random Field (CRF) based method is proposed. Its energy E(A|I) has two terms: K features contribute to the first term, and a pairwise feature between pairs of pixels is the second.
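Since the equation image is missing here, this is the form of the model as far as I recall it. Treat it as a reconstruction rather than a quote: the weights \lambda_k, the partition function Z, and the exact index notation are my assumptions, and the original paper is the authority.

```latex
P(A \mid I) = \frac{1}{Z} \exp\bigl(-E(A \mid I)\bigr), \qquad
E(A \mid I) = \sum_{x} \sum_{k=1}^{K} \lambda_k F_k(a_x, I) + \sum_{x, x'} S(a_x, a_{x'}, I)
```

Here F_k is the k-th per-pixel feature term and S is the pairwise term.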
The pairwise term is learning-free.
Here a_x is the label of pixel x, indicating whether it is salient, and d_(x, x') is the L2 norm of the color difference between pixels x and x'. beta is a robust parameter that weights the color contrast; its definition uses <.>, the expectation operator.
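To make the pairwise term concrete, here is a minimal pure-Python sketch. The exact formulas are assumptions on my part, since the equation is missing above: I take the cost to be |a_x - a_x'| * exp(-beta * d) and beta to be 1 / (2 * <d^2>), with the expectation taken over horizontally and vertically adjacent pixels. The function names are made up for illustration.

```python
import math

def color_distance(c1, c2):
    """L2 norm of the difference between two RGB colors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))

def estimate_beta(image):
    """Assumed form: beta = 1 / (2 * <d^2>), where <.> averages the squared
    color distance over horizontally and vertically adjacent pixel pairs."""
    h, w = len(image), len(image[0])
    squared = []
    for y in range(h):
        for x in range(w):
            if x + 1 < w:
                squared.append(color_distance(image[y][x], image[y][x + 1]) ** 2)
            if y + 1 < h:
                squared.append(color_distance(image[y][x], image[y + 1][x]) ** 2)
    if not squared:
        return 0.0
    mean_sq = sum(squared) / len(squared)
    return 1.0 / (2.0 * mean_sq) if mean_sq > 0 else 0.0

def pairwise_cost(a_x, a_xp, c_x, c_xp, beta):
    """Assumed form: |a_x - a_x'| * exp(-beta * d). The cost is zero when the
    labels agree, and shrinks as the two colors become more different."""
    return abs(a_x - a_xp) * math.exp(-beta * color_distance(c_x, c_xp))
```

Under this form, giving different labels to two similar-looking pixels is expensive, while a label change across a strong color edge is cheap. Nothing here is learned from training data, which matches the "learning-free" remark.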
Now let me introduce the three features used in the first term of the above equation, E(A|I); these are the terms that involve learning. The details of learning and inference can be found in the original paper and are left out of this blog.
1. Multi-scale contrast
Contrast is accumulated over an image pyramid: I^l is the l-th level image in the pyramid, and the number of pyramid levels L is 6. N(x) is a 9x9 window around pixel x. The feature map is normalized to [0,1].
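The multi-scale contrast described above can be sketched roughly as follows. This is a simplified illustration, not the paper's implementation: it works on a grayscale image stored as nested lists, builds the pyramid with plain 2x2 averaging (the paper's pyramid filter may differ), and measures contrast as the sum of squared differences inside the window, which is my assumption about the missing equation.

```python
def downsample(img):
    """Halve the resolution by averaging 2x2 blocks (a crude pyramid step)."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1] +
              img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w)] for y in range(h)]

def local_contrast(img, radius=4):
    """Sum of squared differences between each pixel and the pixels in its
    (2*radius+1) x (2*radius+1) window; radius=4 gives the 9x9 window N(x)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    total += (img[y][x] - img[yy][xx]) ** 2
            out[y][x] = total
    return out

def multi_scale_contrast(img, levels=6, radius=4):
    """Accumulate local contrast over up to `levels` pyramid levels, then
    normalize the result to [0, 1]."""
    h, w = len(img), len(img[0])
    acc = [[0.0] * w for _ in range(h)]
    level, scale = img, 1
    for _ in range(levels):
        contrast = local_contrast(level, radius)
        lh, lw = len(level), len(level[0])
        # Map each full-resolution pixel to its cell at this level (nearest neighbor).
        for y in range(h):
            for x in range(w):
                acc[y][x] += contrast[min(y // scale, lh - 1)][min(x // scale, lw - 1)]
        if lh < 2 or lw < 2:
            break  # image too small to downsample further
        level, scale = downsample(level), scale * 2
    lo = min(min(row) for row in acc)
    hi = max(max(row) for row in acc)
    if hi == lo:
        return [[0.0] * w for _ in range(h)]
    return [[(v - lo) / (hi - lo) for v in row] for row in acc]
```

On a flat image every value comes out as 0, and an isolated bright pixel on a dark background gets the highest score, which is the behavior a contrast feature should have.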
2. Center-surround histogram
We measure the distance between the RGB color histograms of two rectangles: R (the center area) and R_s (the surrounding rectangle, with the same area as R).
By varying the rectangle size ([0.1, 0.7] * min(w, h)) and aspect ratio ({0.5, 0.75, 1.0, 1.5, 2.0}), we find the most distinct rectangle R^*(x) centered at each pixel x.
Then the center-surround histogram feature f_h(x,I) is defined as a sum of spatially weighted distances.
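The search for the most distinct rectangle at one pixel can be sketched as below. Several things here are assumptions of mine: the chi-square distance between normalized RGB histograms, the 4 bins per channel, sampling the size range at four values, reading "size" as the rectangle height with width = aspect * height, and building the surround as an outer box with sides scaled by sqrt(2) (so the ring has roughly the same area as the center). The final spatially weighted sum is not shown.

```python
import math

def histogram(pixels, bins=4):
    """Normalized joint RGB histogram with bins**3 cells (channels in 0..255)."""
    hist = [0.0] * bins ** 3
    for r, g, b in pixels:
        idx = (r * bins // 256) * bins * bins + (g * bins // 256) * bins + (b * bins // 256)
        hist[idx] += 1.0
    n = len(pixels)
    return [v / n for v in hist] if n else hist

def chi_square(h1, h2):
    """Chi-square distance between two normalized histograms, in [0, 1]."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)

def box(cx, cy, w, h, width, height):
    """Pixel coordinates of a w x h box centered at (cx, cy), clipped to the image."""
    x0, x1 = max(0, int(round(cx - w / 2))), min(width, int(round(cx + w / 2)))
    y0, y1 = max(0, int(round(cy - h / 2))), min(height, int(round(cy + h / 2)))
    return {(x, y) for y in range(y0, y1) for x in range(x0, x1)}

def most_distinct_rectangle(image, cx, cy):
    """Return (distance, (w, h)) of the candidate whose center differs most
    from its surround at pixel (cx, cy)."""
    height, width = len(image), len(image[0])
    base = min(width, height)
    best = (-1.0, None)
    for frac in (0.1, 0.3, 0.5, 0.7):            # sampled sizes (assumption)
        for aspect in (0.5, 0.75, 1.0, 1.5, 2.0):
            h = frac * base
            w = aspect * h
            center = box(cx, cy, w, h, width, height)
            outer = box(cx, cy, w * math.sqrt(2), h * math.sqrt(2), width, height)
            surround = outer - center             # ring around the center
            if not center or not surround:
                continue
            d = chi_square(histogram([image[y][x] for x, y in center]),
                           histogram([image[y][x] for x, y in surround]))
            if d > best[0]:
                best = (d, (w, h))
    return best
```

A rectangle that tightly encloses a uniformly colored object on a differently colored background reaches the maximum distance of 1, while on a flat image every candidate scores 0.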
3. Color spatial-distribution
The more widely a color is distributed in the image, the less likely it is that a salient object contains this color.
First, all colors in the image are represented by a Gaussian mixture model (GMM), so each pixel is assigned to each color component with some probability.
Then the horizontal and vertical spatial variances of each component are calculated and summed to give that component's variance. This variance is then used as a weight in a weighted sum over the components, which gives the final spatial-variance feature.
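A small sketch of the variance and weighting steps, under stated assumptions: the GMM is taken as already fitted, so the input `resp` (a hypothetical name) holds the per-pixel component probabilities; the component variances are normalized to [0, 1]; and the weight applied to each component is (1 - variance), so that compact colors score high. The last two choices are my reading of the description, not something the text above spells out.

```python
def spatial_variance_feature(resp):
    """resp[y][x][c] is the probability that pixel (x, y) belongs to color
    component c. Returns a per-pixel feature map as nested lists."""
    h, w, k = len(resp), len(resp[0]), len(resp[0][0])
    variances = []
    for c in range(k):
        mass = sum(resp[y][x][c] for y in range(h) for x in range(w))
        if mass == 0:
            variances.append(0.0)  # unused component
            continue
        # Probability-weighted mean position of the component.
        mean_x = sum(resp[y][x][c] * x for y in range(h) for x in range(w)) / mass
        mean_y = sum(resp[y][x][c] * y for y in range(h) for x in range(w)) / mass
        # Horizontal and vertical variances, summed.
        var_x = sum(resp[y][x][c] * (x - mean_x) ** 2 for y in range(h) for x in range(w)) / mass
        var_y = sum(resp[y][x][c] * (y - mean_y) ** 2 for y in range(h) for x in range(w)) / mass
        variances.append(var_x + var_y)
    lo, hi = min(variances), max(variances)
    norm = [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in variances]
    # Weighted sum over components: widely spread colors contribute little.
    return [[sum(resp[y][x][c] * (1.0 - norm[c]) for c in range(k))
             for x in range(w)] for y in range(h)]
```

With two components, one confined to a small central block and one covering the border, the central pixels get a feature value of 1 and the border pixels get 0.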
(The equation images failed to paste, so please turn to the original paper when you need the detailed equations.)