Many image-editing and film post-production applications rely on natural image matting as one of the processing steps. The task of the matting algorithm is to accurately estimate the opacity of a foreground object in an image or video sequence. Researchers from Trinity College Dublin propose the AlphaGAN architecture for natural image matting.
In mathematical terms, the color $C_i$ of every pixel $i$ in the image is assumed to be a linear combination of the foreground color $F_i$ and the background color $B_i$:
$$C_i = \alpha_i F_i + (1 - \alpha_i) B_i$$
Here $\alpha_i$ is a scalar value that defines the foreground opacity at pixel $i$ and is referred to as the alpha value.
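The linear-combination model described above can be sketched in a few lines of NumPy. This is only an illustration of the forward (compositing) direction; the function name `composite` and the toy arrays are assumptions made for this example. Matting is the inverse problem: for an RGB pixel there are three equations but seven unknowns (three foreground channels, three background channels, and one alpha value).

```python
import numpy as np

def composite(foreground, background, alpha):
    """Blend per pixel: color = alpha * foreground + (1 - alpha) * background."""
    # alpha has shape (H, W); add a channel axis so it broadcasts over RGB.
    alpha = alpha[..., np.newaxis]
    return alpha * foreground + (1.0 - alpha) * background

# Toy 2x2 example: a pure red foreground over a pure blue background.
foreground = np.zeros((2, 2, 3))
foreground[..., 0] = 1.0
background = np.zeros((2, 2, 3))
background[..., 2] = 1.0
alpha = np.array([[1.0, 0.5],
                  [0.25, 0.0]])

image = composite(foreground, background, alpha)
```

With `alpha` equal to 1 the pixel is pure foreground, with 0 it is pure background, and values in between mix the two colors.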
So, how can this equation be solved with so many unknown values? Let’s first look at the current approaches to solving this problem.
Many current algorithms aim to solve the matting equation by treating it as a color problem, following either a sampling or a propagation approach.
Sample-based image matting assumes that the true foreground and background colors of an unknown pixel can be derived from the known foreground and background samples that are near that pixel. Methods that follow this assumption include:
Propagation image matting works by propagating the known alpha values from known local foreground and background samples to the unknown pixels. Examples include fuzzy connectedness matting.
However, the over-dependency on color information can lead to artifacts in images where the foreground and background color distributions overlap.
Thus, several deep learning approaches to natural image matting have recently been introduced, including:
a two-stage network consisting of an encoder-decoder stage and a refinement stage by Xu et al.;
a method for deep automatic portrait matting by Shen et al.;
an end-to-end CNN that utilizes the results deduced from local and non-local matting algorithms by Cho et al.;
a granular deep learning (GDL) architecture by Hu et al.
But is it possible to further improve the performance of these algorithms by applying GANs? Let’s find out now!
Lutz, Amplianitis, and Smolić from Trinity College Dublin are the first to propose a generative adversarial network (GAN) for natural image matting. Their generator network is trained to predict visually appealing alphas, while the discriminator is trained to classify well-composited images.
The researchers build their approach by improving the network architecture of Xu et al. to better deal with the spatial localization issues inherent in CNNs. In particular, they use dilated convolutions to capture global context information without downscaling feature maps and losing spatial information.
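To make the idea of dilated convolutions concrete, here is a minimal 1-D sketch in NumPy. It is a toy illustration and not the authors' network, which uses 2-D dilated convolutions inside a CNN; the function name `dilated_conv1d` and the averaging kernel are assumptions made for this example. The point it shows is that spacing out the kernel taps widens the window each output sees, while the output keeps the same length as the input.

```python
import numpy as np

def dilated_conv1d(signal, kernel, dilation=1):
    """Same-length 1-D cross-correlation with `dilation` samples between kernel taps."""
    span = (len(kernel) - 1) * dilation   # distance between first and last tap
    pad_left = span // 2
    # Zero-pad so that the output has exactly as many samples as the input.
    padded = np.pad(signal, (pad_left, span - pad_left))
    out = np.zeros(len(signal))
    for i in range(len(signal)):
        taps = padded[i : i + span + 1 : dilation]  # every `dilation`-th sample
        out[i] = np.dot(taps, kernel)
    return out

signal = np.arange(16, dtype=float)
kernel = np.ones(3) / 3.0

dense = dilated_conv1d(signal, kernel, dilation=1)  # window of 3 samples
wide = dilated_conv1d(signal, kernel, dilation=4)   # window of 9 samples, same 3 weights
```

Both outputs have 16 samples, so no resolution is lost, yet the dilated version draws on samples that are four positions apart. A strided or pooled layer would reach a similar context only by shrinking the feature map.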
We are now ready to move on to the details of AlphaGAN, which is what Lutz and his colleagues call their image matting algorithm.
The AlphaGAN architecture consists of one generator G and one discriminator D.
The generator G is a convolutional encoder-decoder network that is trained with both the ground-truth alphas and the adversarial loss from the discriminator. As input, it takes an image composited from the foreground, the alpha, and a random background, with the trimap appended as a fourth channel, and it attempts to predict the correct alpha. The ResNet50 architecture is used for the encoder.
As you can see from the figure below, the decoder part of the network includes skip connections from the encoder, which improve the alpha prediction by reusing local information to capture fine structures in the image.
The generator of the AlphaGAN
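The 4-channel input described above for the generator can be sketched as follows. This is an illustrative assumption about the data layout and not the authors' code: the helper name `make_generator_input`, the trimap encoding (0 for background, 0.5 for unknown, 1 for foreground), and the thresholds used to derive a toy trimap are all hypothetical.

```python
import numpy as np

def make_generator_input(foreground, background, alpha, trimap):
    """Composite an RGB image and append the trimap as a fourth channel."""
    a = alpha[..., np.newaxis]
    rgb = a * foreground + (1.0 - a) * background            # (H, W, 3)
    return np.concatenate([rgb, trimap[..., np.newaxis]], axis=-1)  # (H, W, 4)

rng = np.random.default_rng(0)
height, width = 4, 4
foreground = rng.random((height, width, 3))
background = rng.random((height, width, 3))   # stands in for a randomly chosen background
alpha_gt = rng.random((height, width))

# Hypothetical trimap: confident regions get 0 or 1, everything else is "unknown".
trimap = np.full((height, width), 0.5)
trimap[alpha_gt < 0.1] = 0.0
trimap[alpha_gt > 0.9] = 1.0

generator_input = make_generator_input(foreground, background, alpha_gt, trimap)
```

Compositing each foreground onto a randomly drawn background means the same foreground and alpha can yield many different training images.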
The discriminator D tries to distinguish between real 4-channel inputs and fake inputs where the first three channels are composited from the foreground, the background, and the predicted alpha. The PatchGAN architecture introduced by Isola et al. is used for the discriminator in this network.
The full objective of the network combines an alpha-prediction loss, a compositional loss, and an adversarial loss.
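A rough sketch of how three such loss terms could be computed and combined is shown below. Several things here are assumptions and are not taken from this article: the smooth absolute-difference form of the first two losses, the constant `EPS`, the equal weighting of the three terms, and the standard GAN form of the adversarial term. The discriminator outputs `d_real` and `d_fake` are hypothetical arrays of per-patch probabilities, in the spirit of a PatchGAN.

```python
import numpy as np

EPS = 1e-6  # assumed smoothing constant; not specified in this article

def composite(fg, bg, alpha):
    a = alpha[..., np.newaxis]
    return a * fg + (1.0 - a) * bg

def alpha_prediction_loss(alpha_pred, alpha_gt):
    """Mean smooth absolute difference between predicted and ground-truth alpha."""
    return float(np.mean(np.sqrt((alpha_pred - alpha_gt) ** 2 + EPS ** 2)))

def compositional_loss(alpha_pred, alpha_gt, fg, bg):
    """Same difference, measured on the images composited with each alpha."""
    diff = composite(fg, bg, alpha_pred) - composite(fg, bg, alpha_gt)
    return float(np.mean(np.sqrt(diff ** 2 + EPS ** 2)))

def adversarial_loss(d_real, d_fake):
    """Standard GAN value: log D(real) + log(1 - D(fake)), averaged over patches."""
    d_real = np.clip(d_real, 1e-7, 1.0 - 1e-7)  # keep the logarithms finite
    d_fake = np.clip(d_fake, 1e-7, 1.0 - 1e-7)
    return float(np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake)))

def full_objective(alpha_pred, alpha_gt, fg, bg, d_real, d_fake):
    # Assumed equal weighting of the three terms.
    return (alpha_prediction_loss(alpha_pred, alpha_gt)
            + compositional_loss(alpha_pred, alpha_gt, fg, bg)
            + adversarial_loss(d_real, d_fake))

rng = np.random.default_rng(0)
fg, bg = rng.random((8, 8, 3)), rng.random((8, 8, 3))
alpha_gt = rng.random((8, 8))
alpha_pred = np.clip(alpha_gt + 0.1, 0.0, 1.0)   # a deliberately imperfect prediction
d_real = np.full((2, 2), 0.5)                    # an undecided discriminator
d_fake = np.full((2, 2), 0.5)

total = full_objective(alpha_pred, alpha_gt, fg, bg, d_real, d_fake)
```

In the usual GAN setup the generator tries to minimize this value while the discriminator tries to maximize the adversarial term, so the two networks are updated in alternation.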
The proposed method was evaluated on two datasets:
the Composition-1k dataset, which includes 1000 test images composed of 50 unique foreground objects;
the alphamatting.com dataset, which consists of 28 training images and 8 test images; for each set, three different sizes of trimaps are provided, namely “small” (S), “large” (L), and “user” (U).
The Composition-1k Dataset
The metrics used include the sum of absolute differences (SAD) and the mean square error (MSE).
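For reference, the two metrics named above can be computed as in the following sketch. Restricting the evaluation to the unknown region of the trimap is an assumption based on common practice in matting benchmarks, and any benchmark-specific scaling of the scores is left out; the function names and toy values are made up for this example.

```python
import numpy as np

def sad(alpha_pred, alpha_gt, unknown_mask):
    """Sum of absolute differences over the pixels selected by the mask."""
    return float(np.abs(alpha_pred - alpha_gt)[unknown_mask].sum())

def mse(alpha_pred, alpha_gt, unknown_mask):
    """Mean square error over the pixels selected by the mask."""
    return float(((alpha_pred - alpha_gt) ** 2)[unknown_mask].mean())

alpha_pred = np.array([[0.0, 0.5],
                       [1.0, 0.8]])
alpha_gt = np.array([[0.0, 1.0],
                     [1.0, 1.0]])
# Hypothetical mask: True where the trimap marks the pixel as unknown.
unknown_mask = np.array([[False, True],
                         [False, True]])

sad_score = sad(alpha_pred, alpha_gt, unknown_mask)
mse_score = mse(alpha_pred, alpha_gt, unknown_mask)
```

Lower values are better for both metrics. SAD grows with the size of the evaluated region, whereas MSE is normalized by the number of pixels.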
The researchers compared their method with several state-of-the-art approaches for which public code is available. For all methods, the original code from the authors was used, without any modifications.
Quantitative results on the Composition-1k dataset. AlphaGAN’s results are shown in parentheses
As can be observed from the table, AlphaGAN delivers noticeably better results than the other image matting algorithms selected for comparison. There is only one case (the gradient error, where the comprehensive sampling approach does better) in which it does not achieve the best result.
See also some qualitative results of this comparison in the next set of pictures:
The Alphamatting.com Dataset
The researchers submitted the results generated by AlphaGAN to the alphamatting.com benchmark and reached the top positions for some of the images:
Alpha matting predictions for the “Troll” and “Doll” images (best results) and the “Net” image (worst result) taken from the alphamatting.com dataset. From left to right: DCNN, IF, DI, and ‘Ours’ (AlphaGAN)
Specifically, they achieved the best results for the “Troll” and “Doll” images, and first place overall on the gradient evaluation metric. Their strong results on these particular images demonstrate the advantage of using the adversarial loss from the discriminator to correctly predict the alpha values for fine structures such as hair.
The worst results of the proposed method come from the “Net” image. However, even though the AlphaGAN approach ranks low for this image, the results still look very close to those of the top-performing approaches:
AlphaGAN is the first algorithm that uses GANs for natural image matting. Its generator is trained to predict alpha mattes from input images while the discriminator is trained to distinguish good images composited from the ground-truth alpha from images composited with the predicted alpha.
Such a network architecture produces visually appealing compositions with state-of-the-art or comparable results on the primary metrics. Exceptional performance is achieved for images with fine structures such as hair. That is of great importance in practical matting applications, including film and TV production.
Originally published on September 11, 2018.