My paper on an alternative Matting Laplacian has recently been accepted at ICIP 2016. I have uploaded the preprint to arXiv.org and the reference code is available on my GitHub repository:
Say we want to cut out the trolls from the background. We need to
extract an opacity mask of the foreground. This mask is called the
$\alpha$ matte. When working in the linear RGB colour space, the
observed colour $C_i$ at pixel $i$ is a blend between the foreground
colour $F_i$ and the background $B_i$:
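$$C_i = \alpha_i\, F_i + (1 - \alpha_i)\, B_i$$

This is the standard compositing equation, where $C_i$, $F_i$ and $B_i$ are RGB vectors and $\alpha_i \in [0, 1]$.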
We need to solve for $F_i$, $B_i$ and $\alpha_i$. This is
unfortunately non-linear and over-parameterised. In their remarkable
work, Levin et al. propose that the transparency values can be
approximated as a linear combination of the colour components:
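$$\alpha_i \approx a_i^T C_i + b_i$$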
with $a_i = [a_i^R, a_i^G, a_i^B]$. We can think of $a$ and $b$ as a
colour filter that we apply on the picture to obtain an intensity map
of $\alpha$. In a way, $a$ is the colour through which $\alpha$
is revealed. Below is an example with $\alpha = 2.2 C^R - 1.36 C^G - 0.41 C^B - 0.04$:
This map is a good approximation of $\alpha$ for the top left of the picture, but not for the rest of it. Ideally, then, we would fit a separate model at each pixel of the image. The problem is that we would end up with too many unknown parameters. Levin et al. solve this problem by introducing a spatial constraint that makes the problem tractable and yields a global closed-form solution. They state that each local matting model should fit a local $3\times3$ image patch $w_i$ that overlaps with its neighbourhood:
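$$J(\alpha, a, b) = \sum_{i} \Big( \sum_{j \in w_i} \big( \alpha_j - a_i^T C_j - b_i \big)^2 + \epsilon \, \| a_i \|^2 \Big)$$

This is the cost function of Levin et al.; the small term $\epsilon \| a_i \|^2$ regularises the local filter coefficients.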
This is a quadratic expression in $a$, $b$ and $\alpha$. Levin et
al. show that a closed-form solution for $\alpha$ can be found without
having to explicitly compute the model parameters $a_i$ and $b_i$:
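$$J(\alpha) = \min_{a, b} J(\alpha, a, b) = \alpha^T L \, \alpha$$

where $L$ is the matting Laplacian, a sparse matrix whose entries depend only on the local colour statistics of the image.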
This leads to a sparse linear system $L \alpha = 0$, which can be solved using iterative solvers. It is then easy to add constraints on the values of $\alpha$ (for instance from user scribbles). Below is a result based on this method:
Now, let us look at what the implicitly computed model parameters look like. The image below is a map of $a/5 + 0.5$:
Recall that $a$ is in fact the colour through which we can reveal $\alpha$; what we see here are thus the colours (up to some brightness and contrast adjustment) of the local colour filter that is used to reveal $\alpha$. What is striking is that $a$ exhibits some spatial smoothness, and we should try to exploit it. Unfortunately, this is hard to do with Levin et al.'s formulation, as the model parameters are not exposed in the equations.
Our contribution is to take the opposite approach to Levin et al. and to explicitly solve for the model parameters instead.
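Schematically (the paper gives the exact formulation), substituting the local model $\alpha_j \approx a_j^T C_j + b_j$ into the cost above yields a quadratic expression in $a$ and $b$ alone:

$$J(a, b) = \sum_{i} \Big( \sum_{j \in w_i} \big( (a_j - a_i)^T C_j + b_j - b_i \big)^2 + \epsilon \, \| a_i \|^2 \Big)$$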
We then show that a closed-form solution can also be found for $a$ and $b$. Interestingly, the problem turns out to be an anisotropic diffusion of $a$ and $b$.
This also leads to a sparse linear system $A [\begin{array}{cc} a & b \end{array}]^T = 0$. But since we now have an explicit representation of the model parameters, it is easier to add further smoothness priors on them. For instance, below is the result of increasing the spatial smoothness of $a$:
The resulting transparency map still shows little difference from that of Levin et al.:
The advantages of this approach are:
- Computational: our equations are simpler than in Levin et al.
- Modelling: it is easier to set meaningful priors on $a$ and $b$.
The first movement of the spatial audio 360 VR piece From Within, From Without by Enda Bates is now on YouTube. Enda has published a very detailed post about the project and immersive spatial audio. The music was performed by Pedro López López, Trinity Orchestra, and Cue Saxophone Quartet in Trinity College Dublin on April 8th, 2016 (see my previous post and Enda's blog).
At the moment, the best way to watch it is with Google's Cardboard VR headset and an Android phone, as only YouTube's Android app supports spatial audio.
This is the first piece in a series of immersive spatial audio
experiences that we plan to record in Trinity College Dublin.
At the moment we’ve simply stitched the 12 GoPros from our 360 rig
using off the shelf software (VideoStitch 2). There are still some
visible artefacts that are mainly due to parallax. We’ll get our own
video processing algorithms working at some point and try to improve
on this.
It has been a few weeks since we recorded the Trinity360 event, and we have started rendering a 360 video with spatial audio.
Thanks to the audio team of Prof Boland from the
Sigmedia research group of Trinity College
Dublin, Google has brought spatial audio support to Google Cardboard’s
virtual reality system (see Google Developers
Blog). So now we can experience spatial audio
on YouTube!
I’ve detailed below the ffmpeg commands we used
to preview our 360 videos with spatial audio on the Jump Inspector and
then for uploading to YouTube.
1. Encoding for the Jump Inspector (Preview)
As full processing of the spatial audio by YouTube takes a bit of
time, it was very useful to quickly preview our videos on an Android
phone using the Jump Inspector
App. The Jump
Inspector requires videos to be in a specific format that is detailed
here.
1.1. Video encoding for the Jump Inspector
Our stitched 360-mono video is named
trinity360-stitched.video.mov. Jump Inspector requires us to target
a video stream with the following specs:
It is also important for the Jump Inspector that the file name ends with .360.mono.mp4.
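As a rough sketch, the video re-encode can be done with ffmpeg along these lines (the output name is a placeholder, and the exact resolution and bitrate flags should be set to match the Jump Inspector specs):

```
# Sketch only: re-encode the stitched mono 360 video to an H.264 MP4.
# Resolution/bitrate options are omitted and should follow the Jump
# Inspector specs; the audio is added separately in the next step.
ffmpeg -i trinity360-stitched.video.mov \
       -c:v libx264 -pix_fmt yuv420p -an \
       trinity360.360.mono.mp4
```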
1.2. Audio encoding for the Jump Inspector
Our Ambisonics track is a 4-channel WAV file (44.1 kHz, 16 bit) in the ACN SN3D Ambisonics format specified by YouTube. To work with the Jump Inspector, we converted it to AAC at 128k.
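As a minimal sketch, assuming the Ambisonics file is called trinity360-ambix.wav and that the AAC track is muxed into the video from the previous step (both file names are placeholders):

```
# Sketch only: encode the 4-channel ACN SN3D WAV to AAC at 128 kbit/s
# and mux it with the re-encoded video. File names are placeholders.
# -strict -2 enables the native AAC encoder, which is still marked
# experimental in ffmpeg 2.8.
ffmpeg -i trinity360.360.mono.mp4 -i trinity360-ambix.wav \
       -map 0:v:0 -map 1:a:0 \
       -c:v copy -c:a aac -b:a 128k -strict -2 \
       trinity360-jump.360.mono.mp4
```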
Then we just transferred our file to the Jump directory of our Nexus 5.
2. Encoding for YouTube
The video spec requirements are less stringent for YouTube. There are no constraints on video resolution or audio compression, besides having the Ambisonics as 4 channels in the ACN SN3D format and setting the metadata as described in the YouTube help.
2.1. ffmpeg Encoding
We kept the audio as uncompressed PCM s16 (pcm_s16le), which is supported in MOV containers but not in MP4. The command is thus a simple remux of the stitched video with the WAV audio into a MOV file.
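A minimal sketch of that remux, assuming the 4-channel Ambisonics WAV is called trinity360-ambix.wav (file names are placeholders):

```
# Sketch only: copy the video stream and mux it with the 4-channel
# ACN SN3D WAV, keeping the audio as uncompressed PCM s16.
ffmpeg -i trinity360-stitched.video.mov -i trinity360-ambix.wav \
       -map 0:v:0 -map 1:a:0 \
       -c:v copy -c:a pcm_s16le \
       trinity360-youtube.mov
```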
We’ve downloaded Google’s 360 Video Metadata app 360 Video Metadata
app and selected
spherical and Spatial Audio:
2.3. Upload to YouTube
Then the video was uploaded to YouTube. Nothing special needs to be done here; you just have to wait a couple of hours for the spatial audio to be fully processed, so be patient.
Edit: I have changed the post to clearly separate the instructions for YouTube and for the Jump Inspector.
For completeness, this is the ffmpeg version we’re using:
ffmpeg version 2.8.6 Copyright (c) 2000-2016 the FFmpeg developers
built with Apple LLVM version 7.0.2 (clang-700.1.81)
configuration: --prefix=/opt/local --enable-swscale --enable-avfilter --enable-avresample --enable-libmp3lame --enable-libvorbis --enable-libopus --enable-libtheora --enable-libschroedinger --enable-libopenjpeg --enable-libmodplug --enable-libvpx --enable-libsoxr --enable-libspeex --enable-libass --enable-libbluray --enable-lzma --enable-gnutls --enable-fontconfig --enable-libfreetype --enable-libfribidi --disable-indev=jack --disable-outdev=xv --mandir=/opt/local/share/man --enable-shared --enable-pthreads --cc=/usr/bin/clang --enable-vda --enable-videotoolbox --arch=x86_64 --enable-yasm --enable-gpl --enable-postproc --enable-libx264 --enable-libxvid
Trinity College Dublin composer and teaching fellow in the Music and Media Technology Programme, Enda Bates, composed a multi-movement spatial music work entitled From Within, From Without. The piece was performed in Trinity's Exam Hall on the 8th of April as part of Trinity's Creative Challenge Showcase.
The concert comprised acoustic, electroacoustic, and electronic spatial music and was filmed using 360˚ cameras and microphones for Virtual Reality (VR) presentation. We are currently working on the video side of the VR capture. On this occasion, we've designed a compact home-brew 12-GoPro stereo 360 rig (timelapse below).
You can follow Enda’s blog for more information about the project.
Today was the opening day for the ADAPT SFI centre. I was demonstrating for Sigmedia some of our 3D technology for creative artists. There was a bit of news coverage on RTÉ news.
We have just finished working on a stereo 3D short movie with Steve Woods. This is a 2D version of the 3D stereoscopic dance film, performed by dancers Michelle Boulé and Philip Connaughton and choreographed by John Scott from the Irish Modern Dance Theatre.