Monocular Reconstruction of 4D Human Models from Video

When:
15/02/2021 – 16/02/2021 all-day

Offer related to the Action/Network: DOING/Doctorants

Laboratory/Company: ICube
Duration: 6 months
Contact: seo@unistra.fr
Publication deadline: 2021-02-15

Context:
The robust three-dimensional reconstruction of the face and body from one or more images has been an open problem for decades, with many exciting application areas. Initially, efforts focused on facial reconstruction and later evolved toward the reconstruction of the full body. A common way to capture such models is to use calibrated multi-view passive cameras and merge a sparse or dense set of reconstructed depth images into a single mesh, but the size and cost of such multi-view systems prevent their use in consumer applications.
In more unconstrained and ambiguous settings, such as monocular images or video, priors in the form of a template or parametric model are often used, which help to constrain the problem significantly. While generative methods reconstruct the moving geometry by optimizing the alignment between the projected model and the image data, regressive methods train deep neural networks to infer the shape parameters of a parametric body model from a single image. Despite remarkable progress, the reconstruction of 4D humans, i.e. space-time coherent 3D models, has not been fully addressed yet, with most existing algorithms operating in a frame-by-frame manner.
In this internship, we will focus on the reconstruction of space-time coherent deforming geometry of the entire human body from video input. The problem is particularly challenging since such 4D data is typically high-dimensional both spatially and temporally. We will approach the problem by combining a parametric model such as SMPL with recent deep learning techniques that learn to predict both the shape and the motion of the human body in its parametric space.
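To give a sense of the dimensionality reduction that the parametric space offers, the short sketch below contrasts the explicit per-frame mesh representation with the parametric one. It relies only on the standard SMPL dimensions (6890 vertices, a 72-dimensional axis-angle pose vector over 24 joints, and 10 shape coefficients); the clip length is an illustrative value.

```python
import numpy as np

# Standard SMPL dimensions: 6890 vertices, 24 joints (axis-angle pose), 10 shape betas.
N_VERTS, N_POSE, N_SHAPE = 6890, 24 * 3, 10

T = 120  # e.g. a 4-second clip at 30 fps (illustrative value)

# Explicit 4D representation: one mesh per frame.
explicit_dof = T * N_VERTS * 3          # ~2.5 million numbers

# Parametric 4D representation: one shared identity shape and a per-frame pose.
parametric_dof = N_SHAPE + T * N_POSE   # ~8.6 thousand numbers

print(f"explicit:   {explicit_dof:,} degrees of freedom")
print(f"parametric: {parametric_dof:,} degrees of freedom")
```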

Subject:
Our work will be inspired by recent progress on deep autoencoders, which approximate an identity mapping by coupling an encoding stage with a decoding stage in order to learn a compact latent representation of reduced dimensionality. Owing to the appealing characteristic that they are unsupervised, i.e. no labeled data is required, autoencoders have been used to tackle a wide range of tasks, including face recognition, real-time 2D-to-3D alignment, and face model reconstruction.
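As a concrete illustration, here is a minimal PyTorch autoencoder sketch (layer sizes and variable names are chosen for illustration only, not taken from the subject): an encoder compresses the input into a low-dimensional latent code, a decoder maps it back, and training simply minimizes the reconstruction error, so no labels are needed.

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    """Minimal fully connected autoencoder: input -> latent code -> reconstruction."""

    def __init__(self, in_dim: int = 1024, latent_dim: int = 32):
        super().__init__()
        # Encoder compresses the input into a low-dimensional latent code.
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )
        # Decoder maps the latent code back to the input space.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, in_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))

# Unsupervised training step: the target is the input itself (identity mapping).
model = AutoEncoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(8, 1024)                    # dummy mini-batch
loss = nn.functional.mse_loss(model(x), x)  # reconstruction error, no labels needed
loss.backward()
optimizer.step()
```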
The main objective is to develop a novel, model-based autoencoder that learns to jointly regress a set of model parameters (identity shape, pose-dependent shape) of a skinned template, as well as camera parameters, from the foreground segmented out of the input video. Among others, the SMPL representation is considered as our model: the body is parameterized by the pose vector θ and the shape vector β, with a template mesh M whose pose-dependent deformation is computed using a linear blend skinning function. The regressed parameter set will further include the camera parameters, namely the orientation T ∈ SO(3) and the position t ∈ R^3 of the camera. More specifically, the 3D body M(β, θ) will be rendered using a full perspective projection Π : R^3 → R^2 that maps from camera space to screen space. To enable training, implementing a backward pass may be required, i.e. the computation of the gradients of the projection function with respect to the parameters. Developing a robust loss function that includes spatiotemporal regularization along with the data error will also be an important part of this work. Evaluation and comparison of the performance against state-of-the-art methods is strongly recommended, whenever applicable.
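Under simplifying assumptions, the sketch below illustrates the differentiable part of such a model-based decoder: a full perspective projection of the posed body vertices into screen space, and a loss combining a 2D data term with a simple temporal smoothness term on the pose sequence (one possible form of spatiotemporal regularization). The callable `body_model` standing in for the SMPL forward pass, the pinhole intrinsics (f, c), and all tensor shapes are hypothetical placeholders, not part of the actual subject.

```python
import torch

def project_perspective(verts_cam: torch.Tensor, f: float, c: torch.Tensor) -> torch.Tensor:
    """Full perspective projection Pi: R^3 -> R^2, mapping camera space to screen space."""
    x, y, z = verts_cam[..., 0], verts_cam[..., 1], verts_cam[..., 2]
    u = f * x / z + c[0]
    v = f * y / z + c[1]
    return torch.stack([u, v], dim=-1)

def reprojection_loss(theta, beta, R, t, f, c, target_2d, body_model, w_smooth=1e-2):
    """2D data term plus temporal smoothness on the pose sequence.

    theta:      (T, 72) per-frame pose, beta: (10,) shared shape (illustrative shapes)
    R, t:       camera rotation (3, 3) and translation (3,)
    target_2d:  (T, N, 2) 2D observations (e.g. sampled silhouette or keypoint locations)
    body_model: hypothetical callable returning (T, N, 3) posed vertices from (beta, theta)
    """
    verts_world = body_model(beta, theta)              # (T, N, 3) posed body surface
    verts_cam = verts_world @ R.T + t                  # rigid transform into camera space
    pred_2d = project_perspective(verts_cam, f, c)     # (T, N, 2) screen-space points

    data_term = ((pred_2d - target_2d) ** 2).mean()
    # Temporal regularization: penalize pose changes between consecutive frames.
    smooth_term = ((theta[1:] - theta[:-1]) ** 2).mean()
    return data_term + w_smooth * smooth_term
```

Since everything above is expressed with differentiable torch operations, autograd already yields the gradients of the projection and of the loss with respect to θ, β and the camera parameters; an explicit backward pass would only need to be implemented for rendering steps that are not natively differentiable.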

Candidate profile:
— Master's student in Computer Science or in (Applied) Mathematics
— Solid programming skills in deep learning platforms: TensorFlow/PyTorch
— Background in geometric modeling and statistics
— Good communication skills

Education and required skills:
Image processing, Introduction to deep learning, Computer vision, Linear algebra

Workplace address:
ICube UMR 7357 – Laboratoire des sciences de l’ingénieur, de l’informatique et de l’imagerie
300 bd Sébastien Brant – CS 10413 – F-67412 Illkirch

Attached document: 202012142140_SujetM2_Reconstruction_from_Video.pdf