M2 internship: Neural network compression by optimising weight quantisation

When:
31/12/2022 – 01/01/2023 all-day


Laboratory/Company: LIRIS
Duration: 5-6 months
Contact: stefan.duffner@liris.cnrs.fr
Publication deadline: 2022-12-31

Context:
Deep Neural Networks (DNNs) are powerful machine learning models for a large number of applications. However, they can have an enormous number of parameters, require large amounts of memory and computation, and thus incur high energy consumption, which makes them difficult to use for edge computing.
Several approaches have been proposed to alleviate this problem, e.g. pruning, quantisation, or architectural optimisations such as Neural Architecture Search. Although increasingly efficient practical solutions exist (TensorFlow Lite, PyTorch quantisation [1], NVIDIA TensorRT, etc.), deploying large DNNs on embedded systems remains challenging.
More broadly, a major step toward reducing the energy consumption of AI, in the cloud as well as on the edge, is to make these tools more efficient and more accessible to a wider public.

[1] https://pytorch.org/blog/introduction-to-quantization-on-pytorch/
[2] Renato Cintra, Stefan Duffner, Christophe Garcia & André Leite (2018). "Low-complexity Approximate Convolutional Neural Networks". IEEE Transactions on Neural Networks and Learning Systems.

Subject:
The goal of this project is to study the state of the art in neural network quantisation and to experiment with existing frameworks such as the PyTorch quantisation module. We will focus in particular on post-training static quantisation. The first objective is to implement a simple pipeline (either using one of the existing libraries or from scratch) and to make it extensible and adaptable to new algorithms. A set of standard models (MLPs and CNNs) and some common datasets will serve as a test bench.
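To make the "from scratch" option concrete, here is a minimal, illustrative sketch of post-training static quantisation with an asymmetric affine int8 mapping: the dynamic range is first observed on calibration data, then a scale and zero-point are derived, and real values are mapped to clamped integer codes. All function names are hypothetical, not part of any existing library.

```python
# Minimal sketch of post-training static quantisation (affine int8 mapping).
# Pipeline: calibrate range -> derive (scale, zero_point) -> quantise -> dequantise.

def calibrate(values):
    """Observe the dynamic range of a tensor on calibration data."""
    lo, hi = min(values), max(values)
    return min(lo, 0.0), max(hi, 0.0)  # the range must contain zero

def affine_params(lo, hi, qmin=-128, qmax=127):
    """Compute scale and zero-point so that [lo, hi] maps onto [qmin, qmax]."""
    scale = (hi - lo) / (qmax - qmin)
    zero_point = round(qmin - lo / scale)
    return scale, zero_point

def quantise(values, scale, zero_point, qmin=-128, qmax=127):
    """Map real values to clamped int8 codes."""
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]

def dequantise(codes, scale, zero_point):
    """Recover approximate real values from int8 codes."""
    return [(q - zero_point) * scale for q in codes]

weights = [-0.51, 0.0, 0.27, 1.3]
lo, hi = calibrate(weights)
scale, zp = affine_params(lo, hi)
codes = quantise(weights, scale, zp)
approx = dequantise(codes, scale, zp)
```

The round-trip error of each value is bounded by the scale (one quantisation step), which is the basic accuracy/compression trade-off the internship will study.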
A second objective is to develop and experiment with new quantisation schemes (fixed-point and floating-point at different precisions, with different layer-wise/channel-wise strategies).
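The layer-wise/channel-wise distinction can be sketched with a simple symmetric fixed-point scheme of configurable bit width: one shared scale for the whole tensor versus one scale per output channel. The helper names below are hypothetical and only serve to illustrate the two strategies.

```python
# Symmetric signed fixed-point quantisation with a configurable bit width,
# comparing a layer-wise (one scale per tensor) and a channel-wise
# (one scale per output channel) strategy.

def symmetric_scale(values, bits):
    """Scale mapping the largest magnitude onto the top of the integer grid."""
    qmax = 2 ** (bits - 1) - 1
    amax = max(abs(v) for v in values)
    return amax / qmax if amax > 0 else 1.0

def quantise_symmetric(values, scale, bits):
    """Round to the nearest grid point and clamp to the signed range."""
    qmax = 2 ** (bits - 1) - 1
    return [max(-qmax - 1, min(qmax, round(v / scale))) for v in values]

def quantise_layer_wise(weight_rows, bits):
    """A single scale shared by the whole weight tensor."""
    flat = [v for row in weight_rows for v in row]
    s = symmetric_scale(flat, bits)
    return [quantise_symmetric(row, s, bits) for row in weight_rows], s

def quantise_channel_wise(weight_rows, bits):
    """One scale per output channel (row)."""
    scales = [symmetric_scale(row, bits) for row in weight_rows]
    return [quantise_symmetric(row, s, bits)
            for row, s in zip(weight_rows, scales)], scales
```

When one channel has much larger magnitudes than the others, the shared layer-wise scale wastes resolution on the small channels; channel-wise scales let each channel use the full integer range, which is why per-channel strategies are a standard refinement.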
Finally, a more complex quantisation algorithm that we published earlier [2] will be implemented and adapted to this framework and pipeline. The developed algorithms will be tested and evaluated on CPU and GPU hardware.
This internship is part of a project for the industrial exploitation of research results, in collaboration with engineers from Pulsalys (https://www.pulsalys.fr).

Candidate profile:
Master's degree in computer science, AI, machine learning or a related field, or final year of engineering school

Required background and skills:
– Good knowledge of machine learning and neural networks
– Knowledge of optimisation is a plus
– Good Python programming skills (PyTorch, SciPy, NumPy, etc.)
– Scientific curiosity and creativity
– Motivation to work in a team of researchers and engineers

Work address:
LIRIS – INSA Lyon, 7 Avenue Jean Capelle, 69621 Villeurbanne, France