Interpretability models for fault identification and diagnosis in connected manufacturing

When: 31/01/2019 – 01/02/2019 (all day)

Announcement related to an Action/Network: none

Laboratory/Company: Laboratoire LADIS of CEA LIST and CEDRIC - Cnam
Duration: 6 months
Contact: pierre.blanchart@cea.fr
Publication deadline: 2019-01-31

Context:
The operation and optimization of modern factories rely on fine-grained monitoring of machines and products. Beyond classical purposes such as energy optimization and smart production planning, there is high demand for systems able to detect faults occurring in production chains and to isolate their location. Considerable effort has therefore gone into designing computational intelligence systems able to represent the underlying dynamics of such complex systems, with the goal of detecting, identifying and possibly explaining the occurrence of faults while the system is in operation.

Subject:
Within the teams of the CEA/LADIS, we have been investigating fault detection models that operate on a global set of engineered features extracted from sensor measurements at the workstation level. We have deployed such models on several real-life datasets, coming both from our project partners and from fault detection challenges in which we participated. More recently, we have been looking into making the decisions of those models interpretable, without impacting the performance of the original fault detection models. The purpose is to answer the following questions: "Is there a fault?", "Where/when did it happen?", "Why did it happen?". While the first question is answered by the fault detection model itself, the other two cannot be answered without explaining/interpreting the decision taken by this model.
In this internship, we propose to build on the work carried out in our teams to add interpretability to a specific class of models known as gradient boosted trees [1], which were used as fault detection models. Since they are decision-tree-based models, they retain some interpretability in the sense that they analyze individual features sequentially, without any non-linear transformation of the original feature space. However, the trained models are too large to be analyzed directly by a human operator. The expected task is thus to design machine learning models that learn to interpret forest/tree-based fault detection models trained on massive data and large feature spaces, and produce a human-readable diagnosis related to a fault occurrence.
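
As a minimal illustrative sketch (not the team's actual setup), a gradient boosted tree classifier in the spirit of [1] can be trained with the xgboost Python package on workstation-level features, and the leaf reached in each tree can be recovered as a starting point for reconstructing decision paths; the synthetic data, variable names and hyperparameters below are assumptions made for illustration only.

# Hedged sketch: gradient boosted trees [1] as a fault detector, plus the
# raw material (leaf indices, tree tables) needed to reconstruct decision paths.
# The synthetic data stands in for the engineered workstation-level features.
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))                          # engineered features (synthetic)
y = (X[:, 3] + 0.5 * X[:, 7] > 1.0).astype(int)          # fault / no-fault labels (synthetic)

# Gradient boosted trees as the fault detection model.
model = xgb.XGBClassifier(n_estimators=50, max_depth=4, learning_rate=0.1)
model.fit(X, y)

booster = model.get_booster()
dmat = xgb.DMatrix(X)

# One leaf index per (sample, tree): the end point of the decision path
# followed inside each tree of the ensemble for that sample.
leaf_ids = booster.predict(dmat, pred_leaf=True)         # shape (n_samples, n_trees)

# Tree structure as a table (one row per node), from which root-to-leaf
# paths and the tested features/thresholds can be reconstructed.
trees = booster.trees_to_dataframe()
print(leaf_ids.shape, trees.columns.tolist())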
The data as well as the fault detection models will be provided to the candidate. Preliminary work on interpretability (including development code) has been carried out [2] and will serve as a basis for starting the internship. In particular, recurrent neural network-based sequential models [3] that analyze paths inside decision trees have been investigated as a possible solution. The internship will focus on investigating similar models and, as such, is oriented more towards research than development.
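
Purely as a hedged illustration of this kind of sequential model (and not the approach of [2]), an LSTM [3] can read an encoded decision path, one vector per traversed tree node, and output a class that could be mapped to a human-readable diagnosis; the path encoding, dimensions and diagnosis vocabulary below are hypothetical, and PyTorch is assumed.

# Hedged sketch: an LSTM [3] over encoded decision-path steps producing a
# diagnosis class. Step encoding, sizes and diagnosis classes are illustrative.
import torch
import torch.nn as nn

class PathInterpreter(nn.Module):
    def __init__(self, step_dim=8, hidden_dim=64, n_diagnoses=5):
        super().__init__()
        # Each path step is a small vector, e.g. encoding the tested feature,
        # the threshold and the direction taken at that tree node (assumption).
        self.lstm = nn.LSTM(step_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, n_diagnoses)

    def forward(self, paths):                    # paths: (batch, path_len, step_dim)
        _, (h_n, _) = self.lstm(paths)           # h_n: (num_layers, batch, hidden_dim)
        return self.head(h_n[-1])                # logits over diagnosis classes

# Toy usage: a batch of 32 decision paths of length 6.
model = PathInterpreter()
paths = torch.randn(32, 6, 8)
logits = model(paths)
print(logits.shape)                              # torch.Size([32, 5])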

[1] Tianqi Chen and Carlos Guestrin. 2016. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’16). ACM, New York, NY, USA, 785-794. DOI: https://doi.org/10.1145/2939672.2939785
[2] Blanchart P., Gouy-Pailler C. (2017) WHODID: Web-Based Interface for Human-Assisted Factory Operations in Fault Detection, Identification and Diagnosis. In: Altun Y. et al. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2017. Lecture Notes in Computer Science, vol 10536. Springer.
[3] Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long Short-Term Memory. Neural Comput. 9, 8 (November 1997), 1735-1780.

Candidate profile:
The candidate should have a background in machine learning / deep learning as well as a general background in data/statistical analysis. Programming skills in a common prototyping language such as R, Matlab or Python are also required. The internship is intended for candidates enrolled in a Master of Science program.

Required education and skills:
The candidate should have a background in machine learning / deep learning as well as a general background in data/statistical analysis. Programming skills in a common prototyping language such as R, Matlab or Python are also required. The internship is intended for candidates enrolled in a Master of Science program.

Employment address:
The internship will take place in the Laboratoire LADIS of CEA LIST, located on the Saclay campus, and will be co-supervised by Marin Ferecatu and Michel Crucianu from the VERTIGO team of CEDRIC – Conservatoire National des Arts et Métiers (CNAM). The internship lasts 5-6 months and is intended for Master of Science students in their second year. To apply, please send your application by email (CV + short cover letter) to pierre.blanchart@cea.fr and michel.crucianu@cnam.fr.

Attached document: stage_manufacturing.pdf