10. Automatic recognition and generation of Cued Speech using deep learning

Beneficiary: Centre National de la Recherche Scientifique (GIPSA-lab), France

In the technology domain, making communication tools accessible to people with sensory disabilities is a priority. Relay services have been created for people with hearing or speech impairment. These users contact hearing interpreters in sign language, Cued Speech and spoken language, based at a remote centre, through telecommunication devices.

To introduce automation into this telecommunication chain, applications based on the automatic recognition of iconic gestural signs will be developed to complement the voice and touch commands of mobile phones and tablet computers. To this end, the project will develop models for the automatic recognition of signs derived from Cued Speech gestures, converting them into text and/or speech sound, and for the automatic generation of such signs. This work will explore new algorithms, based on recent deep learning techniques, for multimodal communication (text, speech, lipreading and manual gestures) between hearing participants and participants with hearing impairment. So far, extracting relevant features from videos of cuers and using streamlined recurrent neural networks has made it possible to automatically capture the desynchronisation that occurs between hand and lip movements in Cued Speech. A first application of these methods at CNRS-GIPSA reached a phoneme recognition score of 70.8 % on continuous Cued Speech (Sankar et al., 2022).

The applications based on the automatic recognition and generation of iconic signs will be developed for telecommunication devices in connection with the IVèS telecommunication platform. They will thus improve telecommunication accessibility for people with hearing impairment, including children whose text-based skills are still developing, while taking into account each user's preferred means of communication.
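The 70.8 % figure above is a phoneme-level score on continuous Cued Speech. As an illustration only, assume it follows the common convention where phoneme accuracy equals 1 minus (substitutions + deletions + insertions) divided by the number of reference phonemes, i.e. one minus the phoneme error rate. Under that assumption, such a score could be computed from reference and decoded phoneme sequences as in the minimal Python sketch below. The function names (`edit_distance`, `phoneme_accuracy`) and the SAMPA-like example phonemes are hypothetical and are not taken from Sankar et al. (2022).

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences.

    Substitutions, deletions and insertions each cost 1.
    """
    # prev[j] holds the distance between the reference prefix processed so
    # far and the first j hypothesis phonemes.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete reference phoneme r
                curr[j - 1] + 1,         # insert hypothesis phoneme h
                prev[j - 1] + (r != h),  # substitute (cost 0 if they match)
            )
        prev = curr
    return prev[-1]


def phoneme_accuracy(refs: Sequence[Sequence[str]],
                     hyps: Sequence[Sequence[str]]) -> float:
    """Hypothetical corpus-level score: 1 - total edit errors / total reference phonemes."""
    if len(refs) != len(hyps):
        raise ValueError("need one hypothesis per reference utterance")
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return 1.0 - errors / total


# Usage: "bonjour" with one substitution (Z -> S) and one deletion (R).
ref = ["b", "o~", "Z", "u", "R"]
hyp = ["b", "o~", "S", "u"]
print(phoneme_accuracy([ref], [hyp]))  # 1 - 2/5, i.e. about 0.6
```

Errors are summed over the whole corpus before dividing, rather than averaging per-utterance scores, so long utterances weigh more. Because insertions count as errors, this kind of accuracy can in principle fall below zero on very noisy output.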

S. Sankar, D. Beautemps and T. Hueber, “Multistream Neural Architectures for Cued Speech Recognition Using a Pre-Trained Visual Feature Extractor and Constrained CTC Decoding,” ICASSP 2022 – 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022, pp. 8477-8481, doi: 10.1109/ICASSP43922.2022.9746976.

Supervisors: Denis Beautemps and Thomas Hueber

ESR 10: Sanjana Sankar