How-to China: AI anchor to help the deaf community enjoy the Games more
Q: The country's first AI sign language anchor will serve the targeted audience during the Winter Olympics and the Winter Paralympics. The world's biggest corpus behind the AI TV anchor was created by your team. Could you explain to us the leading technologies of the database? During your research in the past six years, were there any difficulties you faced? Were there any shining moments you could share with us?
Yuan: Let me first explain two basic technologies, "sign language recognition" and "sign language generation", which are the key concepts I use to explain this field to outsiders.
"Sign language generation" refers to the technology that produces sign language for the audience.
Our technology, however, is in the area of sign language recognition: recognizing sign language, which has its own word order, from people with hearing difficulties, and then transforming it into our word order.
Here I have to stress that their word order is different from ours. For example, they place the predicate at the end of the sentence, while in modern Chinese we put it in the middle, between the subject and the object.
As a result, when they write down messages, their word order sometimes misleads us.
So the AI anchor needs to express itself in the audience's word order.
Sign language recognition and generation technologies complement each other.
We work on the recognition part: we process information expressed in the word order of people with hearing difficulties and convert it into ours.
The generation technology then converts our word order into the word order of people with hearing difficulties, and it drives the AI anchor to show the right sign language.
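The two directions of word-order conversion described above can be illustrated with a minimal Python sketch. It is a toy example, not the team's method: the role labels, the `reorder` helper and the sample sentence are all hypothetical, and a real system would learn the mapping from data rather than sort by hand-assigned roles.

```python
# Toy illustration only. The role labels and the sample sentence are
# hypothetical; real systems learn this mapping from a corpus.

# Word order described in the interview: predicate last in sign language,
# predicate between subject and object in modern Chinese.
SIGN_ORDER = ("subject", "object", "predicate")
SPOKEN_ORDER = ("subject", "predicate", "object")


def reorder(glosses, role_order):
    """Return (word, role) pairs rearranged to follow role_order."""
    rank = {role: i for i, role in enumerate(role_order)}
    # sorted() is stable, so words that share a role keep their relative order.
    return sorted(glosses, key=lambda pair: rank[pair[1]])


def words(glosses):
    """Drop the role labels and keep only the words."""
    return [word for word, _ in glosses]


signed = [("I", "subject"), ("apple", "object"), ("eat", "predicate")]

# Recognition direction: sign-language order -> spoken order.
spoken = reorder(signed, SPOKEN_ORDER)
print(words(spoken))  # ['I', 'eat', 'apple']

# Generation direction: spoken order -> sign-language order.
print(words(reorder(spoken, SIGN_ORDER)))  # ['I', 'apple', 'eat']
```

Applying the two conversions in sequence returns the original order, which mirrors how recognition and generation are described as complementary.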
The international research community has concluded that sign language recognition is much more difficult than generation, given the various technologies involved.
Our process includes transforming video of sign language movements into text in their word order.
We analyze at least three channels, namely facial expression, limbs and hand gestures.
We also conduct research on action recognition and the analysis of the signer's intention. The limbs have 18 points to analyze, a hand has 21 points and a face has more than 100. All of these are challenging for the AI and the algorithms.
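To give a sense of scale, the point counts mentioned above can be added up per video frame. This is a rough sketch under stated assumptions: the interview gives 18 limb points, 21 points per hand and "more than 100" face points, so 100 is used here only as a lower bound, and tracking two hands with 2D coordinates is assumed rather than stated.

```python
# Point counts from the interview.
BODY_POINTS = 18
HAND_POINTS = 21
# Assumption: "more than 100" face points, so 100 is a lower bound.
FACE_POINTS_MIN = 100
# Assumptions not stated in the interview: two hands, 2D (x, y) coordinates.
NUM_HANDS = 2
COORDS_PER_POINT = 2


def points_per_frame(num_hands=NUM_HANDS, face_points=FACE_POINTS_MIN):
    """Lower bound on the number of tracked points in one video frame."""
    return BODY_POINTS + HAND_POINTS * num_hands + face_points


def features_per_frame(coords=COORDS_PER_POINT):
    """Lower bound on the number of coordinate values in one video frame."""
    return points_per_frame() * coords


print(points_per_frame())    # 160
print(features_per_frame())  # 320
```

Even under these conservative assumptions, every frame carries at least 160 points to track across three channels at once.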
Many domestic and international companies hope to join forces with us, but when they find that sign language recognition is much more difficult than sign language generation, they retreat.
Voice recognition is easier.
As for shining moments during our research, I really want to say that it was the support and contributions of a great number of people with hearing difficulties that impressed all of us.
They provided their sign language to help us build up the database, turning it into the world's largest and helping it keep growing.
Because seniors rely on sign language more than the younger generations, who prefer voice recognition, they have made great contributions to building the database.
When we told them that we needed to build the database, many elderly people, despite their age and health conditions, were very enthusiastic about showing us their sign language… We have been extremely moved.
It was the kindness and warmth of these enthusiastic people, who volunteered to provide their sign language in different fields, including some very nice elderly people, that strengthened our confidence and gave us the strength to go ahead with this challenging research.