
EchoSpeech, the glasses that ‘read’ lips


Researcher Ruidong Zhang has built EchoSpeech, AI-enhanced glasses that translate silent speech into text on a smartphone

It might appear that Ruidong Zhang is talking to himself, but in fact the information science doctoral student is ‘talking’ to his glasses, which transcribe his words as text on his smartphone.

Ruidong Zhang, researcher and inventor of EchoSpeech

Zhang invented EchoSpeech which, as the name suggests, can recognize the wearer’s speech even when the words are only mouthed silently. Its purpose? To create a new interface between people and technology, one that also removes barriers and promotes accessibility. Developed by Cornell University’s Smart Computer Interfaces for Future Interactions (SciFi) Lab, the low-power wearable interface requires only a few minutes of user training data before it recognizes commands, and it can run on a smartphone. Zhang is the lead author of “EchoSpeech: Continuous Silent Speech Recognition on Minimally-obtrusive Eyewear Powered by Acoustic Sensing,” which will be presented later this month at the Association for Computing Machinery Conference on Human Factors in Computing Systems (CHI) in Hamburg, Germany.

“For people who can’t vocalize sound, this silent speech technology could be an excellent input to a speech synthesizer. It could give patients their voice back,” Zhang said.

In its current form, EchoSpeech could be used to communicate with others via smartphone in places where talking is uncomfortable or inappropriate, such as a noisy restaurant or a quiet library. The silent-speech interface can also be paired with a stylus and used with design software such as CAD, eliminating the need for a keyboard and mouse altogether.


How EchoSpeech works

Equipped with a pair of microphones and speakers smaller than pencil erasers, the EchoSpeech glasses become an AI-powered wearable sonar system that sends and receives sound waves across the face and detects mouth movements. A deep learning algorithm, also developed by SciFi Lab researchers, then analyzes these echo profiles in real time, with approximately 95% accuracy.
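The article does not include the lab’s code, but the core idea of an echo profile can be sketched in a few lines. The Python snippet below is a minimal illustration, assuming a near-ultrasonic linear chirp and plain cross-correlation; the sample rate, chirp band, frame length, and the simulated ‘echo’ are all assumptions for demonstration, and the actual system feeds sequences of such profiles to a deep learning model rather than simply locating a peak.

```python
import numpy as np

# A minimal sketch of active acoustic sensing. Every parameter here is an
# illustrative assumption, not a figure from the EchoSpeech paper.

FS = 48_000          # assumed speaker/microphone sample rate (Hz)
CHIRP_LEN = 600      # samples per transmitted chirp (~12.5 ms at 48 kHz)

def make_chirp(f0=16_000.0, f1=20_000.0, fs=FS, n=CHIRP_LEN):
    """Near-inaudible linear frequency sweep emitted by the glasses' speaker."""
    t = np.arange(n) / fs
    k = (f1 - f0) * fs / n                  # sweep rate in Hz per second
    return np.sin(2 * np.pi * (f0 + 0.5 * k * t) * t)

def echo_profile(frame, chirp):
    """Cross-correlate a microphone frame with the transmitted chirp.
    Each peak is a reflection; its position encodes the round-trip delay,
    so lip and skin movements reshape the profile from frame to frame."""
    return np.correlate(frame, chirp, mode="valid")

# Toy usage: one microphone frame containing a delayed, attenuated echo.
chirp = make_chirp()
frame = 0.01 * np.random.randn(2 * CHIRP_LEN)   # sensor noise floor
frame[40:40 + CHIRP_LEN] += 0.3 * chirp         # echo arriving 40 samples late

profile = echo_profile(frame, chirp)
print("strongest echo at sample", int(np.argmax(np.abs(profile))))  # ~40
```

As the wearer silently mouths words, the reflection paths between speaker and microphones change, so the shape of the profile over time carries the signal the deep learning model learns to decode.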

“We are moving sonar to the body,” said Cheng Zhang, assistant professor of information science at the Cornell Ann S. Bowers College of Computing and Information Science and director of the SciFi Lab. “We are very excited about this system,” he said, “because it really pushes the envelope in terms of performance and privacy. It’s small, low-power, and privacy-conscious, all of which are important for bringing new wearable technologies into the real world.”

The SciFi Lab has developed several wearable devices that track body, hand, and facial movements using machine learning and miniature wearable cameras. Recently, the lab has moved away from cameras toward acoustic sensing for tracking face and body movements, citing longer battery life; improved security and privacy; and smaller, more compact hardware. EchoSpeech builds on EarIO, an acoustic-sensing wearable the lab developed earlier that tracks facial movements.

To date, most silent-speech recognition technology has been limited to a select set of predetermined commands and has required the user to face or wear a camera, which is neither practical nor feasible. Wearable cameras also raise significant privacy concerns, both for the user and for those with whom the user interacts.


Acoustic sensing technology such as EchoSpeech removes the need for such devices: audio data is much smaller than image or video data, requires less processing bandwidth, and can be transmitted to a smartphone via Bluetooth in real time.
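A back-of-the-envelope comparison makes the point; the figures below are illustrative assumptions, not numbers reported by the researchers:

```python
# Back-of-the-envelope data rates; every parameter here is an illustrative
# assumption, not a figure from the EchoSpeech paper.

audio_bps = 48_000 * 16 * 2        # 48 kHz, 16-bit samples, 2 microphones
video_bps = 640 * 480 * 24 * 30    # 640x480 RGB (24-bit) at 30 fps, uncompressed

print(f"audio: {audio_bps / 1e6:.2f} Mbit/s")                # ~1.54 Mbit/s
print(f"video: {video_bps / 1e6:.1f} Mbit/s")                # ~221.2 Mbit/s
print(f"video/audio ratio: ~{video_bps / audio_bps:.0f}x")   # ~144x
```

Even uncompressed two-channel audio is on the order of what a Bluetooth link can carry in practice, while raw video is roughly two orders of magnitude beyond it.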

“And because the data is processed locally on the smartphone instead of being uploaded to the cloud, sensitive information never leaves your control,” Cheng Zhang added.

Battery life also improves dramatically: ten hours with acoustic sensing versus 30 minutes with a camera. The team is exploring commercialization of the technology behind EchoSpeech, thanks in part to gap funding from Ignite: Cornell Research Lab to Market.

In future work, the SciFi Lab researchers will explore smart-glass applications for tracking movements of the face, eyes, and upper body.

“We think glasses will be an important personal computing platform for understanding human activities in everyday environments,” said Cheng Zhang.

The other co-authors are information science PhD student Ke Li, Yihong Hao, Yufan Wang, and Zhengnan Lai.

