“Like an electric car”: AI voice recognition technology is also undergoing a generational shift

Professor Jang Joon-hyuk of Hanyang University, who led the development of E2E voice recognition technology in the KT AI One Team

“If you think of classic artificial intelligence voice recognition as an internal combustion engine vehicle, end-to-end (E2E) voice recognition can be seen as an electric car.”

This is how Professor Jang Joon-hyuk of Hanyang University describes E2E voice recognition technology. E2E voice recognition is one of the four AI technologies that KT selected as joint research achievements of the 'AI One Team', a national AI council established a year ago.

Just a few years ago, voice recognition emerged at CES as a major application of AI technology through Amazon Alexa, and up-and-coming tech companies hurried to launch AI voice recognition speakers. Researchers regarded as among the brightest in countries and companies around the world have focused on making machines understand humans better.


■ What is E2E voice recognition?

Amid this, the biggest topic in the field of AI voice recognition technology is E2E voice recognition.

Previously, speech recognition technology used a variety of individual components and algorithms to convert human speech into text that a machine can process. It identified phonemes in the voice signal, extracted words, and assembled them into sentences, passing through a complex procedure in which each function was handled by a separate module.

In E2E voice recognition, by contrast, a single module takes the voice input and directly produces the sentence text. Compared with earlier speech recognition technology, this more closely resembles how humans process knowledge. Rather than combining the results of multiple computations, it imitates human thinking as a whole, so that a given input pattern directly yields an output.
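
The structural difference described above can be sketched in a few lines of Python. This is only a toy illustration: the frame labels, lookup tables, and function names are all hypothetical stand-ins, and real systems use trained acoustic, pronunciation, and language models (or one neural network in the E2E case) rather than dictionaries.

```python
from typing import List

# Hypothetical toy input: each "frame" is already a symbolic label, not real audio.
Frames = List[str]

def acoustic_stage(frames: Frames) -> List[str]:
    """Stage 1 (toy): map audio frames to phonemes."""
    frame_to_phoneme = {"f1": "HH", "f2": "AY"}
    return [frame_to_phoneme[f] for f in frames]

def lexicon_stage(phonemes: List[str]) -> List[str]:
    """Stage 2 (toy): group a phoneme sequence into words."""
    pronunciations = {("HH", "AY"): "hi"}
    return [pronunciations[tuple(phonemes)]]

def sentence_stage(words: List[str]) -> str:
    """Stage 3 (toy): assemble words into a sentence."""
    return " ".join(words).capitalize() + "."

def modular_recognize(frames: Frames) -> str:
    """Classic approach: separate modules chained one after another."""
    return sentence_stage(lexicon_stage(acoustic_stage(frames)))

def e2e_recognize(frames: Frames) -> str:
    """E2E approach: one mapping from audio straight to sentence text.
    A lookup table stands in for a single trained neural network."""
    model = {("f1", "f2"): "Hi."}
    return model[tuple(frames)]

print(modular_recognize(["f1", "f2"]))
print(e2e_recognize(["f1", "f2"]))
```

In the modular version, each stage can be swapped or tuned on its own; in the E2E version there is only one component to change.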

E2E technology sounds impressive, but not every company that has started developing AI speech recognition technology is taking this approach. This is because efficiency has to be considered at the current stage of AI deployment.

Professor Jang Joon-hyuk said, “In the course of technological development, E2E voice recognition is competing with classical voice recognition technology,” adding, “E2E voice recognition is a step up as a technology, but for now the strengths of earlier voice recognition technologies still put them ahead.”

It may seem puzzling that a new, elaborately developed technology lags behind the old one. Professor Jang pointed out that classical speech recognition technology can improve its performance quickly.

He said, “Because classic voice recognition technology is designed module by module, working on one specific aspect of performance can markedly improve quality in an actual commercial service. With E2E, if you want to improve something, you have to rework the whole deep learning structure.”

He added, “Even the internal combustion engine, which seems hard to develop any further, is performing well right now, isn't it? Conversely, as battery technology for electric vehicles develops, driving range may increase further and the vehicle platform may become lighter, so the potential for development is great.”


■ A challenge few dare to attempt, now headed for the global stage

E2E voice recognition technology of the kind Professor Jang developed in the KT AI One Team is rare in Korea. Even globally, only a handful of companies leading AI development are a step ahead.

This is because the efficiency of the old technologies makes it difficult for new ones to take hold. Professor Jang said, “Because the internal combustion engine vehicle is as efficient as it is, there are many companies in Korea that have not yet started developing E2E voice recognition technology.”

It is not easy to commit manpower and money to developing a new technology when the existing technology delivers results immediately. The situation resembles the relationship between internal combustion engines and electric vehicles, or between fossil fuels and renewable energy.

Professor Jang said that he was able to try the development through the KT AI One Team, and that the results were satisfactory.

He said, “Google is far ahead, and compared with the AI technology development of companies such as Apple, Amazon, Facebook, and Baidu, we are a little late in that we have few papers and our results have only just been published,” adding, “but if you look at recognition performance as the measure, we have already caught up in certain environments.”

The E2E speech recognition technology developed by Professor Jang’s research team has also attracted attention from academia for significantly improving the word error rate. The team raised recognition accuracy in more demanding environments by deliberately corrupting the frequency content of the speech data used as deep learning training material and then training on it.

These achievements will not remain merely an outcome of the industry-academia-research alliance called the AI One Team; they are also expected to reach the international AI technology stage.
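
For readers unfamiliar with the two ideas in the paragraph above, here is a minimal Python sketch. The word error rate function follows the standard definition (word-level edit distance divided by the number of reference words). The masking function is an assumption on our part: the article does not say how the team corrupted the frequency content, so the sketch zeroes out one random band of frequency bins, in the spirit of the frequency masking used in SpecAugment-style augmentation. The names `word_error_rate` and `mask_frequency_band` and the list-of-lists spectrogram layout are hypothetical.

```python
import random
from typing import List

def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Word-level Levenshtein distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)

def mask_frequency_band(spectrogram: List[List[float]], max_width: int,
                        rng: random.Random) -> List[List[float]]:
    """Return a copy with one random band of frequency bins set to zero.
    Assumed layout: a list of time frames, each a list of per-bin magnitudes."""
    n_bins = len(spectrogram[0])
    width = rng.randint(0, min(max_width, n_bins))
    start = rng.randint(0, n_bins - width)
    return [[0.0 if start <= k < start + width else v
             for k, v in enumerate(frame)]
            for frame in spectrogram]

print(word_error_rate("turn on the light", "turn on light"))  # one deletion out of four words
masked = mask_frequency_band([[1.0] * 8 for _ in range(3)], max_width=3,
                             rng=random.Random(0))
print(masked[0])
```

Training on inputs degraded this way is generally intended to make a model less dependent on any single frequency region, which is one plausible reading of how accuracy improved in harsher conditions.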

Professor Jang said, “As with internal combustion engines and electric cars, E2E voice recognition technology can advance right away, like a hybrid car, by being used together with the earlier method. As KT has announced, it will first be introduced into AI-based contact centers (call centers), and if it is made even more lightweight, the E2E voice recognition platform could fit into a small device.”

He continued, “There are companies that cannot even begin development, but as part of the AI One Team we took on a challenging task, and in terms of performance we have in some respects reached the world's top level,” adding, “As for the next steps, we plan to make the model lighter and raise its performance. We are preparing to present the research results so far at a conference that attracts the attention of the global AI industry.”




