At exactly 10:10:10 on January 1, 2018, in Hanoi, the project “Vietnamese digitalization system” was officially launched in a ceremony chaired by Deputy Prime Minister Vu Duc Dam, with the aim of “Sharing knowledge – Promoting creativity – Connecting the community – For the future of Vietnam”.
Within the framework of the project “Development of Vietnamese digital knowledge system”, the Vietnamese speech data platform project has achieved some initial results.
The current participants in the project are the Institute of Information Technology – Vietnam Academy of Science and Technology, Hanoi University of Science and Technology, Vietnam Television (VTV), Voice of Vietnam Radio (VOV), Vbee Joint Stock Company (vbee.vn), and VAIS Joint Stock Company (vais.vn).
Most of the voice data held by major firms is not shared with the community. This hinders the research and creativity of scientists, as well as the application of technologies and research results in practice.
The goal of the project is to create a platform to collect, build and share Vietnamese voice data, in order to support speech technology research and solutions such as converting text into speech and vice versa. These technologies, in turn, apply to many different industries and fields in society. A sufficiently large body of voice data is a necessary and favorable condition for scientists to conduct in-depth studies, giving them certain advantages in competing in the region and the world. Along with this, businesses can exploit and apply research results to solve real-world problems in Vietnam, such as automating dialogue in smart switchboards, spoken-language interaction in virtual assistants or IoT devices, virtual tour guides, virtual MCs, and digitizing books, newspapers, electronic lectures, and so on.
The platform for collecting, building and sharing voice data is based on an open project from Mozilla: Common Voice. Anyone can join by recording their voice on the web or by phone to contribute to the shared data warehouse. Contributors can also listen to other people's recordings and confirm their quality, so that the resulting data is reliable. Television and radio broadcasters such as VTV and VOV can share audio and video from news, movies, television programs and so on to enrich the community data warehouse. Businesses can share their own data, as well as innovative applications that let the community use the shared data on the platform.