There are eight categories under the HKICTA 2024. There will be one Grand Award in each category, and an “Award of the Year” will be selected from the eight Grand Awards by the Grand Judging Panel. To foster the innovative use of artificial intelligence (AI), each of the eight categories will also present a new distinguished accolade, the “Best Use of AI” award, honouring outstanding achievements in harnessing the power of AI in their respective areas.
EdCity has been officially appointed by the DPO as the Leading Organiser of the Hong Kong ICT Awards 2024: Student Innovation Award. The Student Innovation Award covers four streams: Primary, Junior Secondary, Senior Secondary and Higher Education. By drawing on innovative strategies and best practices, EdCity hopes to drive innovation within the awards, fostering an environment that encourages students to push boundaries and think outside the box, and ultimately to advance the ICT industry.
Project 1: AudioSense
Student Innovation (Higher Education) Gold Award
Hong Kong Institute of Information Technology (TSANG Shun Tin / HO Lok Yin / HO Cheuk Hin / CHAN Ka Wing)
AudioSense is an AI-driven audio description platform designed to streamline the creation of audio descriptions for visually impaired individuals. The system leverages cutting-edge AI technologies, including AI Video Accessibility Evaluation, AI Movie Scene Analysis and AI Script Writing Assistance, with machine learning models trained on diverse datasets such as online videos, news videos and audio-described videos, to provide high-quality descriptive audio. Additionally, AudioSense integrates cloud infrastructure for scalable and efficient processing, and provides user-friendly mobile and web application interfaces that support seamless video uploading and script editing. The project also incorporates accessibility features, including voice control and compatibility with Siri on iOS devices, enhancing inclusivity and ease of use.
Main functions (a simplified pipeline sketch follows this list):
1. AI Video Accessibility Evaluation: Evaluates the accessibility of video content and checks for sensitive information such as violent or explicit content.
2. AI Movie Scene Analysis: Automatically breaks down and organises scene information (spatial layout, character expressions, actions, etc.) to ensure fast and accurate results.
3. AI Script Writing Assistance: Helps narrators generate accurate and effective scripts for audio description, speeding up the scripting process to increase efficiency.
4. AI Voice Video Generation: Combines voice and graphics technologies, supporting multiple languages and voice styles, to ensure clear and natural narration. In addition, the platform includes a voice overlay check to ensure that background sounds and voice descriptions do not interfere with each other.
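The four functions above form a natural pipeline. The sketch below shows, in simplified form, how such stages could chain together; AudioSense's actual code and APIs are not public, so every function, type and return value here is a hypothetical stand-in rather than the project's real implementation.

```python
from dataclasses import dataclass

@dataclass
class Scene:
    start: float   # seconds into the video
    end: float
    summary: str   # spatial layout, character expressions, actions

def evaluate_accessibility(video_path: str) -> bool:
    """Stage 1: check for sensitive content such as violence (stubbed)."""
    return True  # assume the video passes the check

def analyse_scenes(video_path: str) -> list[Scene]:
    """Stage 2: break the video into annotated scenes (stubbed)."""
    return [Scene(0.0, 4.5, "Wide shot: two friends wave on a beach")]

def draft_description(scene: Scene) -> str:
    """Stage 3: draft one line of audio description per scene (stubbed)."""
    return f"[{scene.start:.1f}s-{scene.end:.1f}s] {scene.summary}."

def describe_video(video_path: str) -> list[str]:
    if not evaluate_accessibility(video_path):
        raise ValueError("Video failed the accessibility evaluation")
    script = [draft_description(s) for s in analyse_scenes(video_path)]
    # Stage 4 (AI Voice Video Generation, with the voice overlay check)
    # would consume this script; here we simply return the draft.
    return script

print(describe_video("demo.mp4"))
```

In the real platform, the drafted script would be reviewed and edited by a narrator in the mobile or web interface before the voice generation stage runs.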
There are 285 million blind or visually impaired persons in the world, including 200,000 in Hong Kong at present, creating a huge demand for audio description services. AudioSense enables users to upload, play and edit videos and scripts easily, enhancing the viewing experience of the visually impaired. By allowing visually impaired persons to access and appreciate visual content, it promotes social inclusion and equal access to information, so that everyone, regardless of visual ability, can enjoy and participate in the rich world of visual media.
Project 2: AI-Driven Real-Time Sign Language Translation App – HandsTalk
Student Innovation (Higher Education) Silver Award
Hong Kong University of Science and Technology (LEE Cheuk Sum / SO Ho Mang Marcus / WONG Ho Leong)
The World Health Organisation (WHO) estimates that more than 5% of the world’s population needs rehabilitation for hearing impairment. In Hong Kong, there is only one sign language interpreter for every 3,000 deaf persons, and the demand for real-time sign language interpreting services is very high.
HandsTalk is an AI-powered real-time sign language translation mobile app designed to remove communication barriers for sign language users, allowing them to communicate directly without intermediaries or any special device. The app seamlessly translates sign language into English using advanced AI models, computer vision technology and generative AI, and can be used in real-time face-to-face scenarios, video conversations and more.
For sign language translation, HandsTalk introduces a sentence completion feature that uses precise word selection to compile lists of candidate words and phrases. The app also recognises control gestures such as ‘question mark’, ‘space’ and ‘delete’ to give users more flexibility over the generated sentences, and generative AI then composes coherent sentences from the translated words and phrases. The whole process usually takes less than two seconds, comparable to sending a text message or speaking. This approach effectively addresses the challenges of real-time sign language translation, such as sentence translation errors and unpredictable user actions, and accurately conveys the meaning the user intends, allowing users to communicate confidently and bridging the gap between signed and spoken language.
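To make the sentence-assembly step concrete, the sketch below folds a stream of recognised sign tokens (here, fingerspelled letters) into words while honouring the ‘space’, ‘delete’ and ‘question mark’ control gestures, then hands the word list to a stand-in for the generative AI composer. The token format and all function names are illustrative assumptions; HandsTalk's internals are not public.

```python
def assemble_words(tokens: list[str]) -> tuple[list[str], bool]:
    """Fold a stream of sign tokens into words, honouring control gestures."""
    words: list[str] = []
    current: list[str] = []            # signs of the word being spelled
    is_question = False
    for token in tokens:
        if token == "space":           # word boundary
            if current:
                words.append("".join(current))
                current = []
        elif token == "delete":        # undo the last sign or word
            if current:
                current.pop()
            elif words:
                words.pop()
        elif token == "question mark":
            is_question = True
        else:                          # an ordinary sign, e.g. a letter
            current.append(token)
    if current:                        # flush the final word
        words.append("".join(current))
    return words, is_question

def compose_sentence(words: list[str], is_question: bool) -> str:
    """Stand-in for the generative-AI step that forms a coherent sentence."""
    return " ".join(words).capitalize() + ("?" if is_question else ".")

tokens = ["b", "u", "s", "space", "w", "h", "e", "r", "e", "question mark"]
print(compose_sentence(*assemble_words(tokens)))  # -> Bus where?
```

In HandsTalk itself, the generative AI can also rework signed word order into natural English; the stand-in above only joins and punctuates the recognised words.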
HandsTalk’s sign language translation can be used face-to-face in real time or during a video call:
1. Live Translation: Point the camera at the sign language user to translate their signing on the spot (see the sketch after this list).
2. Video Call Translation: During a video call, sign language and speech will be converted into text. This allows sign language users and non-sign language users to communicate seamlessly on the same channel without any barriers.
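Conceptually, the Live Translation mode is a capture-classify-caption loop: grab a camera frame, run the sign recogniser on it, and overlay the running translation. The sketch below uses OpenCV for capture and display; recognise_sign is a hypothetical stand-in for the app's on-device computer vision model.

```python
import cv2

def recognise_sign(frame) -> str | None:
    """Hypothetical stand-in: return a sign token for this frame, or None."""
    return None  # replace with a real gesture-recognition model

cap = cv2.VideoCapture(0)                 # open the default camera
tokens: list[str] = []
try:
    while True:
        ok, frame = cap.read()
        if not ok:                        # no camera / end of stream
            break
        token = recognise_sign(frame)
        if token and (not tokens or tokens[-1] != token):
            tokens.append(token)          # drop repeated detections
        caption = " ".join(tokens[-8:])   # show the most recent tokens
        cv2.putText(frame, caption, (10, 30),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.8, (255, 255, 255), 2)
        cv2.imshow("Live translation (sketch)", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
finally:
    cap.release()
    cv2.destroyAllWindows()
```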
In the future, this method can be extended to other types and variations of sign languages, as long as high-quality datasets are available. In addition, users can access sign language videos and images within the app for demonstration and learning purposes, and evaluate their progress using the sign language translation feature.

