Device for Spoken Word and Captioning of Illustrations

Journal: International Journal of Computer Science and Mobile Computing - IJCSMC (Vol.13, No. 9)

Publication Date: 2024-09-30

Authors : Boriya Siddhant; Sunil Kumar;

Page : 9-18

Keywords : CNN; Deep Learning; Text to Speech; Microsoft Speech TS; Speech Recognition;


Abstract

This project presents an application that takes an image as input, generates a descriptive caption for it, and converts that caption into speech. The system uses state-of-the-art models for both image captioning and text-to-speech synthesis, so that the path from capturing an image to producing audio is smooth and fast. The application uses Hugging Face Transformers for model inference, which gives access to pretrained models and simplifies the integration of advanced deep learning methods. The web interface was built with Streamlit, which makes it easy for users to interact with the application. Several additional tools handle the processing of audio and visual data so that the application runs reliably. As an accessibility technology, the application gives people who are blind or have low vision a practical tool for engaging with visual material in a more meaningful way, and it helps bridge the gap between visual and auditory content.
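The image-to-caption-to-speech flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `caption_to_speech` and `load_pipelines` are hypothetical, the abstract does not name the models used, and the `"image-to-text"` and `"text-to-speech"` pipeline task names are assumed to be available in the installed version of Hugging Face Transformers.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical type aliases for this sketch (not taken from the paper).
Captioner = Callable[[Any], List[Dict[str, str]]]
Synthesizer = Callable[[str], Dict[str, Any]]


def caption_to_speech(
    image: Any, captioner: Captioner, synthesizer: Synthesizer
) -> Tuple[str, Any, int]:
    """Caption an image, then synthesize the caption as audio.

    `captioner` is expected to return a list of dicts with a
    "generated_text" key; `synthesizer` is expected to return a dict
    with "audio" and "sampling_rate" keys. These are the output shapes
    of the corresponding Transformers pipelines.
    """
    results = captioner(image)
    if not results:
        raise ValueError("captioner returned no results")
    caption = results[0]["generated_text"].strip()
    if not caption:
        raise ValueError("captioner returned an empty caption")
    speech = synthesizer(caption)
    return caption, speech["audio"], speech["sampling_rate"]


def load_pipelines(
    caption_model: Optional[str] = None, tts_model: Optional[str] = None
) -> Tuple[Captioner, Synthesizer]:
    """Build the two pipelines (assumption: library default models are
    used when no model name is given; the paper's models are unknown)."""
    # Imported lazily so the rest of the sketch runs without the library.
    from transformers import pipeline

    captioner = pipeline("image-to-text", model=caption_model)
    synthesizer = pipeline("text-to-speech", model=tts_model)
    return captioner, synthesizer
```

Passing the two models in as callables keeps the captioning and synthesis steps independent, so either one can be swapped or stubbed out. In a Streamlit front end, the returned samples could plausibly be played with `st.audio`, passing the sampling rate through its `sample_rate` argument.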
