A Mask-RCNN based object detection and captioning framework for industrial videos
Journal: International Journal of Advanced Technology and Engineering Exploration (IJATEE), Vol. 8, No. 84
Publication Date: 2021-11-30
Authors: Manasi Namjoshi; Khushboo Khurana
Pages: 1466-1478
Keywords: Object detection; Mask-RCNN; Video captioning; Video analysis; Image captioning
Abstract
Analyzing surveillance videos is a tiresome and burdensome activity for a human. Automating the analysis of surveillance videos, specifically industrial videos, can be very useful for productivity analysis, assessing the availability of raw materials and finished goods, fault detection, report generation, etc. To accomplish this task, we propose a video captioning and reporting method. In video captioning, summaries are generated in understandable language that describe the video. These descriptions are produced by understanding the events and objects present in the video. The method presented in this paper constructs a captioned video summary comprising frames and their descriptions. First, frames are extracted from the video by uniform sampling, which reduces the task of video captioning to image captioning. Then, Mask Region-based Convolutional Neural Network (Mask-RCNN) is utilized to detect objects such as raw materials, products, and humans in the sampled video frames. Next, a template-based sentence generation method is applied to obtain the image captions. Finally, a report is generated outlining the products present and details related to production, such as the duration for which a product is present, the number of products detected, and the presence of an operator at the workstation. This framework can greatly help in bookkeeping, day-wise work analysis, keeping track of employees in a labor-intensive industry or factory, remote monitoring, etc., thereby reducing the human effort of video analysis. On the object classes of the created dataset, we obtained an average confidence score of 0.8975 and an average accuracy of 95.62%. Moreover, since the captions are template-based, the generated sentences are grammatically and semantically correct.
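The pipeline described in the abstract (uniform frame sampling, Mask-RCNN detection, template-based captioning) can be sketched as follows. This is a minimal illustration assuming OpenCV for frame extraction and torchvision's pre-trained Mask R-CNN; the class list, sampling interval, score threshold, and caption template are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch of the described pipeline: uniform frame sampling,
# Mask R-CNN object detection, and template-based caption generation.
import cv2
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn
from torchvision.transforms.functional import to_tensor

# Small illustrative subset of COCO class ids; a real deployment would use
# a model fine-tuned on the industrial dataset's own classes.
CLASS_NAMES = {1: "person", 47: "cup"}

def sample_frames(video_path, every_n=30):
    """Uniformly sample every n-th frame from the video."""
    cap = cv2.VideoCapture(video_path)
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % every_n == 0:
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        idx += 1
    cap.release()
    return frames

def detect_objects(frames, score_threshold=0.7):
    """Run Mask R-CNN on each sampled frame and keep confident detections."""
    model = maskrcnn_resnet50_fpn(pretrained=True).eval()
    results = []
    with torch.no_grad():
        for frame in frames:
            out = model([to_tensor(frame)])[0]
            labels = [
                CLASS_NAMES.get(int(lbl), "object")
                for lbl, score in zip(out["labels"], out["scores"])
                if score >= score_threshold
            ]
            results.append(labels)
    return results

def caption_frame(labels):
    """Template-based caption built from detected object counts."""
    if not labels:
        return "No objects of interest are visible in this frame."
    counts = {c: labels.count(c) for c in set(labels)}
    parts = [f"{n} {c}(s)" for c, n in counts.items()]
    return "The frame contains " + ", ".join(parts) + "."

if __name__ == "__main__":
    frames = sample_frames("workstation.mp4", every_n=30)  # hypothetical input video
    for labels in detect_objects(frames):
        print(caption_frame(labels))
```

The per-frame captions produced this way can then be aggregated over time (e.g., counting frames in which a product or operator appears) to assemble the kind of production report the paper describes.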