Image Captioning Model using Visual Aligning Attention and Deep Matrix Factorization
Pages: 948-951
Abstract
Image captioning is a complex task that bridges the visual and linguistic domains: a captioning model must understand the content of an input image in order to generate sentences in natural language. The attention mechanism, widely used in image captioning, supplies more precise visual information and explicitly guides the training of deep sequential models. In this work, we propose a system that combines a visual aligning attention model with deep matrix factorization (DMF). The visual aligning attention model focuses on regions of interest using a CNN and an LSTM as the encoder-decoder, while DMF handles the refinement and assignment of image tags. Captions are generated on the Flickr8k dataset. Experimental results show that the proposed system produces more descriptive and accurate captions.
Keywords: Encoder-decoder; Visual Aligning; Global Aligning; CNN; RNN; Semantic; Remote Sensing; LSTM; Language Model.
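The abstract does not specify the exact network configuration, so the following is only a minimal illustrative sketch of the kind of CNN-LSTM encoder-decoder with soft visual attention described above; the backbone choice, layer sizes, and module names are assumptions, and the DMF tag-refinement step is omitted.

```python
# Minimal sketch (not the authors' code): CNN encoder, soft visual-attention
# step, and LSTM decoder. Backbone, dimensions, and names are assumptions.
import torch
import torch.nn as nn
import torchvision.models as models

class Encoder(nn.Module):
    """Extracts a grid of region features (assumed ResNet-50 backbone)."""
    def __init__(self):
        super().__init__()
        backbone = models.resnet50(weights=None)                    # pretrained weights optional
        self.cnn = nn.Sequential(*list(backbone.children())[:-2])   # keep the conv feature map

    def forward(self, images):                       # images: (B, 3, H, W)
        feats = self.cnn(images)                     # (B, 2048, h, w)
        return feats.flatten(2).transpose(1, 2)      # (B, h*w, 2048) region features

class VisualAttention(nn.Module):
    """Soft attention that aligns the decoder state with image regions."""
    def __init__(self, feat_dim, hidden_dim, attn_dim):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, attn_dim)
        self.hid_proj = nn.Linear(hidden_dim, attn_dim)
        self.score = nn.Linear(attn_dim, 1)

    def forward(self, feats, hidden):                # feats: (B, R, D), hidden: (B, H)
        e = self.score(torch.tanh(self.feat_proj(feats)
                                  + self.hid_proj(hidden).unsqueeze(1)))
        alpha = torch.softmax(e, dim=1)              # weights over the R image regions
        context = (alpha * feats).sum(dim=1)         # (B, D) attended visual context
        return context, alpha.squeeze(-1)

class Decoder(nn.Module):
    """LSTM decoder fed with word embeddings plus the attended visual context."""
    def __init__(self, vocab_size, feat_dim=2048, embed_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.attend = VisualAttention(feat_dim, hidden_dim, attn_dim=256)
        self.lstm = nn.LSTMCell(embed_dim + feat_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, feats, captions):              # captions: (B, T) token ids
        B, T = captions.shape
        h = feats.new_zeros(B, self.lstm.hidden_size)
        c = feats.new_zeros(B, self.lstm.hidden_size)
        logits = []
        for t in range(T):
            context, _ = self.attend(feats, h)       # align state with image regions
            x = torch.cat([self.embed(captions[:, t]), context], dim=1)
            h, c = self.lstm(x, (h, c))
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)            # (B, T, vocab_size)
```

Such a decoder would typically be trained with teacher forcing and a cross-entropy loss over Flickr8k caption tokens; the attention weights produced at each step indicate which image regions the model attends to while emitting each word.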