Automatic Image Caption Generation: Study And Implementation

2021

Master's Thesis

ASJP

Science and Technology

Université De Ghardaia

Korichi, Safa Batoul

Aimene, Karim

Abstract: Artificial Intelligence (AI) is moving increasingly towards multimodal learning, which involves building systems that can process information from multiple sources, such as text, images, or audio. Image captioning is one of the main visual-linguistic tasks: it requires generating a caption for a given image. The challenge is to create a unified Deep Learning (DL) model that can describe an image in a correct sentence. To do so, we need to understand the proper way to represent text in a suitable space. We use the Transformer, which brings a new concept to the sequence-to-sequence mechanism, and we rely on modern GPUs to process data faster and more efficiently. Along this path, we experimented with a Transformer-based approach and applied it to the image captioning problem using the MS COCO dataset.

Keywords:

multimodal learning

image captioning

deep learning (dl)

transformer

sequence to sequence

ms-coco

Published in the journal:

Nos services universitaires et académiques
