Justyna Golec, Institute of Computer Science, Pedagogical University of Krakow, Poland on Multimodal matching

17 October 2021

Date: Wednesday, October 27, 2021, 14:14 CET

Speaker: Justyna Golec, Institute of Computer Science, Pedagogical University of Krakow, Poland

Title: Multimodal matching – methods to determine similarity between text and image

Abstract: Deep learning has enabled new classes of neural networks that can determine the similarity between data of different modalities. These methods belong to the group of multimodal matching algorithms. An interesting example is networks that measure the similarity between a digital image (e.g. a photo) and a text written in natural language. In practice they can be used, for example, to summarize a text as a set of illustrative images.
In this seminar I will discuss a network architecture for image-text matching based on Vision Transformer, ResNet and BERT. Preliminary evaluation results of the proposed method will also be presented.
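
The abstract does not say how the similarity score is computed. As a rough illustration only, the sketch below shows one common way to score image-text pairs: project the outputs of an image encoder and a text encoder into a shared space and compare them with cosine similarity. This is an assumption and not necessarily the speaker's method. Random arrays stand in for the encoder outputs (for instance from a Vision Transformer or ResNet and from BERT), and the names `similarity_matrix`, `w_image`, `w_text` and the dimensions 768 and 256 are hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def similarity_matrix(image_features, text_features, w_image, w_text):
    """Project both modalities into a shared space and compare every pair.

    Returns an array of shape (n_images, n_texts) with values in [-1, 1].
    """
    img = l2_normalize(image_features @ w_image)
    txt = l2_normalize(text_features @ w_text)
    return img @ txt.T


# Stand-ins for encoder outputs (assumption: 768-dimensional features).
rng = np.random.default_rng(0)
image_features = rng.normal(size=(3, 768))  # 3 images
text_features = rng.normal(size=(4, 768))   # 4 text fragments

# Hypothetical projection matrices; in a trained model these would be learned.
w_image = rng.normal(size=(768, 256))
w_text = rng.normal(size=(768, 256))

scores = similarity_matrix(image_features, text_features, w_image, w_text)

# For each image, the index of the text fragment with the highest score.
best_text_per_image = scores.argmax(axis=1)
```

With trained encoders and projections, taking the highest-scoring image for each text fragment instead would give a simple way to pick illustrative images for a text, which is the use case mentioned in the abstract.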

Language: Polish

Online access: Zoom platform