    •   etd@IISc
    • Division of Electrical, Electronics, and Computer Science (EECS)
    • Electrical Engineering (EE)

    Audio-Visual association learning in Humans and Multimodal Networks

    View/Open
    Thesis full text (1.552Mb)
    Author
    Harjpal, Chandrakant
    Abstract
    We easily learn audiovisual associations when we give visual objects their names. Humans easily learn the names of new objects while retaining previously learned information, whereas deep neural networks forget old associations when trained on new ones, a phenomenon called catastrophic forgetting. In this thesis, I have performed two studies to characterize human and deep network performance on learning novel audiovisual associations. In Study 1, we compared human and multimodal deep network performance on learning novel object-word associations, and the decay in performance on initially encountered pairs after more pairs were learned. We selected 60 object-word pairs from the Novel Object and Unusual Name (NOUN) dataset and performed equivalent experiments on both humans and deep networks. In the human experiments, participants performed 6 sessions of learning and testing novel object-word associations. In each session, they memorized 10 novel object-word pairs and were then tested: they heard the spoken word (in a different voice/accent) and were asked to identify the associated image (in a different color/orientation) among all the object images encountered in the session. This test was performed for some objects immediately and for others after a new learning session. Human accuracy was 59% on the immediate test and decreased only slightly when tested after longer intervals, although accuracy on the delayed test was significantly lower than on the immediate test.
    In the deep network experiment, we used an audiovisual network with image and audio subnetworks and trained it with a triplet loss to learn object-word associations in the same way as in the human experiments. In each session, the network was trained on 10 object-word pairs and tested by presenting an audio word and finding the nearest matching image among the full set. We evaluated two scenarios: a vanilla network with no constraint on weights, and a network with elastic weight consolidation (EWC). In the vanilla setting, performance on initially encountered pairs decreased after the network was trained on new pairs; performance improved with EWC. We matched current-session accuracies with human performance in order to compare forgetting, and found that the vanilla network performed worse than humans, whereas the EWC network performed better than humans.
    In Study 2, we investigated whether there is an order preference between the image and audio modalities when humans learn audiovisual associations, i.e., whether presenting the image before the audio is better than presenting the audio first. We assigned pairs to different learning conditions, which varied either in the order of image and audio presentation or in the delay between the end of one modality and the start of the second; the delay values were 0 ms, 500 ms and 1000 ms. Subjects were shown the pairs in these learning conditions and then tested on a cross-modal matching task, with image and audio each serving as the question modality in two separate tests. If there is indeed an optimal learning condition, it should be reflected in better test performance for that condition. We found no significant difference in performance due to the order or delay of image and audio presentation in either test.
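The three ingredients of the Study 1 network experiment named in the abstract (triplet loss training, nearest-image testing, and the EWC weight constraint) can be illustrated with a minimal sketch. This is not the thesis code: the function names, the Euclidean distance, the margin value, the toy embeddings, and the flat weight lists are all assumptions made for illustration, and the formulas are the generic textbook forms of each technique.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Generic hinge-style triplet loss (margin value is an assumption).

    Pulls the anchor (e.g. an audio embedding) toward its matching image
    embedding and pushes it at least `margin` further from a mismatched one.
    """
    gap = l2_distance(anchor, positive) - l2_distance(anchor, negative)
    return max(0.0, gap + margin)


def nearest_image(audio_embedding, image_embeddings):
    """Index of the image embedding closest to the given audio embedding."""
    distances = [l2_distance(audio_embedding, img) for img in image_embeddings]
    return distances.index(min(distances))


def ewc_penalty(weights, old_weights, fisher, strength=1.0):
    """Standard quadratic EWC penalty: (strength / 2) * sum F_i * (w_i - w_old_i)^2.

    `fisher` holds per-weight importance estimates from the earlier task;
    here it is just a hypothetical list of numbers.
    """
    return 0.5 * strength * sum(
        f * (w - w_old) ** 2
        for w, w_old, f in zip(weights, old_weights, fisher)
    )


# Toy two-dimensional embeddings, made up for illustration only.
audio = [1.0, 0.0]
images = [[0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]

predicted = nearest_image(audio, images)
loss = triplet_loss(audio, images[1], images[0])
penalty = ewc_penalty([1.0, 2.0], [0.0, 2.0], [4.0, 10.0], strength=2.0)
```

In a sketch like this, the vanilla scenario corresponds to minimizing the triplet loss alone in each session, while the EWC scenario adds the penalty term to the loss so that weights important for earlier object-word pairs are discouraged from drifting.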
    URI
    https://etd.iisc.ac.in/handle/2005/6527
    Collections
    • Electrical Engineering (EE) [357]

    etd@IISc is a joint service of SERC & J R D Tata Memorial (JRDTML) Library || Powered by DSpace software || DuraSpace