Authors: Muhammad Irzam Liaqat, Qaiser Abbas, Shah Nawaz, Muhammad Zaigham Zaheer, Marta Moscati, Yufang Hou, Muhammad Haris Khan, Salman Khan, Elisabeth Andre, Markus Schedl

Fig 1: Abstract overview of different types of learning: (a) Unimodal Learning, (b) Multimodal Learning, (c) Multimodal Learning under Missing Modalities, (d) Multimodal Learning under Corrupted Modalities

Fig 2: Use cases of data corruption in the real world: (a) Multimodal Learning, (b) Multimodal Learning with Missing Modalities, (c) Multimodal Learning with Corrupted Modalities
We strongly encourage contributors and researchers to help grow this research area. To add a recent paper, open a pull request with the paper's information!
- Existing Survey Paper
- Multimodal Learning with Missing Modalities
- Multimodal Learning with Corrupted Modalities
- License
- Citation
Existing Survey Paper
| Year | Title | Paper Link | Code Link |
|---|---|---|---|
| 2024 | Deep multimodal learning with missing modality: A survey | 📄 Link | - |
| 2024 | Multimodal fusion on low-quality data: A comprehensive survey | 📄 Link | - |
| 2023 | Multimodal learning with transformers: A survey | 📄 Link | - |
| 2022 | A survey on deep multimodal learning for computer vision: advances, trends, applications, and datasets | 📄 Link | - |
| 2020 | A survey on deep learning for multimodal data fusion | 📄 Link | - |
| 2019 | Deep multimodal representation learning: A survey | 📄 Link | - |
| 2018 | Multimodal machine learning: A survey and taxonomy | 📄 Link | - |
Multimodal Learning with Missing Modalities

Fig 3: Overview of our missing modality taxonomy with SOTA methods

Fig 4: Overview of the existing studies on multimodal learning under missing modalities, showing (a) yearly publication trends, (b) application areas, (c) modality distribution, and (d) publication venues
1.1.1 Generative

Fig 5: High level overview of generative methods for missing modality handling
| Year | Title | Paper Link | Code Link |
|---|---|---|---|
| 2025 | Knowledge Bridger: Towards Training-Free Missing Multi-Modality Completion | 📄 | 💻 |
| 2025 | Multimodal Cascaded Framework With Multimodal Latent Loss Functions Robust To Missing Modalities | 📄 | - |
| 2025 | Sdr-Gnn: Spectral Domain Reconstruction Graph Neural Network For Incomplete Multimodal Learning In Conversational Emotion Recognition | 📄 | - |
| 2025 | Amm-Diff: Adaptive Multi-Modality Diffusion Network For Missing Modality Imputation | 📄 | - |
| 2024 | Fmcnet | 📄 | - |
| 2024 | Unified Multi-Modal Image Synthesis For Missing Modality Imputation | 📄 | - |
| 2024 | Deformation-Aware And Reconstruction-Driven Multimodal Representation Learning For Brain Tumor Segmentation With Missing Modalities | 📄 | 💻 |
| 2024 | Do We Really Need To Drop Items With Missing Modalities In Multimodal Recommendation? | 📄 | - |
| 2023 | Unimf: A Unified Multimodal Framework For Multimodal Sentiment Analysis In Missing Modalities And Unaligned Multimodal Sequences | 📄 | 💻 |
| 2023 | Learning Unified Hyper-Network For Multi-Modal Mr Image Synthesis And Tumor Segmentation With Missing Modalities | 📄 | 💻 |
| 2023 | Exploiting Modality-Invariant Feature For Robust Multimodal Emotion Recognition With Missing Modalities | 📄 | - |
| 2022 | M2R2: Missing-Modality Robust Emotion Recognition Framework With Iterative Data Augmentation | 📄 | - |
| 2022 | Region-Of-Interest Attentive Heteromodal Variational Encoder-Decoder For Segmentation With Missing Modalities | 📄 | 💻 |
| 2022 | Fmcnet: Feature-Level Modality Compensation For Visible-Infrared Person Re-Identification | 📄 | - |
| 2021 | Semi-Supervised Multimodal Image Translation For Missing Modality Imputation | 📄 | - |
| 2021 | Brain Tumor Segmentation For Missing Modalities By Supplementing Missing Features | 📄 | - |
| 2021 | Feature-Enhanced Generation And Multi-Modality Fusion Based Deep Neural Network For Brain Tumor Segmentation With Missing Mr Modalities | 📄 | - |
| 2021 | Glioblastoma Multiforme Prognosis: Mri Missing Modality Generation, Segmentation And Radiogenomic Survival Prediction | 📄 | - |
| 2021 | Missing Modality Imagination Network For Emotion Recognition With Uncertain Missing Modalities | 📄 | - |
| 2020 | Optimal Sparse Linear Prediction For Block-Missing Multi-Modality Data Without Imputation | 📄 | - |
| 2020 | Estimation Of Missing Values In Heterogeneous Traffic Data: Application Of Multimodal Deep Learning Model | 📄 | - |
| 2018 | Synthesizing And Reconstructing Missing Sensory Modalities In Behavioral Context Recognition | 📄 | - |
| 2018 | Deep Adversarial Learning For Multi-Modality Missing Data Completion | 📄 | - |
1.1.2 Alignment

Fig 6: High level overview of alignment methods for missing modality handling
| Year | Title | Paper Link | Code Link |
|---|---|---|---|
| 2025 | Robust Multimodal Learning Via Cross-Modal Proxy Tokens | 📄 | - |
| 2025 | Wasserstein Modality Alignment Makes Your Multimodal Transformer More Robust | 📄 | - |
| 2024 | Multimodal Knowledge Graph Embedding With Missing Data Integration | 📄 | - |
| 2024 | Penta-Encoder With Medical Transformer For Incomplete Multimodal Learning Of Brain Tumor Segmentation | 📄 | - |
| 2023 | Rethinking Missing Modality Learning From A Decoding Perspective | 📄 | - |
| 2023 | Exploiting Multi-Modal Fusion For Robust Face Representation Learning With Missing Modality | 📄 | - |
| 2023 | Multimodal Language Learning For Object Retrieval In Low Data Regimes In The Face Of Missing Modalities | 📄 | - |
| 2023 | Cross-Modal Alignment And Translation For Missing Modality Action Recognition | 📄 | - |
| 2022 | Mm-Align: Learning Optimal Transport-Based Alignment Dynamics For Fast And Accurate Inference On Missing Modality Sequences | 📄 | 💻 |
| 2022 | M3Care: Learning With Missing Modalities In Multimodal Healthcare Data | 📄 | - |
| 2022 | A General Framework For Incomplete Cross-Modal Retrieval With Missing Labels And Missing Modalities | 📄 | - |
| 2021 | A Non-Linear Mapping Representing Human Action Recognition Under Missing Modality Problem In Video Data | 📄 | - |
| 2018 | Generalized Bayesian Canonical Correlation Analysis With Missing Modalities | 📄 | - |
1.2.1 Model Design

Fig 7: High level overview of model based methods for missing modality handling
| Year | Title | Paper Link | Code Link |
|---|---|---|---|
| 2025 | Uml: A Unified Multimodal Learning Framework For Cataract Postoperative Visual Acuity Prediction With Uncertain Missing Modalities | 📄 | 💻 |
| 2024 | Missing Modality Robustness In Semi-Supervised Multi-Modal Semantic Segmentation | 📄 | 💻 |
| 2024 | Mmmvit: Multiscale Multimodal Vision Transformer For Brain Tumor Segmentation With Missing Modalities | 📄 | 💻 |
| 2024 | Robust Multimodal Learning With Missing Modalities Via Parameter-Efficient Adaptation | 📄 | - |
| 2024 | Unibev: Multi-Modal 3D Object Detection With Uniform Bev Encoders For Robustness Against Missing Sensor Modalities | 📄 | - |
| 2023 | Towards Good Practices For Missing Modality Robust Action Recognition | 📄 | - |
| 2023 | M3Ae: Multimodal Representation Learning For Brain Tumor Segmentation With Missing Modalities | 📄 | - |
| 2023 | Multi-Modal Learning With Missing Modality Via Shared-Specific Feature Modelling | 📄 | - |
| 2022 | Smu-Net: Style Matching U-Net For Brain Tumor Segmentation With Missing Modalities | 📄 | 💻 |
| 2022 | Moddrop++: A Dynamic Filter Network With Intra-Subject Co-Training For Multiple Sclerosis Lesion Segmentation With Missing Modalities | 📄 | - |
| 2022 | Mmformer: Multimodal Medical Transformer For Incomplete Multimodal Learning Of Brain Tumor Segmentation | 📄 | 💻 |
| 2021 | Maximum Likelihood Estimation For Multimodal Learning With Missing Modality | 📄 | - |
| 2020 | Training Strategies To Handle Missing Modalities For Audio-Visual Expression Recognition | 📄 | - |
| 2020 | Multimodal Biometrics Recognition From Facial Video With Missing Modalities Using Deep Learning | 📄 | - |
| 2019 | A Unified Representation Network For Segmentation With Missing Modalities | 📄 | - |
| 2019 | Audio Feature Generation For Missing Modality Problem In Video Action Recognition | 📄 | - |
| 2019 | Brain Tumor Segmentation On Mri With Missing Modalities | 📄 | - |
1.2.2 Selective Fusion

Fig 8: High level overview of fusion based methods for missing modality handling
| Year | Title | Paper Link | Code Link |
|---|---|---|---|
| 2023 | What Makes For Robust Multi-Modal Models In The Face Of Missing Modalities? | 📄 | - |
| 2023 | Rethinking Uncertainly Missing And Ambiguous Visual Modality In Multi-Modal Entity Alignment | 📄 | 💻 |
| 2022 | Mitigating Inconsistencies In Multimodal Sentiment Analysis Under Uncertain Missing Modalities | 📄 | 💻 |
| 2021 | Robust Multi-Modality Person Re-Identification | 📄 | - |
1.2.3 Co-Learning

Fig 9: High level overview of co-learning methods for missing modality handling
| Year | Title | Paper Link | Code Link |
|---|---|---|---|
| 2023 | Multimodal Federated Learning With Missing Modality Via Prototype Mask And Contrast | 📄 | 💻 |
| 2023 | Enhancing Modality-Agnostic Representations Via Meta-Learning For Brain Tumor Segmentation | 📄 | - |
| 2023 | Missmodal: Increasing Robustness To Missing Modality In Multimodal Sentiment Analysis | 📄 | 💻 |
| 2023 | Multimodal Reconstruct And Align Net For Missing Modality Problem In Sentiment Analysis | 📄 | - |
| 2022 | Missing Modality Meets Meta Sampling (M3S): An Efficient Universal Approach For Multimodal Sentiment Analysis With Missing Modality | 📄 | - |
| 2022 | D²-Net: Dual Disentanglement Network For Brain Tumor Segmentation With Missing Modalities | 📄 | 💻 |
| 2021 | An Efficient Approach For Audio-Visual Emotion Recognition With Missing Labels And Missing Modalities | 📄 | - |
| 2021 | Smil: Multimodal Learning With Severely Missing Modality | 📄 | - |
| 2021 | Deep Multisensor Learning For Missing-Modality All-Weather Mapping | 📄 | - |
| 2021 | Progressive Modality Cooperation For Multi-Modality Domain Adaptation | 📄 | - |
| 2021 | Acn: Adversarial Co-Training Network For Brain Tumor Segmentation With Missing Modalities | 📄 | - |
| 2018 | Lrmm: Learning To Recommend With Missing Modalities | 📄 | - |
1.2.4 Distillation

Fig 10: High level overview of distillation methods for missing modality handling
| Year | Title | Paper Link | Code Link |
|---|---|---|---|
| 2025 | Modalitymirror: Enhancing Audio Classification In Modality Heterogeneity Federated Learning Via Multimodal Distillation | 📄 | - |
| 2025 | Modality-Invariant Bidirectional Temporal Representation Distillation Network For Missing Multimodal Sentiment Analysis | 📄 | - |
| 2025 | Test-Time Adaptation For Combating Missing Modalities In Egocentric Videos | 📄 | - |
| 2024 | Segment Beyond View: Handling Partially Missing Modality For Audio-Visual Semantic Segmentation | 📄 | - |
| 2023 | Prototype Knowledge Distillation For Medical Segmentation With Missing Modality | 📄 | 💻 |
| 2023 | Msh-Net: Modality-Shared Hallucination With Joint Adaptation Distillation For Remote Sensing Image Classification Using Missing Modalities | 📄 | 💻 |
| 2023 | Learnable Cross-Modal Knowledge Distillation For Multi-Modal Learning With Missing Modality | 📄 | - |
| 2023 | Multi-Head Siamese Prototype Learning Against Both Data And Label Corruption | 📄 | - |
| 2021 | Dealing With Missing Modalities In The Visual Question Answer-Difference Prediction Task Through Knowledge Distillation | 📄 | - |
| 2020 | Multimodal Learning With Incomplete Modalities By Knowledge Distillation | 📄 | - |
| 2019 | An Adversarial Approach To Discriminative Modality Distillation For Remote Sensing Image Classification | 📄 | - |
| 2019 | Cross-Modal Learning By Hallucinating Missing Modalities In Rgb-D Vision | 📄 | 💻 |
| 2018 | Modality Distillation With Multiple Stream Networks For Action Recognition | 📄 | - |
1.2.5 Attention Mechanism

Fig 11: High level overview of attention methods for missing modality handling
| Year | Title | Paper Link | Code Link |
|---|---|---|---|
| 2024 | Mman-M2: Multiple Multi-Head Attentions Network Based On Encoder With Missing Modalities | 📄 | - |
| 2024 | Framm: Fair Ranking With Missing Modalities For Clinical Trial Site Selection | 📄 | - |
| 2023 | Accommodating Missing Modalities In Time-Continuous Multimodal Emotion Recognition | 📄 | - |
| 2023 | Attention-Based Multimodal Fusion With Contrast For Robust Clinical Prediction In The Face Of Missing Modalities | 📄 | - |
| 2023 | Magnet: Modality-Agnostic Network For Brain Tumor Segmentation And Characterization With Missing Modalities | 📄 | - |
| 2023 | Contrastive Learning-Based Spectral Knowledge Distillation For Multi-Modality And Missing Modality Scenarios In Semantic Segmentation | 📄 | - |
| 2023 | Audio-Visual Sensor Fusion Framework Using Person Attributes Robust To Missing Visual Modality For Person Recognition | 📄 | - |
| 2022 | Tag-Assisted Multimodal Sentiment Analysis Under Uncertain Missing Modalities | 📄 | 💻 |
| 2022 | A Multimodal Sensor Fusion Framework Robust To Missing Modalities For Person Recognition | 📄 | - |
| 2022 | Multi-Modal Brain Tumor Segmentation Via Missing Modality Synthesis And Modality-Level Attention Fusion | 📄 | - |
| 2022 | Robust Multimodal Sentiment Analysis Via Tag Encoding Of Uncertain Missing Modalities | 📄 | - |
| 2022 | Multimodal Image Aesthetic Prediction With Missing Modality | 📄 | - |
| 2022 | Modality-Adaptive Feature Interaction For Brain Tumor Segmentation With Missing Modalities | 📄 | - |
| 2021 | Multimodal Gait Recognition Under Missing Modalities | 📄 | 💻 |
| 2020 | Multi-Modality Matters: A Performance Leap On Voxceleb | 📄 | 💻 |
| 2020 | Brain Tumor Segmentation With Missing Modalities Via Latent Multi-Source Correlation Representation | 📄 | - |
1.2.6 Prompt Learning

Fig 12: High level overview of prompt learning methods for missing modality handling
| Year | Title | Paper Link | Code Link |
|---|---|---|---|
| 2025 | Retrieval-Augmented Dynamic Prompt Tuning For Incomplete Multimodal Learning | 📄 | - |
| 2025 | Efficient Prompting For Continual Adaptation To Missing Modalities | 📄 | - |
| 2025 | Multimodal Invariant Feature Prompt Network For Brain Tumor Segmentation With Missing Modalities | 📄 | 💻 |
| 2025 | Pal: Prompting Analytic Learning With Missing Modality For Multi-Modal Class-Incremental Learning | 📄 | - |
| 2025 | Semantically Conditioned Prompts For Visual Recognition Under Missing Modality Scenarios | 📄 | 💻 |
| 2024 | Towards Robust Multimodal Prompting With Missing Modalities | 📄 | - |
| 2023 | Multimodal Prompting With Missing Modalities For Visual Recognition | 📄 | 💻 |
1.3 Hybrid Approaches

Fig 13: High level overview of Hybrid methods for missing modality handling
| Year | Title | Paper Link | Code Link |
|---|---|---|---|
| 2025 | Cross-Modal Prototype Based Multimodal Federated Learning Under Severely Missing Modality | 📄 | - |
| 2025 | Graph Attention Contrastive Learning With Missing Modality For Multimodal Recommendation | 📄 | - |
| 2025 | Fedmobile: Enabling Knowledge Contribution-Aware Multi-Modal Federated Learning With Incomplete Modalities | 📄 | - |
| 2025 | Incomplete Modality Disentangled Representation For Ophthalmic Disease Grading And Diagnosis | 📄 | 💻 |
| 2025 | Ssfd-Net: Shared-Specific Feature Disentanglement Network For Multimodal Biometric Recognition With Missing Modality | 📄 | - |
| 2025 | Diffusion-Driven Incomplete Multimodal Learning For Air Quality Prediction | 📄 | - |
| 2025 | Ogp-Net: Optical Guidance Meets Pixel-Level Contrastive Distillation For Robust Multi-Modal And Missing Modality Segmentation | 📄 | - |
| 2025 | Multimodal Sentiment Analysis Based On Multi-Stage Graph Fusion Networks Under Random Missing Modality Conditions | 📄 | - |
| 2025 | Text-Guided Reconstruction Network For Sentiment Analysis With Uncertain Missing Modalities | 📄 | - |
| 2025 | Open-Modality Latent Modality Interaction Maximization For Audio-Visual Learning | 📄 | - |
| 2025 | Disentangling And Generating Modalities For Recommendation In Missing Modality Scenarios | 📄 | - |
| 2025 | Optimus: Predicting Multivariate Outcomes In Alzheimer's Disease Using Multi-Modal Data Amidst Missing Values | 📄 | - |
| 2025 | Tackling Real-World Complexity: Hierarchical Modeling And Dynamic Prompting For Multimodal Long Document Classification | 📄 | - |
| 2025 | Mi-Cga: Cross-Modal Graph Attention Network For Robust Emotion Recognition In The Presence Of Incomplete Modalities | 📄 | 💻 |
| 2025 | Emotional Boundaries And Intensity Aware Model For Incomplete Multimodal Sentiment Analysis | 📄 | - |
| 2025 | Adaptive Cross-Modal Representation Learning For Heterogeneous Data Types In Alzheimer Disease Progression Prediction With Missing Time Point And Modalities | 📄 | - |
| 2024 | Modality Translation-Based Multimodal Sentiment Analysis Under Uncertain Missing Modalities | 📄 | - |
| 2024 | Tip: Tabular-Image Pre-Training For Multimodal Classification With Incomplete Data | 📄 | 💻 |
| 2023 | Feature Fusion And Latent Feature Learning Guided Brain Tumor Segmentation And Missing Modality Recovery Network | 📄 | - |
| 2022 | Are Multimodal Transformers Robust To Missing Modality? | 📄 | - |
| 2021 | Ugaitnet: Multimodal Gait Recognition With Missing Input Modalities | 📄 | 💻 |
| 2018 | Semi-Supervised Deep Generative Modelling Of Incomplete Multi-Modality Emotional Data | 📄 | - |
| 2018 | Urban Land Cover Classification With Missing Data Modalities Using Deep Convolutional Neural Networks | 📄 | - |

Fig 14: Overview of our corrupted modality taxonomy with SOTA methods

Fig 15: Overview of the existing studies on multimodal learning under corrupted modalities, showing (a) yearly publication trends, (b) application areas, (c) modality distribution, and (d) publication venues
2.1.1 Denoising Methods

Fig 16: High level overview of Hybrid methods for corrupted modality handling
| Year | Title | Paper Link | Code Link |
|---|---|---|---|
| 2024 | Centaur: Robust Multimodal Fusion For Human Activity Recognition | 📄 | - |
| 2023 | Rhvit: A Robust Hierarchical Transformer For 3D Multimodal Brain Tumor Segmentation Using Biased Masked Image Modeling Pre-Training | 📄 | - |
| 2022 | Multimodal Cloud Resources Utilization Forecasting Using A Bidirectional Gated Recurrent Unit Predictor Based On A Power Efficient Stacked Denoising Autoencoders | 📄 | - |
| 2018 | Highly Accurate Image Reconstruction For Multimodal Noise Suppression Using Semisupervised Learning On Big Data | 📄 | 💻 |

Fig 17: High level overview of architectural methods for corrupted modality handling
2.2.1 Noise Aware Networks
| Year | Title | Paper Link | Code Link |
|---|---|---|---|
| 2025 | V²-Sfmlearner: Learning Monocular Depth And Ego-Motion For Multimodal Wireless Capsule Endoscopy | 📄 | - |
| 2025 | Micinet: Multi-Level Inter-Class Confusing Information Removal For Reliable Multimodal Classification | 📄 | - |
| 2025 | Smoothing The Shift: Towards Stable Test-Time Adaptation Under Complex Multimodal Noises | 📄 | - |
| 2025 | Admn: A Layer-Wise Adaptive Multimodal Network For Dynamic Input Noise And Compute Resources | 📄 | - |
| 2024 | Two-Level Test-Time Adaptation In Multimodal Learning | 📄 | - |
| 2024 | Leveraging Multimodal Features And Item-Level User Feedback For Bundle Construction | 📄 | 💻 |
| 2024 | Adaflow: Non-Blocking Inference With Heterogeneous Multi-Modal Mobile Sensor Data | 📄 | - |
| 2023 | Redundancy-Adaptive Multimodal Learning For Imperfect Data | 📄 | - |
| 2023 | Calico: Self-Supervised Camera-Lidar Contrastive Pre-Training For Bev Perception | 📄 | - |
| 2022 | Efficient Multimodal Deep-Learning-Based Covid-19 Diagnostic System For Noisy And Corrupted Images | 📄 | - |
| 2020 | M3Er: Multiplicative Multimodal Emotion Recognition Using Facial, Textual, And Speech Cues | 📄 | - |
| 2020 | Seanet: A Multi-Modal Speech Enhancement Network | 📄 | - |
| 2019 | Found In Translation: Learning Robust Joint Representations By Cyclic Translations Between Modalities | 📄 | - |
| 2019 | Learning Representations From Imperfect Time Series Data Via Tensor Rank Regularization | 📄 | - |
| 2019 | Multimodal Representation Learning Using Deep Multiset Canonical Correlation | 📄 | - |
2.2.2 Confidence Estimation
| Year | Title | Paper Link | Code Link |
|---|---|---|---|
| 2025 | Deep Learning-Driven Behavioral Modeling In Iost For Mental Health Monitoring And Intervention | 📄 | - |
| 2023 | Calibrating Multimodal Learning | 📄 | - |
| 2023 | Fedmultimodal: A Benchmark For Multimodal Federated Learning | 📄 | - |
| 2023 | Multi-Level Confidence Learning For Trustworthy Multimodal Classification | 📄 | - |
| 2023 | Watch Or Listen: Robust Audio-Visual Speech Recognition With Visual Corruption Modeling And Reliability Scoring | 📄 | 💻 |
| 2023 | Formnetv2: Multimodal Graph Contrastive Learning For Form Document Information Extraction | 📄 | - |
| 2023 | Sgir: Star Graph-Based Interaction For Efficient And Robust Multimodal Representation | 📄 | - |
| 2023 | Aspnet: Action Segmentation With Shared-Private Representation Of Multiple Data Sources | 📄 | - |
| 2022 | Generalized Product-Of-Experts For Learning Multimodal Representations In Noisy Environments | 📄 | - |
| 2021 | Trustworthy Multimodal Regression With Mixture Of Normal-Inverse Gamma Distributions | 📄 | 💻 |
| 2021 | Multimodal Attention Fusion For Target Speaker Extraction | 📄 | - |
| 2019 | Anomaly Detection From System Tracing Data Using Multimodal Deep Learning | 📄 | - |
2.2.3 Robust Fusion
| Year | Title | Paper Link | Code Link |
|---|---|---|---|
| 2025 | Multi-Task Corrupted Prediction For Learning Robust Audio-Visual Speech Representation | 📄 | - |
| 2024 | Learning Rich Multimodal Representation For Robust Land Cover Classification In Fog | 📄 | - |
| 2024 | Tvdiag: A Task-Oriented And View-Invariant Failure Diagnosis Framework With Multimodal Data | 📄 | - |
| 2024 | Indoor Scene Recognition From Images Under Visual Corruptions | 📄 | - |
| 2023 | Low-Rank Multimanifold Embedding Learning For Multimode Process Monitoring | 📄 | - |
| 2023 | Employing Multimodal Co-Learning To Evaluate The Robustness Of Sensor Fusion For Industry 5.0 Tasks | 📄 | - |
| 2023 | Toward A Robust Sensor Fusion Step For 3D Object Detection On Corrupted Data | 📄 | - |
| 2022 | Progressive Fusion For Multimodal Integration | 📄 | - |
| 2021 | Multibench: Multiscale Benchmarks For Multimodal Representation Learning | 📄 | - |
| 2021 | Vmloc: Variational Fusion For Learning-Based Multimodal Camera Localization | 📄 | 💻 |
| 2020 | Hgmf: Heterogeneous Graph-Based Fusion For Multimodal Data With Incompleteness | 📄 | - |
| 2020 | Adaptive Multimodal Fusion For Facial Action Units Recognition | 📄 | - |
| 2019 | A Deep Learning Gated Architecture For Ugv Navigation Robust To Sensor Failures | 📄 | - |

Fig 18: High level overview of training strategies for corrupted modality handling
2.3.1 Data Augmentation
| Year | Title | Paper Link | Code Link |
|---|---|---|---|
| 2025 | Fusion For Visual-Infrared Person Reid In Real-World Surveillance Using Corrupted Multimodal Data | 📄 | - |
| 2024 | The Effect Of Data Corruption On Multimodal Long Form Responses | 📄 | - |
| 2024 | Benchmarking Large Multimodal Models Against Common Corruptions | 📄 | - |
| 2024 | Robust Visible-Infrared Person Re-Identification Based On Polymorphic Mask And Wavelet Graph Convolutional Network | 📄 | - |
| 2023 | Masking Important Information To Assess The Robustness Of A Multimodal Classifier For Emotion Recognition | 📄 | - |
| 2023 | Multimodal Data Augmentation For Visual-Infrared Person Reid With Corrupted Data | 📄 | - |
| 2023 | Multimodal Synthetic Dataset Balancing: A Framework For Realistic And Balanced Training Data Generation In Industrial Settings | 📄 | - |
| 2023 | Best Of Both Worlds: Multimodal Contrastive Learning With Tabular And Imaging Data | 📄 | - |
| 2019 | Videobert: A Joint Model For Video And Language Representation Learning | 📄 | - |
| 2018 | Deep Audio-Visual Speech Recognition | 📄 | 💻 |
2.3.2 Adversarial Training
| Year | Title | Paper Link | Code Link |
|---|---|---|---|
| 2023 | Cleanclip: Mitigating Data Poisoning Attacks In Multimodal Contrastive Learning | 📄 | 💻 |
| 2023 | Multi-Head Siamese Prototype Learning Against Both Data And Label Corruption | 📄 | - |
| 2023 | Advclip: Downstream-Agnostic Adversarial Examples In Multimodal Contrastive Learning | 📄 | - |
| 2023 | Contrastive Self-Supervised Learning Leads To Higher Adversarial Susceptibility | 📄 | - |
| 2021 | Robust Multimodal Representation Learning With Evolutionary Adversarial Attention Networks | 📄 | - |
| 2021 | M3P: Learning Universal Representations Via Multitask Multilingual Multimodal Pre-Training | 📄 | - |

Fig 19: High level overview of post-hoc strategies for corrupted modality handling
2.4.1 Error Detection
| Year | Title | Paper Link | Code Link |
|---|---|---|---|
| 2025 | Corrupted But Not Broken: Rethinking The Impact Of Corrupted Data In Visual Instruction Tuning | 📄 | - |
| 2025 | Msc-Bench: Benchmarking And Analyzing Multi-Sensor Corruption For Driving Perception | 📄 | - |
| 2024 | Both Text And Images Leaked! A Systematic Analysis Of Multimodal Llm Data Contamination | 📄 | - |
| 2021 | Detect, Reject, Correct: Crossmodal Compensation Of Corrupted Sensors | 📄 | - |
| 2021 | An Immune Inspired Algorithm For Fault Tolerant Enhanced Multimodal Machine Learning | 📄 | - |
| 2021 | Defending Multimodal Fusion Models Against Single-Source Adversaries | 📄 | - |
2.4.2 Recovery Mechanism
| Year | Title | Paper Link | Code Link |
|---|---|---|---|
| 2024 | Zeronlg: Aligning And Autoencoding Domains For Zero-Shot Multimodal And Multilingual Natural Language Generation | 📄 | 💻 |
| 2024 | Dac: 2D-3D Retrieval With Noisy Labels Via Divide-And-Conquer Alignment And Correction | 📄 | - |
| 2023 | Patch: A Plug-In Framework Of Non-Blocking Inference For Distributed Multimodal System | 📄 | - |
| 2023 | Deep Multimodal Fusion With Corrupted Spatio-Temporal Data Using Fuzzy Regularization | 📄 | - |
This repository is licensed under the MIT License - see the LICENSE file for details.
The papers listed in this repository are copyrighted by their respective authors and publishers.
If you find the listing and survey useful for your work, please cite the paper:
@misc{liaqat2025multimodal,
title={Multimodal learning under imperfect data conditions: A Survey},
author={Muhammad Irzam Liaqat and Qaiser Abbas and Shah Nawaz and Zaigham Zaheer and Marta Moscati and Yufang Hou and Muhammad Haris Khan and Salman Khan and Elisabeth Andre and Markus Schedl},
year={2025},
eprint={},
archivePrefix={arXiv},
primaryClass={cs.CV}
}