qaixerabbas/awesome-multimodal-learning-with-imperfect-data

Multimodal Learning Under Imperfect Data Conditions: A Survey

Authors: Muhammad Irzam Liaqat, Qaiser Abbas, Shah Nawaz, Muhammad Zaigham Zaheer, Marta Moscati, Yufang Hou, Muhammad Haris Khan, Salman Khan, Elisabeth Andre, Markus Schedl

Fig 1: Abstract overview of different types of learning: (a) unimodal learning, (b) multimodal learning, (c) multimodal learning under missing modalities, (d) multimodal learning under corrupted modalities.

Fig 2: Real-world use cases of data corruption: (a) multimodal learning, (b) multimodal learning with missing modalities, (c) multimodal learning with corrupted modalities.

We strongly encourage researchers to contribute to the community in this research area. To add a recent paper, open a pull request with the new paper's information!

Existing Survey Papers

Year Title Paper Link Code Link
2024 Deep multimodal learning with missing modality: A survey 📄 Link -
2024 Multimodal fusion on low-quality data: A comprehensive survey 📄 Link -
2023 Multimodal learning with transformers: A survey 📄 Link -
2022 A survey on deep multimodal learning for computer vision: advances, trends, applications, and datasets 📄 Link -
2020 A survey on deep learning for multimodal data fusion 📄 Link -
2019 Deep multimodal representation learning: A survey 📄 Link -
2018 Multimodal machine learning: A survey and taxonomy 📄 Link -

Multimodal Learning with Missing Modalities

Fig 3: Overview of our missing modality taxonomy with SOTA methods

Fig 4: Overview of the existing studies on multimodal learning under missing modalities, showing (a) yearly publication trends, (b) application areas, (c) modality distribution, and (d) publication venues

1.1 Reconstruction

1.1.1 Generative

Fig 5: High-level overview of generative methods for missing modality handling

Year Title Paper Link Code Link
2025 Knowledge Bridger: Towards Training-Free Missing Multi-Modality Completion 📄 💻
2025 Multimodal Cascaded Framework With Multimodal Latent Loss Functions Robust To Missing Modalities 📄 -
2025 Sdr-Gnn: Spectral Domain Reconstruction Graph Neural Network For Incomplete Multimodal Learning In Conversational Emotion Recognition 📄 -
2025 Amm-Diff: Adaptive Multi-Modality Diffusion Network For Missing Modality Imputation 📄 -
2024 Fmcnet+: Feature-Level Modality Compensation For Visible-Infrared Person Re-Identification 📄 -
2024 Unified Multi-Modal Image Synthesis For Missing Modality Imputation 📄 -
2024 Deformation-Aware And Reconstruction-Driven Multimodal Representation Learning For Brain Tumor Segmentation With Missing Modalities 📄 💻
2024 Do We Really Need To Drop Items With Missing Modalities In Multimodal Recommendation? 📄 -
2023 Unimf: A Unified Multimodal Framework For Multimodal Sentiment Analysis In Missing Modalities And Unaligned Multimodal Sequences 📄 💻
2023 Learning Unified Hyper-Network For Multi-Modal Mr Image Synthesis And Tumor Segmentation With Missing Modalities 📄 💻
2023 Exploiting Modality-Invariant Feature For Robust Multimodal Emotion Recognition With Missing Modalities 📄 -
2022 M2R2: Missing-Modality Robust Emotion Recognition Framework With Iterative Data Augmentation 📄 -
2022 Region-Of-Interest Attentive Heteromodal Variational Encoder-Decoder For Segmentation With Missing Modalities 📄 💻
2022 Fmcnet: Feature-Level Modality Compensation For Visible-Infrared Person Re-Identification 📄 -
2021 Semi-Supervised Multimodal Image Translation For Missing Modality Imputation 📄 -
2021 Brain Tumor Segmentation For Missing Modalities By Supplementing Missing Features 📄 -
2021 Feature-Enhanced Generation And Multi-Modality Fusion Based Deep Neural Network For Brain Tumor Segmentation With Missing Mr Modalities 📄 -
2021 Glioblastoma Multiforme Prognosis: Mri Missing Modality Generation, Segmentation And Radiogenomic Survival Prediction 📄 -
2021 Missing Modality Imagination Network For Emotion Recognition With Uncertain Missing Modalities 📄 -
2020 Optimal Sparse Linear Prediction For Block-Missing Multi-Modality Data Without Imputation 📄 -
2020 Estimation Of Missing Values In Heterogeneous Traffic Data: Application Of Multimodal Deep Learning Model 📄 -
2018 Synthesizing And Reconstructing Missing Sensory Modalities In Behavioral Context Recognition 📄 -
2018 Deep Adversarial Learning For Multi-Modality Missing Data Completion 📄 -

1.1.2 Alignment

Fig 6: High-level overview of alignment methods for missing modality handling

Year Title Paper Link Code Link
2025 Robust Multimodal Learning Via Cross-Modal Proxy Tokens 📄 -
2025 Wasserstein Modality Alignment Makes Your Multimodal Transformer More Robust 📄 -
2024 Multimodal Knowledge Graph Embedding With Missing Data Integration 📄 -
2024 Penta-Encoder With Medical Transformer For Incomplete Multimodal Learning Of Brain Tumor Segmentation 📄 -
2023 Rethinking Missing Modality Learning From A Decoding Perspective 📄 -
2023 Exploiting Multi-Modal Fusion For Robust Face Representation Learning With Missing Modality 📄 -
2023 Multimodal Language Learning For Object Retrieval In Low Data Regimes In The Face Of Missing Modalities 📄 -
2023 Cross-Modal Alignment And Translation For Missing Modality Action Recognition 📄 -
2022 Mm-Align: Learning Optimal Transport-Based Alignment Dynamics For Fast And Accurate Inference On Missing Modality Sequences 📄 💻
2022 M3Care: Learning With Missing Modalities In Multimodal Healthcare Data 📄 -
2022 A General Framework For Incomplete Cross-Modal Retrieval With Missing Labels And Missing Modalities 📄 -
2021 A Non-Linear Mapping Representing Human Action Recognition Under Missing Modality Problem In Video Data 📄 -
2018 Generalized Bayesian Canonical Correlation Analysis With Missing Modalities 📄 -

1.2 Architectural

1.2.1 Model Design

Fig 7: High-level overview of model-based methods for missing modality handling

Year Title Paper Link Code Link
2025 Uml: A Unified Multimodal Learning Framework For Cataract Postoperative Visual Acuity Prediction With Uncertain Missing Modalities 📄 💻
2024 Missing Modality Robustness In Semi-Supervised Multi-Modal Semantic Segmentation 📄 💻
2024 Mmmvit: Multiscale Multimodal Vision Transformer For Brain Tumor Segmentation With Missing Modalities 📄 💻
2024 Robust Multimodal Learning With Missing Modalities Via Parameter-Efficient Adaptation 📄 -
2024 Unibev: Multi-Modal 3D Object Detection With Uniform Bev Encoders For Robustness Against Missing Sensor Modalities 📄 -
2023 Towards Good Practices For Missing Modality Robust Action Recognition 📄 -
2023 M3Ae: Multimodal Representation Learning For Brain Tumor Segmentation With Missing Modalities 📄 -
2023 Multi-Modal Learning With Missing Modality Via Shared-Specific Feature Modelling 📄 -
2022 Smu-Net: Style Matching U-Net For Brain Tumor Segmentation With Missing Modalities 📄 💻
2022 Moddrop++: A Dynamic Filter Network With Intra-Subject Co-Training For Multiple Sclerosis Lesion Segmentation With Missing Modalities 📄 -
2022 Mmformer: Multimodal Medical Transformer For Incomplete Multimodal Learning Of Brain Tumor Segmentation 📄 💻
2021 Maximum Likelihood Estimation For Multimodal Learning With Missing Modality 📄 -
2020 Training Strategies To Handle Missing Modalities For Audio-Visual Expression Recognition 📄 -
2020 Multimodal Biometrics Recognition From Facial Video With Missing Modalities Using Deep Learning 📄 -
2019 A Unified Representation Network For Segmentation With Missing Modalities 📄 -
2019 Audio Feature Generation For Missing Modality Problem In Video Action Recognition 📄 -
2019 Brain Tumor Segmentation On Mri With Missing Modalities 📄 -

1.2.2 Selective Fusion

Fig 8: High-level overview of fusion-based methods for missing modality handling

Year Title Paper Link Code Link
2023 What Makes For Robust Multi-Modal Models In The Face Of Missing Modalities? 📄 -
2023 Rethinking Uncertainly Missing And Ambiguous Visual Modality In Multi-Modal Entity Alignment 📄 💻
2022 Mitigating Inconsistencies In Multimodal Sentiment Analysis Under Uncertain Missing Modalities 📄 💻
2021 Robust Multi-Modality Person Re-Identification 📄 -

1.2.3 Co-Learning

Fig 9: High-level overview of co-learning methods for missing modality handling

Year Title Paper Link Code Link
2023 Multimodal Federated Learning With Missing Modality Via Prototype Mask And Contrast 📄 💻
2023 Enhancing Modality-Agnostic Representations Via Meta-Learning For Brain Tumor Segmentation 📄 -
2023 Missmodal: Increasing Robustness To Missing Modality In Multimodal Sentiment Analysis 📄 💻
2023 Multimodal Reconstruct And Align Net For Missing Modality Problem In Sentiment Analysis 📄 -
2022 Missing Modality Meets Meta Sampling (M3S): An Efficient Universal Approach For Multimodal Sentiment Analysis With Missing Modality 📄 -
2022 D2-Net: Dual Disentanglement Network For Brain Tumor Segmentation With Missing Modalities 📄 💻
2021 An Efficient Approach For Audio-Visual Emotion Recognition With Missing Labels And Missing Modalities 📄 -
2021 Smil: Multimodal Learning With Severely Missing Modality 📄 -
2021 Deep Multisensor Learning For Missing-Modality All-Weather Mapping 📄 -
2021 Progressive Modality Cooperation For Multi-Modality Domain Adaptation 📄 -
2021 Acn: Adversarial Co-Training Network For Brain Tumor Segmentation With Missing Modalities 📄 -
2018 Lrmm: Learning To Recommend With Missing Modalities 📄 -

1.2.4 Distillation

Fig 10: High-level overview of distillation methods for missing modality handling

Year Title Paper Link Code Link
2025 Modalitymirror: Enhancing Audio Classification In Modality Heterogeneity Federated Learning Via Multimodal Distillation 📄 -
2025 Modality-Invariant Bidirectional Temporal Representation Distillation Network For Missing Multimodal Sentiment Analysis 📄 -
2025 Test-Time Adaptation For Combating Missing Modalities In Egocentric Videos 📄 -
2024 Segment Beyond View: Handling Partially Missing Modality For Audio-Visual Semantic Segmentation 📄 -
2023 Prototype Knowledge Distillation For Medical Segmentation With Missing Modality 📄 💻
2023 Msh-Net: Modality-Shared Hallucination With Joint Adaptation Distillation For Remote Sensing Image Classification Using Missing Modalities 📄 💻
2023 Learnable Cross-Modal Knowledge Distillation For Multi-Modal Learning With Missing Modality 📄 -
2023 Multi-Head Siamese Prototype Learning Against Both Data And Label Corruption 📄 -
2021 Dealing With Missing Modalities In The Visual Question Answer-Difference Prediction Task Through Knowledge Distillation 📄 -
2020 Multimodal Learning With Incomplete Modalities By Knowledge Distillation 📄 -
2019 An Adversarial Approach To Discriminative Modality Distillation For Remote Sensing Image Classification 📄 -
2019 Cross-Modal Learning By Hallucinating Missing Modalities In Rgb-D Vision 📄 💻
2018 Modality Distillation With Multiple Stream Networks For Action Recognition 📄 -

1.2.5 Attention Mechanism

Fig 11: High-level overview of attention methods for missing modality handling

Year Title Paper Link Code Link
2024 Mman-M2: Multiple Multi-Head Attentions Network Based On Encoder With Missing Modalities 📄 -
2024 Framm: Fair Ranking With Missing Modalities For Clinical Trial Site Selection 📄 -
2023 Accommodating Missing Modalities In Time-Continuous Multimodal Emotion Recognition 📄 -
2023 Attention-Based Multimodal Fusion With Contrast For Robust Clinical Prediction In The Face Of Missing Modalities 📄 -
2023 Magnet: Modality-Agnostic Network For Brain Tumor Segmentation And Characterization With Missing Modalities 📄 -
2023 Contrastive Learning-Based Spectral Knowledge Distillation For Multi-Modality And Missing Modality Scenarios In Semantic Segmentation 📄 -
2023 Audio-Visual Sensor Fusion Framework Using Person Attributes Robust To Missing Visual Modality For Person Recognition 📄 -
2022 Tag-Assisted Multimodal Sentiment Analysis Under Uncertain Missing Modalities 📄 💻
2022 A Multimodal Sensor Fusion Framework Robust To Missing Modalities For Person Recognition 📄 -
2022 Multi-Modal Brain Tumor Segmentation Via Missing Modality Synthesis And Modality-Level Attention Fusion 📄 -
2022 Robust Multimodal Sentiment Analysis Via Tag Encoding Of Uncertain Missing Modalities 📄 -
2022 Multimodal Image Aesthetic Prediction With Missing Modality 📄 -
2022 Modality-Adaptive Feature Interaction For Brain Tumor Segmentation With Missing Modalities 📄 -
2021 Multimodal Gait Recognition Under Missing Modalities 📄 💻
2020 Multi-Modality Matters: A Performance Leap On Voxceleb 📄 💻
2020 Brain Tumor Segmentation With Missing Modalities Via Latent Multi-Source Correlation Representation 📄 -

1.2.6 Prompt Learning

Fig 12: High-level overview of prompt learning methods for missing modality handling

Year Title Paper Link Code Link
2025 Retrieval-Augmented Dynamic Prompt Tuning For Incomplete Multimodal Learning 📄 -
2025 Efficient Prompting For Continual Adaptation To Missing Modalities 📄 -
2025 Multimodal Invariant Feature Prompt Network For Brain Tumor Segmentation With Missing Modalities 📄 💻
2025 Pal: Prompting Analytic Learning With Missing Modality For Multi-Modal Class-Incremental Learning 📄 -
2025 Semantically Conditioned Prompts For Visual Recognition Under Missing Modality Scenarios 📄 💻
2024 Towards Robust Multimodal Prompting With Missing Modalities 📄 -
2023 Multimodal Prompting With Missing Modalities For Visual Recognition 📄 💻

1.3 Hybrid Approaches

Fig 13: High-level overview of hybrid methods for missing modality handling

Year Title Paper Link Code Link
2025 Cross-Modal Prototype Based Multimodal Federated Learning Under Severely Missing Modality 📄 -
2025 Graph Attention Contrastive Learning With Missing Modality For Multimodal Recommendation 📄 -
2025 Fedmobile: Enabling Knowledge Contribution-Aware Multi-Modal Federated Learning With Incomplete Modalities 📄 -
2025 Incomplete Modality Disentangled Representation For Ophthalmic Disease Grading And Diagnosis 📄 💻
2025 Ssfd-Net: Shared-Specific Feature Disentanglement Network For Multimodal Biometric Recognition With Missing Modality 📄 -
2025 Diffusion-Driven Incomplete Multimodal Learning For Air Quality Prediction 📄 -
2025 Ogp-Net: Optical Guidance Meets Pixel-Level Contrastive Distillation For Robust Multi-Modal And Missing Modality Segmentation 📄 -
2025 Multimodal Sentiment Analysis Based On Multi-Stage Graph Fusion Networks Under Random Missing Modality Conditions 📄 -
2025 Text-Guided Reconstruction Network For Sentiment Analysis With Uncertain Missing Modalities 📄 -
2025 Open-Modality Latent Modality Interaction Maximization For Audio-Visual Learning 📄 -
2025 Disentangling And Generating Modalities For Recommendation In Missing Modality Scenarios 📄 -
2025 Optimus: Predicting Multivariate Outcomes In Alzheimer's Disease Using Multi-Modal Data Amidst Missing Values 📄 -
2025 Tackling Real-World Complexity: Hierarchical Modeling And Dynamic Prompting For Multimodal Long Document Classification 📄 -
2025 Mi-Cga: Cross-Modal Graph Attention Network For Robust Emotion Recognition In The Presence Of Incomplete Modalities 📄 💻
2025 Emotional Boundaries And Intensity Aware Model For Incomplete Multimodal Sentiment Analysis 📄 -
2025 Adaptive Cross-Modal Representation Learning For Heterogeneous Data Types In Alzheimer Disease Progression Prediction With Missing Time Point And Modalities 📄 -
2024 Modality Translation-Based Multimodal Sentiment Analysis Under Uncertain Missing Modalities 📄 -
2024 Tip: Tabular-Image Pre-Training For Multimodal Classification With Incomplete Data 📄 💻
2023 Feature Fusion And Latent Feature Learning Guided Brain Tumor Segmentation And Missing Modality Recovery Network 📄 -
2022 Are Multimodal Transformers Robust To Missing Modality? 📄 -
2021 Ugaitnet: Multimodal Gait Recognition With Missing Input Modalities 📄 💻
2018 Semi-Supervised Deep Generative Modelling Of Incomplete Multi-Modality Emotional Data 📄 -
2018 Urban Land Cover Classification With Missing Data Modalities Using Deep Convolutional Neural Networks 📄 -

Multimodal Learning with Corrupted Modalities

Fig 14: Overview of our corrupted modality taxonomy with SOTA methods

Fig 15: Overview of the existing studies on multimodal learning under corrupted modalities, showing (a) yearly publication trends, (b) application areas, (c) modality distribution, and (d) publication venues

2.1 Data Processing Methods

2.1.1 Denoising Methods

Fig 16: High-level overview of Hybrid methods for corrupted modality handling

Year Title Paper Link Code Link
2024 Centaur: Robust Multimodal Fusion For Human Activity Recognition 📄 -
2023 Rhvit: A Robust Hierarchical Transformer For 3D Multimodal Brain Tumor Segmentation Using Biased Masked Image Modeling Pre-Training 📄 -
2022 Multimodal Cloud Resources Utilization Forecasting Using A Bidirectional Gated Recurrent Unit Predictor Based On A Power Efficient Stacked Denoising Autoencoders 📄 -
2018 Highly Accurate Image Reconstruction For Multimodal Noise Suppression Using Semisupervised Learning On Big Data 📄 💻

2.2 Architectural Methods

Fig 17: High-level overview of architectural methods for corrupted modality handling

2.2.1 Noise Aware Networks

Year Title Paper Link Code Link
2025 V2-Sfmlearner: Learning Monocular Depth And Ego-Motion For Multimodal Wireless Capsule Endoscopy 📄 -
2025 Micinet: Multi-Level Inter-Class Confusing Information Removal For Reliable Multimodal Classification 📄 -
2025 Smoothing The Shift: Towards Stable Test-Time Adaptation Under Complex Multimodal Noises 📄 -
2025 Admn: A Layer-Wise Adaptive Multimodal Network For Dynamic Input Noise And Compute Resources 📄 -
2024 Two-Level Test-Time Adaptation In Multimodal Learning 📄 -
2024 Leveraging Multimodal Features And Item-Level User Feedback For Bundle Construction 📄 💻
2024 Adaflow: Non-Blocking Inference With Heterogeneous Multi-Modal Mobile Sensor Data 📄 -
2023 Redundancy-Adaptive Multimodal Learning For Imperfect Data 📄 -
2023 Calico: Self-Supervised Camera-Lidar Contrastive Pre-Training For Bev Perception 📄 -
2022 Efficient Multimodal Deep-Learning-Based Covid-19 Diagnostic System For Noisy And Corrupted Images 📄 -
2020 M3Er: Multiplicative Multimodal Emotion Recognition Using Facial, Textual, And Speech Cues 📄 -
2020 Seanet: A Multi-Modal Speech Enhancement Network 📄 -
2019 Found In Translation: Learning Robust Joint Representations By Cyclic Translations Between Modalities 📄 -
2019 Learning Representations From Imperfect Time Series Data Via Tensor Rank Regularization 📄 -
2019 Multimodal Representation Learning Using Deep Multiset Canonical Correlation 📄 -

2.2.2 Confidence Estimation

Year Title Paper Link Code Link
2025 Deep Learning-Driven Behavioral Modeling In Iost For Mental Health Monitoring And Intervention 📄 -
2023 Calibrating Multimodal Learning 📄 -
2023 Fedmultimodal: A Benchmark For Multimodal Federated Learning 📄 -
2023 Multi-Level Confidence Learning For Trustworthy Multimodal Classification 📄 -
2023 Watch Or Listen: Robust Audio-Visual Speech Recognition With Visual Corruption Modeling And Reliability Scoring 📄 💻
2023 Formnetv2: Multimodal Graph Contrastive Learning For Form Document Information Extraction 📄 -
2023 Sgir: Star Graph-Based Interaction For Efficient And Robust Multimodal Representation 📄 -
2023 Aspnet: Action Segmentation With Shared-Private Representation Of Multiple Data Sources 📄 -
2022 Generalized Product-Of-Experts For Learning Multimodal Representations In Noisy Environments 📄 -
2021 Trustworthy Multimodal Regression With Mixture Of Normal-Inverse Gamma Distributions 📄 💻
2021 Multimodal Attention Fusion For Target Speaker Extraction 📄 -
2019 Anomaly Detection From System Tracing Data Using Multimodal Deep Learning 📄 -

2.2.3 Robust Fusion

Year Title Paper Link Code Link
2025 Multi-Task Corrupted Prediction For Learning Robust Audio-Visual Speech Representation 📄 -
2024 Learning Rich Multimodal Representation For Robust Land Cover Classification In Fog 📄 -
2024 Tvdiag: A Task-Oriented And View-Invariant Failure Diagnosis Framework With Multimodal Data 📄 -
2024 Indoor Scene Recognition From Images Under Visual Corruptions 📄 -
2023 Low-Rank Multimanifold Embedding Learning For Multimode Process Monitoring 📄 -
2023 Employing Multimodal Co-Learning To Evaluate The Robustness Of Sensor Fusion For Industry 5.0 Tasks 📄 -
2023 Toward A Robust Sensor Fusion Step For 3D Object Detection On Corrupted Data 📄 -
2022 Progressive Fusion For Multimodal Integration 📄 -
2021 Multibench: Multiscale Benchmarks For Multimodal Representation Learning 📄 -
2021 Vmloc: Variational Fusion For Learning-Based Multimodal Camera Localization 📄 💻
2020 Hgmf: Heterogeneous Graph-Based Fusion For Multimodal Data With Incompleteness 📄 -
2020 Adaptive Multimodal Fusion For Facial Action Units Recognition 📄 -
2019 A Deep Learning Gated Architecture For Ugv Navigation Robust To Sensor Failures 📄 -

2.3 Training Strategies

Fig 18: High-level overview of training strategies for corrupted modality handling

2.3.1 Data Augmentation

Year Title Paper Link Code Link
2025 Fusion For Visual-Infrared Person Reid In Real-World Surveillance Using Corrupted Multimodal Data 📄 -
2024 The Effect Of Data Corruption On Multimodal Long Form Responses 📄 -
2024 Benchmarking Large Multimodal Models Against Common Corruptions 📄 -
2024 Robust Visible-Infrared Person Re-Identification Based On Polymorphic Mask And Wavelet Graph Convolutional Network 📄 -
2023 Masking Important Information To Assess The Robustness Of A Multimodal Classifier For Emotion Recognition 📄 -
2023 Multimodal Data Augmentation For Visual-Infrared Person Reid With Corrupted Data 📄 -
2023 Multimodal Synthetic Dataset Balancing: A Framework For Realistic And Balanced Training Data Generation In Industrial Settings 📄 -
2023 Best Of Both Worlds: Multimodal Contrastive Learning With Tabular And Imaging Data 📄 -
2019 Videobert: A Joint Model For Video And Language Representation Learning 📄 -
2018 Deep Audio-Visual Speech Recognition 📄 💻

2.3.2 Adversarial Training

Year Title Paper Link Code Link
2023 Cleanclip: Mitigating Data Poisoning Attacks In Multimodal Contrastive Learning 📄 💻
2023 Multi-Head Siamese Prototype Learning Against Both Data And Label Corruption 📄 -
2023 Advclip: Downstream-Agnostic Adversarial Examples In Multimodal Contrastive Learning 📄 -
2023 Contrastive Self-Supervised Learning Leads To Higher Adversarial Susceptibility 📄 -
2021 Robust Multimodal Representation Learning With Evolutionary Adversarial Attention Networks 📄 -
2021 M3P: Learning Universal Representations Via Multitask Multilingual Multimodal Pre-Training 📄 -

2.4 Post Hoc Methods

Fig 19: High-level overview of post-hoc strategies for corrupted modality handling

2.4.1 Error Detection

Year Title Paper Link Code Link
2025 Corrupted But Not Broken: Rethinking The Impact Of Corrupted Data In Visual Instruction Tuning 📄 -
2025 Msc-Bench: Benchmarking And Analyzing Multi-Sensor Corruption For Driving Perception 📄 -
2024 Both Text And Images Leaked! A Systematic Analysis Of Multimodal Llm Data Contamination 📄 -
2021 Detect, Reject, Correct: Crossmodal Compensation Of Corrupted Sensors 📄 -
2021 An Immune Inspired Algorithm For Fault Tolerant Enhanced Multimodal Machine Learning 📄 -
2021 Defending Multimodal Fusion Models Against Single-Source Adversaries 📄 -

2.4.2 Recovery Mechanism

Year Title Paper Link Code Link
2024 Zeronlg: Aligning And Autoencoding Domains For Zero-Shot Multimodal And Multilingual Natural Language Generation 📄 💻
2024 Dac: 2D-3D Retrieval With Noisy Labels Via Divide-And-Conquer Alignment And Correction 📄 -
2023 Patch: A Plug-In Framework Of Non-Blocking Inference For Distributed Multimodal System 📄 -
2023 Deep Multimodal Fusion With Corrupted Spatio-Temporal Data Using Fuzzy Regularization 📄 -

License

This repository is licensed under the MIT License - see the LICENSE file for details.

The papers listed in this repository are copyrighted by their respective authors and publishers.

Citation

If you find the listing and survey useful for your work, please cite the paper:

@misc{liaqat2025multimodal,
      title={Multimodal learning under imperfect data conditions: A Survey},
      author={Muhammad Irzam Liaqat and Qaiser Abbas and Shah Nawaz and Zaigham Zaheer and Marta Moscati and Yufang Hou and Muhammad Haris Khan and Salman Khan and Elisabeth Andre and Markus Schedl},
      year={2025},
      eprint={},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}