ArXiv

Confidence-Guided Diffusion Augmentation for Enhanced Bangla Compound Character Recognition

Authors
Md. Sultan Al Rayhan, Maheen Islam
Categories
cs.CV, cs.AI
arXiv
https://arxiv.org/abs/2605.10916v1
PDF
https://arxiv.org/pdf/2605.10916v1

Brief

The paper proposes a confidence-guided diffusion augmentation approach for handwritten Bangla compound character recognition. It synthesizes samples with a class-conditional diffusion model under classifier guidance, adds Squeeze-and-Excitation enhanced residual blocks to the diffusion model's U-Net backbone, and filters generated images with pre-trained classifiers. Fusing the filtered synthetic images with real training data yields consistent gains across ResNet50, DenseNet121, VGG16 and ViT, with a top accuracy of 89.2% on AIBangla. Only the abstract was available for this summary.

Why it matters

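Annotated data for Bangla compound characters is scarce, and existing recognizers struggle to generalize across writing styles. The paper introduces a confidence-guided diffusion augmentation pipeline that combines class-conditional diffusion with classifier guidance, Squeeze-and-Excitation enhanced residual blocks in the U-Net backbone, and a classifier-based confidence filter that keeps only high-quality, class-consistent synthetic samples. The authors present this as evidence that quality-aware diffusion augmentation can improve handwritten character recognition in low-resource script domains.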
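The confidence filter described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `passes_gate` and `filter_synthetic`, the 0.9 threshold, and the rule that every gating classifier must agree with the conditioning label are all assumptions, since the abstract does not specify them.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one classifier's logits."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def passes_gate(
    logits_per_classifier: Sequence[Sequence[float]],
    target_class: int,
    threshold: float = 0.9,  # hypothetical value, not taken from the paper
) -> bool:
    """Return True if every gating classifier predicts the conditioning
    class with probability at least `threshold` (assumed agreement rule)."""
    for logits in logits_per_classifier:
        probs = softmax(logits)
        predicted = max(range(len(probs)), key=probs.__getitem__)
        if predicted != target_class or probs[target_class] < threshold:
            return False
    return True


def filter_synthetic(
    samples: Sequence[Tuple[str, int, Sequence[Sequence[float]]]],
    threshold: float = 0.9,
) -> List[str]:
    """Keep the ids of synthetic samples that pass the confidence gate.

    Each sample is (sample_id, conditioning_class, logits_per_classifier).
    """
    return [
        sample_id
        for sample_id, target_class, logits in samples
        if passes_gate(logits, target_class, threshold)
    ]
```

In this sketch a generated image is discarded either when a classifier predicts a different class from the one the diffusion model was conditioned on, or when it predicts the right class with low confidence. The surviving ids would then be merged with the real training set.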

Key details

  • On the AIBangla compound character dataset, training on real data fused with filtered synthetic images improves all four classifiers evaluated (ResNet50, DenseNet121, VGG16, Vision Transformer); the best model achieves 89.2% accuracy.
  • The paper, posted 2026-05-11, reports surpassing the previously published AIBangla benchmark by a substantial margin.

Source evidence

Abstract

Recognition of handwritten Bangla compound characters remains a challenging problem due to complex character structures, large intra-class variation, and limited availability of high-quality annotated data. Existing Bangla handwritten character recognition systems often struggle to generalize across diverse writing styles, particularly for compound characters containing intricate ligatures and diacritical variations. In this work, we propose a confidence-guided diffusion augmentation framework for low-resolution Bangla compound character recognition. Our framework combines class-conditional diffusion modeling with classifier guidance to synthesize high-quality handwritten compound character samples. To further improve generation quality, we introduce Squeeze-and-Excitation enhanced residual blocks within the diffusion model's U-Net backbone. We additionally propose a confidence-based filtering mechanism where pre-trained classifiers act as quality gates to retain only highly class-consistent synthetic samples. The filtered synthetic images are fused with the original training data and used to retrain multiple classification architectures. Experiments conducted on the AIBangla compound character dataset demonstrate consistent performance improvements across ResNet50, DenseNet121, VGG16, and Vision Transformer architectures. Our best-performing model achieves 89.2% classification accuracy, surpassing the previously published AIBangla benchmark by a substantial margin. The results demonstrate that quality-aware diffusion augmentation can effectively enhance handwritten character recognition performance in low-resource script domains.
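For readers unfamiliar with Squeeze-and-Excitation (SE) residual blocks, the general mechanism can be sketched in NumPy as below. This shows the standard SE pattern only: how the authors place it inside their U-Net is not stated in the abstract. The function names, the `branch` callable standing in for the block's convolutions, and the weight shapes are illustrative assumptions.

```python
import numpy as np


def se_gate(features: np.ndarray, w_reduce: np.ndarray, w_expand: np.ndarray) -> np.ndarray:
    """Compute per-channel gates in (0, 1) for a (C, H, W) feature map.

    w_reduce has shape (C // r, C) and w_expand has shape (C, C // r),
    where r is the channel-reduction ratio (a hypothetical hyperparameter).
    """
    squeezed = features.mean(axis=(1, 2))           # squeeze: global average pool -> (C,)
    hidden = np.maximum(w_reduce @ squeezed, 0.0)   # excitation: bottleneck + ReLU
    return 1.0 / (1.0 + np.exp(-(w_expand @ hidden)))  # sigmoid gates -> (C,)


def se_residual_block(x, branch, w_reduce, w_expand):
    """Residual block whose branch output is rescaled channel-wise by SE gates.

    `branch` is a placeholder for the block's convolutional layers; it must
    map a (C, H, W) array to an array of the same shape.
    """
    features = branch(x)
    gates = se_gate(features, w_reduce, w_expand)
    # Broadcast the (C,) gates over the spatial dimensions, then add the skip path.
    return x + features * gates[:, None, None]
```

The gates let the block amplify or suppress whole channels based on global context before the skip connection is added, which is the property SE blocks are generally used for.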