In the real world, a limited number of labeled finely grained images per class can hardly represent the class distribution effectively. Due to the more subtle visual differences in fine-grained images than simple images with obvious objects, that is, there exist smaller interclass and larger intraclass variations. To solve these issues, we propose an end-to-end attention-based model for fine-grained few-shot image classification (AFG) with the recent episode training strategy. It is composed mainly of a feature learning module, an image reconstruction module, and a label distribution module. The feature learning module mainly devises a 3D-Attention mechanism, which considers both the spatial positions and different channel attentions of the image features, in order to learn more discriminative local features to better represent the class distribution. The image reconstruction module calculates the mappings between local features and the original images. It is constrained by a designed loss function as auxiliary supervised information, so that the learning of each local feature does not need extra annotations. The label distribution module is used to predict the label distribution of a given unlabeled sample, and we use the local features to represent the image features for classification. By conducting comprehensive experiments on Mini-ImageNet and three fine-grained data sets, we demonstrate that the proposed model achieves superior performance over the competitors.

You do not currently have access to this content.