
Weighing features of lung and heart regions for thoracic disease classification

Abstract

Background

Chest X-rays are the most commonly available and affordable radiological examination for screening thoracic diseases. According to domain knowledge of chest X-ray screening, pathological information usually lies in the lung and heart regions. However, region-level annotation is costly to acquire in practice, so model training relies mainly on image-level class labels in a weakly supervised manner, which makes computer-aided chest X-ray screening highly challenging. To address this issue, several methods have recently been proposed to identify local regions containing pathological information, which is vital for thoracic disease classification. Inspired by this, we propose a novel deep learning framework to explore discriminative information from the lung and heart regions.

Result

We design a feature extractor equipped with a multi-scale attention module to learn global attention maps from global images. To exploit disease-specific cues effectively, we locate the lung and heart regions containing pathological information with a well-trained pixel-wise segmentation model and generate binarization masks in which pixels are 1 for the lung and heart regions and 0 for other regions. By applying an element-wise logical AND operator to the learned global attention maps and the binarization masks, we obtain local attention maps that keep features only in the lung and heart regions. By zeroing the features of non-lung and non-heart regions in the attention maps, we can effectively exploit the disease-specific cues in the lung and heart regions. Compared to existing methods that fuse global and local features, we adopt feature weighting to avoid weakening the visual cues unique to the lung and heart regions. Using pixel-wise segmentation also helps our method overcome the deviation in locating local regions. Evaluated on the benchmark split of the publicly available Chest X-ray14 dataset, comprehensive experiments show that our method achieves superior performance compared to state-of-the-art methods.

Conclusion

We propose a novel deep framework for the multi-label classification of thoracic diseases in chest X-ray images. The proposed network aims to effectively exploit the pathological regions that contain the main cues for chest X-ray screening. Our proposed network has been used in clinical screening to assist radiologists. Because chest X-rays account for a significant proportion of radiological examinations, it is valuable to explore further methods for improving performance.


Background

Chest X-ray imaging is one of the most commonly available and affordable radiological examinations for screening and clinical diagnosis. In clinical practice, diagnosing chest X-ray images depends heavily on the expertise of radiologists with years of professional experience. This process is time-consuming and prone to subjective assessment errors [1]. Hence, a computer-aided diagnosis system that supports clinical practitioners is strongly desired. In recent years, many deep learning methods have been proposed to automatically diagnose thoracic diseases in chest X-ray images and have achieved remarkable progress in tasks such as disease classification [2, 3], abnormality detection [4, 5], chest X-ray segmentation [6, 7], and disease prediction [8, 9]. Among these computer-aided diagnosis tasks, our work addresses disease classification. This task is highly challenging for computer-aided screening because of the low resolution and poor specificity of chest X-ray images.

Fig. 1

Examples demonstrating pathological regions of eight thoracic diseases of chest X-rays. Predicted bounding boxes by our method are shown in blue and ground truth in red

Early works using convolutional neural networks (CNN) [10,11,12] for thoracic disease classification of chest X-ray images typically employ the global image for model training. However, this global learning strategy may suffer from the influence of normal regions. As shown in Fig. 1, each image contains two parts: pathological regions (red bounding box) and normal regions. The pathological regions provide the main cues for chest X-ray screening, but these cues may be drowned in the global image during model learning because of the influence of normal regions. For example, a nodule occupies a small area, and its visual cues are difficult to preserve in the final features because a large number of convolution layers reduce the detail characteristics. Considering this fact, it is vital to enhance the visual features of pathological regions and suppress the interference of normal regions during model training. However, although several large chest X-ray datasets [13,14,15] have been published, region-level annotations are still scarce and expensive to acquire. With image-level annotations (class labels), strategies for locating and learning pathological regions have been explored in many existing methods [16, 17].

The performance of region learning relies heavily on the accuracy of locating pathological regions with class labels. Some existing methods locate pathological regions for thoracic disease classification in chest X-rays by using, for example, region proposals [18, 19] or saliency maps [18, 19]. However, without region-level annotations, they cannot precisely identify pathological regions by predicting bounding boxes, as shown by the blue rectangle in the nodule image of Fig. 1. According to existing work [17] on the Chest X-ray14 dataset [20], the best reported performance for bounding-box prediction is 0.29 average intersection over union (IoU) and 0.37 average continuous Dice. To avoid the deviation in locating pathological regions, some works [16, 21] proposed deep fusion networks that integrate global features to compensate for the lost discriminative cues of local features. However, fusion methods must be carefully tuned to prevent the local features from being smoothed out in the global features. The local features have learned pathological information, but their differentiating role is weakened in the fusion process. Considering the above issues, our work designs a novel deep learning framework to explore discriminative information from local regions and enhance the differentiating role of local regions for thoracic disease classification.

Observing the pathological regions in Fig. 1 supports the domain knowledge that the pathological regions of thoracic diseases are typically confined to the lung and heart. Inspired by this prior knowledge, we locate the lung and heart regions by pixel-wise segmentation. Although the lung and heart regions still contain large non-pathological areas, they are smaller than the entire image and effectively cover the pathological information. In effect, our method makes a trade-off between suppressing normal regions and identifying pathological regions accurately. Based on the global attention maps, only the local features of the lung and heart regions, selected by pixel-wise segmentation, are used for class-probability prediction. Without region-level annotations, it is difficult to locate pathological regions accurately; our solution is to narrow down, as far as possible, the regions containing pathological information. The main contributions of this work are summarized as follows:

  1.

    To effectively learn the discriminative information from pathological regions and avoid the influence of normal regions, we propose a novel deep learning framework for thoracic disease classification in chest X-rays. The proposed framework combines a feature extractor equipped with a multi-scale attention module and a well-trained pixel-level segmentation model for the lung and heart regions.

  2.

    The multi-scale attention module learns the discriminative information from chest X-ray images to generate global attention maps. We apply a feature weighting strategy for the lung and heart regions containing pathological information to exploit their disease-specific cues effectively.

  3.

    Evaluated on the benchmark split of the publicly available Chest X-ray14 dataset, comprehensive experiments show that our method achieves the best performance compared to state-of-the-art methods. The multi-scale attention module can be embedded into any off-the-shelf network to help improve classification performance.

Related works

Chest X-ray datasets. Chest X-ray imaging is one of the most widely available modalities for assessing thoracic diseases, and computer-aided screening of chest X-ray images has long been explored extensively in the field of medical image analysis. Several released hospital-scale chest X-ray datasets have greatly fostered research on multi-label classification of thoracic diseases and especially benefit data-hungry deep learning models. For example, the MIMIC-CXR dataset [13] contains 377,110 chest X-rays associated with 14 labels, the CheXpert dataset [14] provides 224,316 chest X-rays associated with 14 labels, and the PadChest dataset [15] includes more than 160,000 images labeled with 19 differential diagnoses. Among the larger publicly available chest X-ray datasets, the Chest X-ray14 dataset [20] has attracted more research because of its earlier release and higher quality, and strong baselines have been established on it [16, 17]. Because of these comparable strong baselines, we adopt this dataset to demonstrate the advantage of our proposed method. To automatically extract the lung and heart regions from the global images, we use the JSRT dataset [22] to train the lung and heart segmentation model. It provides 154 nodule and 93 non-nodule chest X-ray images. Detailed segmentation annotations for these images are publicly available for training lung and heart segmentation [23]. The annotation images for the segmentation task are binary images in which pixels are 255 for the foreground and 0 for the background.

Attention mechanisms for medical image analysis. Recently, attention mechanisms applied in CNNs have significantly enhanced the performance of various tasks in the field of medical image analysis [24,25,26]. For instance, a novel Attention Gate (AG) [27] can be easily integrated into standard CNN models to leverage salient regions in medical images for various medical image analysis tasks, including fetal ultrasound classification and 3D computed tomography (CT) abdominal segmentation. Attention mechanisms can help detect subtle differences between diseases by guiding the model activations to focus on salient regions. This property is particularly suitable for analyzing chest X-ray images because of their low resolution and poor specificity [28, 29]. For example, a contrast-induced attention network [30] is proposed to exploit the highly structured property of chest X-ray images and localize diseases via contrastive learning on aligned positive and negative samples. For the multi-label classification problem of thoracic diseases, an attention-guided mask inference process is designed to locate salient regions and learn discriminative features for classification [16]. Inspired by this work, we improve the spatial-attention module in CBAM [31] to design a multi-scale attention module, which helps explore discriminative cues and advances classification performance by detecting subtle differences.

Local learning for chest X-ray classification. Because of the relative scarcity of region-level annotations, local localization and learning are gaining increasing attention in the field of chest X-ray image analysis [32, 33]. A thoracic disease is strongly characterized by its pathological region, which contains critical cues for classification. With only image-level class labels, previous works [2, 10, 11] on thoracic disease classification typically learn the discriminative information from the global image by supervised training. However, this approach is prone to being affected by normal regions. To address the problems caused by relying merely on the global image, recent approaches have shifted to learning the discriminative information from local regions containing pathological information. For example, a deep learning framework (SENet) [12] equipped with the squeeze-and-excitation block [34] reinforces the sensitivity to subtle differences between normal and pathological regions by explicitly modeling the channel interdependence. More methods for local localization rely on region proposals or saliency maps [17,18,19]. For instance, in SalNet [17], the Gumbel-softmax function [35] is used to combine the region proposal and saliency map detector so that discrete regions can be sampled from a set of proposed regions in a differentiable way. However, without region-level annotations, these methods cannot precisely identify pathological regions by selecting local regions.

To avoid losing discriminative information because of deviation in locating pathological regions, some methods fuse global image training with local region learning. Deep fusion networks that unify global and local features are becoming increasingly popular in computer vision tasks [36, 37]. For thoracic disease classification in chest X-ray images, the representative fusion methods are the segmentation-based deep fusion network (SDFN) [21] and the three-branch attention-guided network (AGCNN) [16]. In SDFN, a global classifier is used as a feature extractor to obtain discriminative features from the entire chest X-ray image, and the cropped lung regions generated by the segmentation model are learned by a local classifier. The features obtained from the global and local classifiers are fused by a feature fusion module for disease classification. Both our method and SDFN use the JSRT dataset [23] to train a pixel-wise segmentation model. However, fusion methods must be carefully tuned to prevent the local features containing pathological information from being drowned in the global features. Hence, we apply feature weighting rather than fusion to enhance the visual cues unique to the lung and heart regions, based on the learned global attention maps and the segmented masks.

Based on the above discussion of related works, our proposed method is novel in two respects: (1) a feature extractor equipped with the multi-scale attention module is used to learn the global discriminative information; (2) a feature weighting strategy is applied to enhance the features of the lung and heart regions containing pathological information. Extensive experiments on the Chest X-ray14 dataset demonstrate the effectiveness of our method.

Methods

Using only image-level class labels, our method addresses the multi-label classification of thoracic diseases by effectively learning the discriminative information from chest X-ray images. This section elaborates on our method, including the problem statement, the feature extractor, and feature weighting.

Problem statement

Thoracic disease classification is a multi-label classification problem that detects whether one or more diseases are present in each chest X-ray image. We define a 14-dimensional label vector \({\varvec{Y}}=\{y_{1}, \dots , y_{i}, \dots , y_{c}\}\) for each image, where \(c=14\) and \(y_{i} \in \{0, 1\}\). Each \(y_{i}\) indicates the presence of the corresponding disease in the image (1 for presence and 0 for absence), and an all-zero 14-dimensional vector represents the status of “No Finding” (none of the 14 listed disease categories is found). The diseases in \({\varvec{Y}}\) are in the order of Atelectasis, Cardiomegaly, Effusion, Infiltration, Mass, Nodule, Pneumonia, Pneumothorax, Consolidation, Edema, Emphysema, Fibrosis, Pleural Thickening, and Hernia. We address this classification problem by training our classification model presented in Fig. 2 with the binary cross-entropy (BCE) loss function defined in Eq. 1.

$$\begin{aligned} \begin{aligned} BCE({\varvec{Y}},{\varvec{\hat{Y}}}) = - \frac{1}{c}\sum _{i=1}^{c}[y_{i}\log (\hat{y_{i}}) + (1-y_{i})\log (1-\hat{y_{i}})] \end{aligned} , \end{aligned}$$
(1)

where c is the number of diseases (classes), \({\varvec{Y}}\) is the ground truth, and \({\varvec{\hat{Y}}}\) denotes the predicted probability.
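As a minimal, dependency-free sketch of Eq. 1, the snippet below computes the BCE loss for a single image. The function name `bce_loss` and the clipping constant `eps` are illustrative assumptions (the clipping only guards against taking the logarithm of zero); they are not taken from the authors' implementation.

```python
import math

def bce_loss(y_true, y_prob, eps=1e-7):
    """Binary cross-entropy of Eq. 1, averaged over the c classes.

    y_true: list of 0/1 ground-truth labels (length c).
    y_prob: list of predicted probabilities in [0, 1] (length c).
    eps:    hypothetical clipping constant that avoids log(0).
    """
    if len(y_true) != len(y_prob):
        raise ValueError("label and prediction vectors must have equal length")
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # keep p strictly inside (0, 1)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)

# Example: a "No Finding" image (all-zero label vector, c = 14)
# scored by a model that outputs 0.1 for every disease.
no_finding = [0] * 14
loss = bce_loss(no_finding, [0.1] * 14)
```

In a PyTorch training loop the same quantity would normally come from a built-in loss, but the explicit loop makes the correspondence with each term of Eq. 1 easy to check.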

Fig. 2

The framework of our proposed method. A feature extractor equipped with a multi-scale attention module aims to learn the discriminative information from a chest X-ray image to generate a global attention map. A well-trained pixel segmentation model locates the lung and heart regions to binarize a mask in which pixels are 1 for lung and heart regions and 0 for other regions. A local attention map focusing on the lung and heart regions is formed by introducing a logical AND operator on the mask and the global attention map. This local attention map contains features of the pathological region and suppresses the normal region

Our proposed deep framework consists of three parts: a feature extractor, a pixel-wise segmentation model, and a feature weighting module. The feature extractor embeds the global discriminative information into a global attention map by applying a multi-scale attention module, which helps the feature extractor focus on salient regions and detect subtle texture abnormalities. Simultaneously, the well-trained pixel-wise segmentation model identifies the lung and heart areas, which are then binarized into a global mask in which pixels are 1 for the lung and heart regions and 0 for other regions. We then apply element-wise multiplication (which acts as a logical AND with the binary mask) to the global attention map and the global mask to generate a local attention map. By weighting the lung and heart region features in this way, the local attention map keeps only the visual cues of the lung and heart regions, which contain the pathological information, and discards the features of all other regions by zeroing them. After the local attention map, an average pooling layer and a fully-connected layer produce the disease-specific probabilities, and the network is trained with the binary cross-entropy loss.

Feature extractor

The feature extractor consists of a multi-scale attention module and a backbone. Each chest X-ray image \({\varvec{X}}\) is resized to \(3\times 224\times 224\) and first fed into the multi-scale attention module. The module computes a spatial feature hierarchy consisting of two convolutional layers with a kernel step of 2 and three blocks that calculate the maximum and average across channels. The spatial feature hierarchy is convolved into a feature map of dimension \(1\times 224\times 224\). After a sigmoid activation function, this feature map is merged into the original chest X-ray image by element-wise multiplication, and the result is fed into the backbone. In this way, each element of the global image is spatially weighted by computing the maximum value at different scales. Because the multi-scale spatial attention module can detect subtle differences at different scales, it can improve multi-label classification performance by exploiting visual cues effectively. We use the pre-trained 121-layer DenseNet [38] as the backbone and take its last convolutional feature map as the global attention map \({\varvec{F_{g}}}\) with \(c\times h\times w\) dimensions. The global attention map learns the discriminative information from the chest X-ray image. However, the disease-specific features may be drowned in the global features and then cannot play a differentiating role in classification.
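The data flow of the attention module can be illustrated with a simplified NumPy sketch. It is not the authors' module: the learned convolutions are replaced here by fixed mean filters (`box_blur`, kernel sizes 5 and 9), and the final learned convolution that merges the per-scale descriptors is replaced by a plain average. These substitutions, and all function names, are assumptions made only to keep the example self-contained.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def box_blur(img, k):
    """Mean filter with odd kernel size k and zero padding.

    Stand-in (assumption) for a learned k x k convolution. img is (C, H, W).
    """
    pad = k // 2
    padded = np.pad(img, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[:, dy:dy + img.shape[1], dx:dx + img.shape[2]]
    return out / (k * k)

def multi_scale_spatial_attention(image, kernel_sizes=(5, 9)):
    """Weight an image of shape (C, H, W) by a (H, W) spatial attention map."""
    # Three scales: the original image plus two filtered versions.
    scales = [image] + [box_blur(image, k) for k in kernel_sizes]
    descriptors = []
    for feat in scales:
        descriptors.append(feat.max(axis=0))   # maximum across channels
        descriptors.append(feat.mean(axis=0))  # average across channels
    # Assumption: average the descriptors instead of a learned convolution.
    attention = sigmoid(np.mean(descriptors, axis=0))
    # Element-wise multiplication, broadcast over the channel axis.
    return image * attention[np.newaxis, :, :], attention

# Small toy input instead of the 3 x 224 x 224 images used in the paper.
rng = np.random.default_rng(0)
x = rng.random((3, 8, 8))
weighted, attention = multi_scale_spatial_attention(x)
```

The weighted image keeps the input shape, so it can be passed to any backbone unchanged, which is consistent with the claim that the module can be placed in front of off-the-shelf networks.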

Feature weighting

We apply U-Net [39] to train a segmentation model for the left lung, right lung, and heart on the JSRT dataset using the Dice loss, which is based on the Dice coefficient:

$$\begin{aligned} \begin{aligned} \mathrm {dice} = \frac{2|M_{gt} \bigcap M_{prob}|}{(|M_{gt}| + |M_{prob}|)} \end{aligned} , \end{aligned}$$
(2)

where \(M_{gt}\) denotes the ground truth mask, and \(M_{prob}\) is the predicted mask. The Dice loss is minimized during optimization, and the model with the smallest loss is saved. The image pre-processing for U-Net follows the same pipeline as the feature extractor, which enables automatic region segmentation on the Chest X-ray14 dataset. We first input the chest X-ray image \({\varvec{X}}\) into the well-trained segmentation model to generate three pixel-wise masks for the left lung, right lung, and heart. We then merge the three masks by pixel-wise summation into a single pixel-wise mask \({\varvec{M_{g}}}\) in which pixels are 1 for the lung and heart regions and 0 for other regions. The mask \({\varvec{M_{g}}}\) is further resized by adaptive average pooling to \(1\times h\times w\), matching the height and width of the global attention map \({\varvec{F_{g}}}\) of \(c\times h\times w\) taken from the backbone of the image classifier. We then generate a local attention map \({\varvec{F_{l}}}\) of \(c\times h\times w\) from the global attention map and the pixel-wise mask by element-wise multiplication, which acts as a logical AND between the attention map and the binary mask. The local attention map therefore contains zeros in the non-lung and non-heart regions and keeps the non-zero values of the lung and heart regions. Hence, only the values of the lung and heart regions, which contain the pathological information, are passed to the average pooling layer, which averages each channel for label prediction, while the values of all other regions are zeroed. The feature weighting for the global attention map \({\varvec{F_{g}}}\) and the pixel-wise mask \({\varvec{M_{g}}}\) is defined as:

$$\begin{aligned} \begin{aligned} {\varvec{F_{l}}} = {\varvec{F_{g}}} \otimes {\varvec{M_{g}}} \end{aligned} . \end{aligned}$$
(3)
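Eq. 3 amounts to broadcasting the resized mask over the channels of the attention map. The NumPy sketch below illustrates this under stated assumptions: `adaptive_avg_pool2d` is a simplified stand-in that only handles evenly divisible sizes, the toy shapes are arbitrary, and the first axis of `f_g` is treated as the feature-channel axis.

```python
import numpy as np

def adaptive_avg_pool2d(mask, out_h, out_w):
    """Average-pool a 2-D mask to (out_h, out_w).

    Simplification (assumption): input sizes must divide evenly.
    """
    h, w = mask.shape
    if h % out_h or w % out_w:
        raise ValueError("this sketch assumes evenly divisible sizes")
    return mask.reshape(out_h, h // out_h, out_w, w // out_w).mean(axis=(1, 3))

def weight_features(f_g, m_g):
    """Eq. 3: F_l = F_g (x) M_g, with the (h, w) mask broadcast over channels."""
    return f_g * m_g[np.newaxis, :, :]

# Toy example: 2 channels, a 2 x 2 attention map, and a 4 x 4 segmentation mask.
f_g = np.ones((2, 2, 2))
mask = np.zeros((4, 4))
mask[:, :2] = 1.0                     # left half plays the role of lung and heart
m_g = adaptive_avg_pool2d(mask, 2, 2)
f_l = weight_features(f_g, m_g)
pooled = f_l.mean(axis=(1, 2))        # average pooling per channel
```

Note that after average pooling, mask cells that straddle a region border take fractional values, so the product becomes a soft weighting there; for a strictly binary mask it reduces to the logical AND described in the text.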

With the help of the multi-scale attention module, the global attention map effectively learns the salient information from the chest X-ray image, including the discriminative information in the lung and heart. Because the pathological regions are typically located in the lung and heart, we apply the binary masks to the global attention map to generate the local attention map. The generated local attention map suppresses the information of other regions and keeps the information of the lung and heart regions. Through this logical AND operation, we locate the features of the lung and heart regions containing pathological information.
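The segmentation side of this section (Eq. 2 and the merging of the three organ masks) can be sketched as follows. Two points are assumptions rather than statements from the text: the loss is taken to be one minus the Dice coefficient of Eq. 2, and the summed mask is clipped to 1 so that pixels where organ masks overlap stay binary. The constant `eps` is a hypothetical guard against division by zero.

```python
import numpy as np

def dice_coefficient(m_gt, m_prob, eps=1e-7):
    """Eq. 2: 2 |M_gt ∩ M_prob| / (|M_gt| + |M_prob|)."""
    intersection = np.sum(m_gt * m_prob)
    return 2.0 * intersection / (np.sum(m_gt) + np.sum(m_prob) + eps)

def dice_loss(m_gt, m_prob):
    """Assumed loss form: 1 - Dice, so that minimizing it maximizes overlap."""
    return 1.0 - dice_coefficient(m_gt, m_prob)

def merge_masks(left_lung, right_lung, heart):
    """Pixel-wise summation of the three masks, clipped to {0, 1} (assumption)."""
    return np.clip(left_lung + right_lung + heart, 0, 1)

# Dice on a toy 2 x 2 example: one of the two foreground pixels is predicted.
m_gt = np.array([[1, 1], [0, 0]])
m_prob = np.array([[1, 0], [0, 0]])
d = dice_coefficient(m_gt, m_prob)

# Merging three toy organ masks; the top-left pixel is covered twice.
left = np.array([[1, 0], [0, 0]])
right = np.array([[0, 1], [0, 0]])
heart = np.array([[1, 0], [1, 0]])
m_g = merge_masks(left, right, heart)
```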

Experimental setups

To verify the effectiveness of our proposed framework, we conduct extensive experiments on the public Chest X-ray14 dataset. This section describes the experimental details.

The Chest X-ray14 dataset consists of 112,120 frontal-view X-ray images of 30,805 unique patients [20]. Each image is labeled with one or more of 14 common thoracic diseases: Atelectasis, Cardiomegaly, Effusion, Infiltration, Mass, Nodule, Pneumonia, Pneumothorax, Consolidation, Edema, Emphysema, Fibrosis, Pleural Thickening, and Hernia. In addition, the dataset contains 984 bounding boxes for 880 images, labeled by board-certified radiologists and related to 8 different diseases. In our experiments, we use the disease labels as ground truth for model training and utilize the bounding boxes for qualitative observation of pathological region localization on chest X-rays. As Table 1 shows, the benchmark split of this dataset [20] contains a train set of 86,524 images for model training, a test set of 25,596 images for model evaluation, and a box set of 984 images for model visualization. We randomly select \(10\%\) of each disease in the train set as the validation set for model validation. There is no patient overlap between the three splits. Because some images carry multiple labels, the total number of labels is greater than the number of findings.
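The no-patient-overlap constraint can be illustrated with a short sketch. This is not the authors' per-disease sampling procedure: it simply holds out a fraction of patients and assigns all of their images to the validation set. The function name, the file names, and the patient identifiers are hypothetical. The final line only checks that the reported train and test sizes add up to the dataset size.

```python
import random

def split_by_patient(image_to_patient, val_fraction=0.1, seed=0):
    """Hold out a fraction of patients so that no patient spans both sets."""
    patients = sorted(set(image_to_patient.values()))
    rng = random.Random(seed)
    rng.shuffle(patients)
    n_val = max(1, round(len(patients) * val_fraction))
    val_patients = set(patients[:n_val])
    train = [img for img, p in image_to_patient.items() if p not in val_patients]
    val = [img for img, p in image_to_patient.items() if p in val_patients]
    return train, val

# Hypothetical toy data: 20 images from 10 patients (2 images each).
image_to_patient = {f"img_{i:02d}.png": f"patient_{i // 2}" for i in range(20)}
train_images, val_images = split_by_patient(image_to_patient)

train_patients = {image_to_patient[img] for img in train_images}
val_patients = {image_to_patient[img] for img in val_images}

# Sanity check on the reported benchmark split sizes.
total_images = 86524 + 25596
```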

Table 1 The statistics of the benchmark split on the Chest X-ray14 dataset

Comparative methods. Research on the multi-label classification of thoracic diseases has established strong baselines on the benchmark split of the Chest X-ray14 dataset.

  • DCNN [20]. This work [20] first released the benchmark split of the Chest X-ray14 dataset and presented a deep convolutional neural network (DCNN) to tackle thoracic disease classification. We reproduce this method using the pre-trained ResNet-50 [40], which achieved the best performance in that work.

  • CheXNet [11]. CheXNet [11] is a 121-layer DenseNet [38] trained on the Chest X-ray14 dataset. This work demonstrated that the performance of CheXNet is statistically significantly higher than radiologist performance.

  • SENet [12]. To deal with the challenge that thoracic diseases usually happen in localized disease-specific areas, Yan et al. [12] presented a weakly-supervised deep learning framework equipped with squeeze-and-excitation blocks (SENet) to classify thoracic disease. This work was based on the CheXNet model using DenseNet as the backbone and first explored the problem of learning disease-specific areas.

  • SDFN [21]. Liu et al. [21] provided a segmentation-based deep fusion network (SDFN) to leverage the discriminative information of local regions. SDFN adopted pixel-level segmentation to detect local regions and applied a deep fusion framework to unify the global and local features. Our method also identifies the lung and heart regions by using pixel-wise segmentation. However, we argue that the deep fusion method cannot effectively tackle the problem of local features being drowned in the global features. Hence, we use a feature weighting strategy to focus on the local features.

  • AGCNN [16]. Guan et al. proposed a three-branch attention-guided convolutional neural network (AGCNN) [16] for the task of thoracic disease classification on chest X-ray images. This work located salient regions from the global attention map then cropped the corresponding regions from the chest X-ray image.

  • SalNet [17]. Hermoze et al. [17] designed a three-stage deep learning framework (SalNet) for weakly-supervised disease classification by combining region proposal and saliency detection. This work obtained the local regions from saliency maps based on region proposals and achieved the best performance on the benchmark split of the Chest X-ray14 dataset.

Implementation details and evaluation protocol. We implement CXR-IRNet with the PyTorch framework and use the pre-trained 121-layer DenseNet as the backbone of the feature extractor. We extract the last convolutional feature map of DenseNet as the global attention map. The single output is used for class-probability prediction after a sigmoid non-linearity. For the multi-scale attention module, apart from the original image as one feature, we adopt two convolutions with kernel sizes of 5 and 9 to generate the other two features; these three-scale features are used in the subsequent operations. We resize each chest X-ray image to \(256 \times 256\) and then perform center cropping to obtain an image of size \(224 \times 224\) for training. Each cropped image is normalized with the same mean and standard deviation. We use the Adam optimizer with a learning rate of 0.001 and a weight decay of 0.0001. Our network is trained for 50 epochs from scratch with a batch size of 512. For the comparative methods, we directly report the published performance of SDFN and SalNet without reproducing them. The other methods are implemented with the same experimental setup for a fair comparison. For evaluation, we report the area under the receiver operating characteristic curve (AUROC) and the ROC curve, both of which are widely used for performance assessment of multi-label classification. The ROC curve is built from two evaluation criteria, sensitivity (true positive rate) and specificity (true negative rate). For detection visualization, we evaluate in terms of the intersection over union (IoU) on the box set.
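The evaluation quantities named above can be made concrete with a small, dependency-free sketch. AUROC is computed here by its pairwise-ranking definition (the probability that a positive sample is scored above a negative one, with ties counted as one half), which is adequate for illustration but quadratic in the number of samples. The corner-format boxes `(x1, y1, x2, y2)` and all function names are assumptions for the example, not the dataset's annotation format or the authors' code.

```python
def auroc(labels, scores):
    """AUROC for one disease via pairwise comparison of positives and negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def sensitivity_specificity(labels, scores, threshold):
    """True positive rate and true negative rate at a given score threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative")
    return tp / (tp + fn), tn / (tn + fp)

def box_iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2) corners (assumed format)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

# Toy example for a single disease.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = auroc(labels, scores)
sens, spec = sensitivity_specificity(labels, scores, threshold=0.5)
iou = box_iou((0, 0, 2, 2), (1, 1, 3, 3))
```

For the 14-disease setting, the per-disease AUROC would be computed on each label column separately and then averaged.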

Results and discussions

The following research questions will be answered by analyzing experimental results:

\(\textit{RQ1}\):

Can feature weighting of the lung and heart regions help improve the performance?

\(\textit{RQ2}\):

How effective is the multi-scale attention module at learning pathological information?

Classification performance (RQ1)

Table 2 Comparison of AUROC performance on the benchmark split of Chest X-ray14 dataset

In Table 2, we report the classification performance of the proposed method and the comparative methods in terms of AUROC scores, evaluated on the test set of the benchmark split. Our method achieves the best performance (boldface font) on 4 diseases: Infiltration, Nodule, Fibrosis, and Pleural Thickening. In terms of the average AUROC, our method is superior to the comparative methods. The overall results show that our method establishes a new state of the art on the benchmark split of the Chest X-ray14 dataset. Methods that unify the global and local features (SENet, SDFN, AGCNN, SalNet) obtain better performance than methods that employ only the global image (DCNN, CheXNet). To overcome the location deviation of methods that rely on saliency maps and region proposals (AGCNN, SalNet), our method identifies the lung and heart regions containing pathological information by using pixel-wise segmentation, as SDFN does. However, we argue that unifying the global and local features cannot prevent local discriminative information from being smoothed out in the global features. Hence, we adopt a feature weighting strategy rather than fusion as in SDFN and AGCNN. SENet locates suspicious lesion regions by using a multi-map transfer layer to encode activations associated with each disease class. Such a feature weighting strategy makes it more capable of discriminating the appearance of multiple thoracic diseases on the same chest X-ray, which helps it yield good performance. Unlike SENet, our method conducts feature weighting on the global attention map by using segmentation masks. Benefiting from the segmentation, which precisely locates the lung and heart regions containing pathological information, and from the feature weighting strategy, which zeroes the features of non-lung and non-heart regions in the attention maps, our method establishes a new baseline on the Chest X-ray14 dataset.

As Table 2 shows, without feature weighting (Ours w/o \({\varvec{F_{l}}}\)), our deep framework equipped with the multi-scale attention module still achieves competitive performance, including the highest performance on 4 diseases and the second-highest average performance. Without feature weighting and without the attention module, our deep framework is equivalent to CheXNet employing the global image. The improved performance therefore demonstrates that the multi-scale attention module can effectively learn the discriminative information from the global image. Only when the discriminative information has been learned into the global attention maps can the feature weighting on those maps locate the lung and heart regions containing pathological information. The multi-scale attention module exploits the salient information from the chest X-ray at three scales and then detects visual cues unique to pathological regions. The performance of Ours w/o \({\varvec{F_{l}}}\) confirms the contribution of the multi-scale attention module. With the help of this module, our method can further improve performance by applying the feature weighting strategy to enhance the visual cues unique to the lung and heart regions. We argue that local features containing pathological information may be drowned in the global features when a fusion framework such as AGCNN or SDFN is applied. Hence, after locating the lung and heart regions containing pathological information by segmentation, we directly zero the features of non-lung and non-heart regions.

To further demonstrate the advantage of the feature weighting strategy, we examine the performance on individual diseases. The Infiltration AUROC of our method is significantly higher than that of the other methods, with an improvement ratio of \(12.87\%\) over the second-highest performance, which is obtained by SDFN. As Table 1 shows, Infiltration has the largest number of images among the diseases. However, other methods do not achieve better performance because of the poor specificity of Infiltration. At the same time, the pathological region of Infiltration occupies a relatively large area of the left lung, as shown in Fig. 1, and the features of the left lung are enhanced after feature weighting. This demonstrates the effectiveness of the feature weighting strategy. The class-probability prediction therefore relies mainly on the learned pathological information of Infiltration. In other words, the pathological information of Infiltration is not weakened or lost in the pipeline of our deep framework, while the non-pathological regions are suppressed. By weighting the features of the lung and heart regions, our method also reaches an AUROC of 0.8377 on Nodule. The pathological region of Nodule is usually small and easily drowned out in the global image. The characteristics and performance of Nodule thus also demonstrate the effectiveness of a feature weighting strategy that zeroes the features of non-pathological regions. Based on the above discussion, we infer that the feature weighting strategy helps improve classification performance and is superior to the fusion approach.

Fig. 3 ROC curves of our proposed method on the benchmark split of the Chest X-ray14 dataset

Figure 3 shows the ROC curves of our method for the 14 diseases of the benchmark split. Based on the ROC curves obtained on the chest X-ray14 dataset, we set a class threshold for each disease to classify a new chest X-ray image. Owing to its reliable performance, our model has been applied in routine clinical screening to assist radiologists (see Note 1). The screening results of our method are output automatically before the radiologists read the chest X-ray images in the picture archiving and communication system (PACS). On the PACS user interface, the radiologists can consult the pre-screened result when making a further diagnosis. For automatic screening of chest X-rays, the underlying idea is to effectively suppress non-pathological regions and learn the visual cues of pathological regions. In this work, we focus on locating the lung and heart regions containing pathological information by designing the multi-scale attention module and the feature weighting strategy. Our proposed framework avoids deviation in locating pathological regions by using pixel-wise segmentation, and it prevents local features from being drowned out in the global features by using feature weighting. In the future, we will try to improve our model by applying region-wise detection to learn visual cues unique to pathological regions.
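The text does not state how the per-disease threshold is chosen from the ROC curve. As one common possibility, the sketch below picks the threshold that maximizes Youden's J statistic (true positive rate minus false positive rate). The choice of Youden's J, the function name `youden_threshold`, and the toy data are assumptions for illustration only.

```python
from typing import List, Sequence, Tuple


def youden_threshold(labels: Sequence[int], scores: Sequence[float]) -> Tuple[float, float]:
    """Return (threshold, J) maximizing J = TPR - FPR over the observed scores.

    A sample is predicted positive when its score is >= the threshold.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both positive and negative samples")
    best_j, best_t = float("-inf"), None
    # Each distinct score is a candidate operating point on the ROC curve.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= t)
        j = tp / n_pos - fp / n_neg
        if j > best_j:
            best_j, best_t = j, t
    return best_t, best_j


def classify(scores: Sequence[float], threshold: float) -> List[int]:
    """Binarize predicted probabilities for one disease with its threshold."""
    return [int(s >= threshold) for s in scores]


# Toy validation data for a single hypothetical disease class.
labels = [1, 1, 0, 1, 1, 0, 0, 0]
scores = [0.95, 0.85, 0.75, 0.65, 0.55, 0.45, 0.35, 0.25]
threshold, j_stat = youden_threshold(labels, scores)
predictions = classify(scores, threshold)
```

In a multi-label setting, the same procedure would be repeated once per disease, yielding 14 independent thresholds. Other criteria, such as fixing a target sensitivity, would slot into the same structure.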

Learning capability (RQ2)

Fig. 4 Pathological region visualization on the box set of the Chest X-ray14 dataset. The red bounding box is the hand-labeled pathological region, and the blue bounding box is the region predicted by our model

Table 3 Comparison of IoU performance on the box set of the chest X-ray14 dataset

The capability to learn pathological information determines the final classification performance. Even if the lung and heart regions are located accurately by pixel-wise segmentation, the feature weighting strategy cannot improve performance if the feature extractor fails to learn the pathological information in those regions. We therefore need to analyze the effectiveness of the multi-scale attention module in learning pathological information. The best average AUROC in Table 2 indicates that our proposed method has reliable learning capability. Beyond this evidence, we use the box set with ground truth bounding boxes to evaluate how well the feature extractor equipped with the multi-scale attention module learns pathological information. We apply the class activation map (CAM) [41] to locate regions containing pathological information, and then use the intersection over union (IoU) between the predicted and ground truth pathological regions to evaluate the predictions.
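The localization evaluation can be sketched as follows: threshold the CAM, take the bounding box of the activated pixels, and compute its IoU with the hand-labeled box. This is a simplified sketch, not the authors' procedure: the relative threshold of 0.5, the use of a single box around all activated pixels (rather than, say, the largest connected component), the inclusive pixel-coordinate convention, and the names `cam_to_box` and `box_iou` are all assumptions.

```python
import numpy as np


def cam_to_box(cam: np.ndarray, rel_threshold: float = 0.5):
    """Bounding box (x_min, y_min, x_max, y_max) of pixels >= rel_threshold * max.

    Coordinates are inclusive pixel indices. Returns None for an empty CAM.
    """
    peak = cam.max()
    if peak <= 0:
        return None
    ys, xs = np.nonzero(cam >= rel_threshold * peak)
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())


def box_iou(a, b) -> float:
    """Intersection over union of two inclusive pixel boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]) + 1)
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]) + 1)
    inter = inter_w * inter_h
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return inter / (area_a + area_b - inter)


# Toy 6x6 CAM: a strong 3x3 activation plus one weak pixel below the threshold.
cam = np.zeros((6, 6))
cam[1:4, 2:5] = 1.0   # rows 1..3, columns 2..4
cam[0, 0] = 0.2       # ignored at rel_threshold = 0.5
predicted_box = cam_to_box(cam)
ground_truth_box = (2, 2, 5, 4)  # hypothetical hand-labeled box
iou = box_iou(predicted_box, ground_truth_box)
```

In the toy case the predicted box covers 9 pixels, the ground truth box covers 12, and they share 6, giving an IoU of 6 / 15 = 0.4. Averaging such per-image values over the box set gives an average IoU of the kind reported in Table 3.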

Some images with higher IoU are shown in Fig. 4. This qualitative visualization shows that the feature extractor can detect pathological regions to some extent. The detection performance reflects the learning capability of the feature extractor equipped with the multi-scale attention module. The detection performance for Nodule (green rectangle) is lower than for other diseases because of its small area, but the detected pathological region lies in the left lung. By filtering out the non-lung and non-heart regions, the pathological information in the left lung can be used for label prediction. The detected pathological region of Cardiomegaly almost overlaps with the heart region, and the detected pathological region of Pneumonia covers almost the whole left lung region. The pathological region of Mass, however, deviates severely from the lung and heart regions. In this case, although the feature extractor has learned the pathological information, the pathological region will be filtered out during feature weighting. Such cases affect the classification performance of our method and could be overcome by using region-level annotations, whereas our proposed method aims to improve performance with image-level class labels only. Further, we present an ablation study to demonstrate the contribution of the multi-scale attention module (MA) in the feature extractor (FE). In Table 3, the average IoU of FE greatly outperforms that of FE without MA, improving by \(23.67\%\) from 0.2437 to 0.3014. This IoU is competitive with SalNet, which reports an average IoU of 0.29. Our feature extractor adopts the same backbone as DCNN, and the AUROC of our method without feature weighting (Ours w/o \({\varvec{F_{l}}}\)) is superior to that of DCNN in Table 2. Based on these observations, we conclude that the multi-scale attention module contributes to both the learning of pathological information and the improvement of classification performance.

A chest X-ray image can typically be divided into two parts: the pathological region and the non-pathological region. Our method aims to filter out the information of the non-pathological region. However, it is difficult to locate the pathological region without region-level annotations. Current works relying on saliency maps or region proposals suffer from location deviation. To overcome this issue, we apply pixel-wise segmentation to locate the lung and heart regions containing pathological information. Although the lung and heart regions cover the pathological region in most cases, the non-pathological areas inside the lung and heart regions cannot be filtered out by feature weighting, which can only filter out non-lung and non-heart regions. Despite this limitation, our method with the feature weighting strategy achieves better performance than methods using a fusion strategy. With only image-level class labels, we design two techniques, the multi-scale attention module and the feature weighting strategy, to improve the performance of multi-label classification for screening chest X-rays. The above experimental results and discussion demonstrate the effectiveness of both.

Conclusions

In this work, we propose a novel deep framework for the multi-label classification of thoracic diseases in chest X-ray images. The proposed network aims to effectively exploit the pathological regions that contain the main cues for chest X-ray screening. We present a feature extractor equipped with a multi-scale attention module to effectively learn pathological information from chest X-ray images. At the same time, we apply pixel-level segmentation to identify the lung and heart regions containing pathological information, which overcomes location deviation. We then adopt the feature weighting strategy to filter out the non-lung and non-heart regions. In our deep framework, the class-probability layer therefore relies mainly on the information of the lung and heart regions. Evaluated on the benchmark split of the chest X-ray14 dataset, our method establishes a new state-of-the-art baseline. Our proposed network has been used in clinical screening to assist radiologists. Chest X-rays account for a significant proportion of radiological examinations, so it is valuable to explore further methods for improving performance.

Availability of data and materials

Not applicable.

Notes

  1. http://www.yibicom.com/.

References

  1. Brady A, Laoide RÓ, McCarthy P, McDermott R. Discrepancy and error in radiology: concepts, causes and consequences. Ulster Med J. 2012;81(1):3.

  2. Kumar P, Grewal M, Srivastava MM. Boosted cascaded convnets for multilabel classification of thoracic diseases in chest radiographs. In: International conference image analysis and recognition. Springer; 2018, p. 546–552.

  3. Guan Q, Huang Y. Multi-label chest x-ray image classification via category-wise residual attention learning. Pattern Recognit Lett. 2020;130:259–66.

  4. Mao Y, Xue F-F, Wang R, Zhang J, Zheng W-S, Liu H. Abnormality detection in chest x-ray images using uncertainty prediction autoencoders. In: International conference on medical image computing and computer-assisted intervention. Springer; 2020, p. 529–538.

  5. Bozorgtabar B, Mahapatra D, Vray G, Thiran J-P. Salad: Self-supervised aggregation learning for anomaly detection on x-rays. In: International conference on medical image computing and computer-assisted intervention. Springer; 2020, p. 468–478.

  6. Xue C, Deng Q, Li X, Dou Q, Heng P-A. Cascaded robust learning at imperfect labels for chest x-ray segmentation. In: International conference on medical image computing and computer-assisted intervention. Springer; 2020, p. 579–588.

  7. Abdulah H, Huber B, Lal S, Abdallah H, Soltanian-Zadeh H, Gatti DL. Lung segmentation in chest x-rays with res-cr-net (2020). arXiv preprint arXiv:2011.08655.

  8. Khan AI, Shah JL, Bhat MM. Coronet: A deep neural network for detection and diagnosis of covid-19 from chest x-ray images. Comput Methods Prog Biomed. 2020;105581.

  9. Tam LK, Wang X, Turkbey E, Lu K, Wen Y, Xu D. Weakly supervised one-stage vision and language disease detection using large scale pneumonia and pneumothorax studies. In: International conference on medical image computing and computer-assisted intervention. Springer; 2020, p. 45–55.

  10. Yao L, Poblenz E, Dagunts D, Covington B, Bernard D, Lyman K. Learning to diagnose from scratch by exploiting dependencies among labels (2017). arXiv preprint arXiv:1710.10501.

  11. Rajpurkar P, Irvin J, Zhu K, Yang B, Mehta H, Duan T, Ding D, Bagul A, Langlotz C, Shpanskaya K, et al. Chexnet: radiologist-level pneumonia detection on chest x-rays with deep learning (2017). arXiv preprint arXiv:1711.05225.

  12. Yan C, Yao J, Li R, Xu Z, Huang J. Weakly supervised deep learning for thoracic disease classification and localization on chest x-rays. In: Proceedings of the 2018 ACM international conference on bioinformatics, computational biology, and health informatics; 2018, p. 103–110.

  13. Johnson AE, Pollard TJ, Greenbaum NR, Lungren MP, Deng C-y, Peng Y, Lu Z, Mark RG, Berkowitz SJ, Horng S. Mimic-cxr-jpg, a large publicly available database of labeled chest radiographs (2019). arXiv preprint arXiv:1901.07042.

  14. Irvin J, Rajpurkar P, Ko M, Yu Y, Ciurea-Ilcus S, Chute C, Marklund H, Haghgoo B, Ball R, Shpanskaya K. Chexpert: a large chest radiograph dataset with uncertainty labels and expert comparison. Proc AAAI Conf Artif Intell. 2019;33:590–7.

  15. Bustos A, Pertusa A, Salinas J-M, de la Iglesia-Vayá M. Padchest: a large chest x-ray image dataset with multi-label annotated reports. Med Image Anal. 2020;66:101797.

  16. Guan Q, Huang Y, Zhong Z, Zheng Z, Zheng L, Yang Y. Thorax disease classification with attention guided convolutional neural network. Pattern Recognit Lett. 2020;131:38–45.

  17. Hermoza R, Maicas G, Nascimento JC, Carneiro G. Region proposals for saliency map refinement for weakly-supervised disease localisation and classification (2020). arXiv preprint arXiv:2005.10550.

  18. Yao L, Prosky J, Poblenz E, Covington B, Lyman K. Weakly supervised medical diagnosis and localization from multiple resolutions (2018). arXiv preprint arXiv:1803.07703.

  19. Tang Y, Wang X, Harrison AP, Lu L, Xiao J, Summers RM. Attention-guided curriculum learning for weakly supervised classification and localization of thoracic diseases on chest radiographs. In: International workshop on machine learning in medical imaging. Springer; 2018, p. 249–258.

  20. Wang X, Peng Y, Lu L, Lu Z, Bagheri M, Summers RM. Chestx-ray8: hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2017, p. 2097–2106.

  21. Liu H, Wang L, Nan Y, Jin F, Wang Q, Pu J. Sdfn: segmentation-based deep fusion network for thoracic disease classification in chest x-ray images. Comput Med Imaging Graph. 2019;75:66–73.

  22. Shiraishi J, Katsuragawa S, Ikezoe J, Matsumoto T, Kobayashi T, Komatsu K-I, Matsui M, Fujita H, Kodera Y, Doi K. Development of a digital image database for chest radiographs with and without a lung nodule: receiver operating characteristic analysis of radiologists’ detection of pulmonary nodules. Am J Roentgenol. 2000;174(1):71–4.

  23. Van Ginneken B, Stegmann MB, Loog M. Segmentation of anatomical structures in chest radiographs using supervised methods: a comparative study on a public database. Med Image Anal. 2006;10(1):19–40.

  24. Oktay O, Schlemper J, Folgoc LL, Lee M, Heinrich M, Misawa K, Mori K, McDonagh S, Hammerla NY, Kainz B et al. Attention u-net: learning where to look for the pancreas (2018). arXiv preprint arXiv:1804.03999.

  25. Nie D, Gao Y, Wang L, Shen D. Asdnet: Attention based semi-supervised deep networks for medical image segmentation. In: International conference on medical image computing and computer-assisted intervention. Springer; 2018, p. 370–378.

  26. Li L, Xu M, Wang X, Jiang L, Liu H. Attention based glaucoma detection: a large-scale database and cnn model. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2019, p. 10571–10580.

  27. Schlemper J, Oktay O, Schaap M, Heinrich M, Kainz B, Glocker B, Rueckert D. Attention gated networks: learning to leverage salient regions in medical images. Med Image Anal. 2019;53:197–207.

  28. Wang H, Jia H, Lu L, Xia Y. Thorax-net: an attention regularized deep neural network for classification of thoracic diseases on chest radiography. IEEE J Biomed Health Inform. 2019;24(2):475–85.

  29. Ma C, Wang H, Hoi SC. Multi-label thoracic disease image classification with cross-attention networks. In: International conference on medical image computing and computer-assisted intervention. Springer; 2019, p. 730–738.

  30. Liu J, Zhao G, Fei Y, Zhang M, Wang Y, Yu Y. Align, attend and locate: Chest x-ray diagnosis via contrast induced attention network with limited supervision. In: Proceedings of the IEEE international conference on computer vision; 2019, p. 10632–10641.

  31. Woo S, Park J, Lee J-Y, So Kweon I. Cbam: Convolutional block attention module. In: Proceedings of the European conference on computer vision (ECCV); 2018, p. 3–19.

  32. Viniavskyi O, Dobko M, Dobosevych O. Weakly-supervised segmentation for disease localization in chest x-ray images. In: International conference on artificial intelligence in medicine. Springer; 2020, p. 249–259.

  33. Wolleb J, Sandkühler R, Cattin PC. Descargan: Disease-specific anomaly detection with weak supervision. In: International conference on medical image computing and computer-assisted intervention. Springer; 2020, p. 14–24.

  34. Hu J, Shen L, Sun G. Squeeze-and-excitation networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2018, p. 7132–7141.

  35. Jang E, Gu S, Poole B. Categorical reparameterization with gumbel-softmax (2016). arXiv preprint arXiv:1611.01144.

  36. Ding M, Antani S, Jaeger S, Xue Z, Candemir S, Kohli M, Thoma G. Local-global classifier fusion for screening chest radiographs. In: Medical imaging 2017: imaging informatics for healthcare, research, and applications, 10138. International Society for Optics and Photonics; 2017, p. 101380.

  37. Cao B, Araujo A, Sim J. Unifying deep local and global features for image search. In: European conference on computer vision. Springer; 2020, p. 726–743.

  38. Huang G, Liu Z, Van Der Maaten L, Weinberger KQ. Densely connected convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2017, p. 4700–4708.

  39. Ronneberger O, Fischer P, Brox T. U-net: Convolutional networks for biomedical image segmentation. In: International conference on medical image computing and computer-assisted intervention. Springer; 2015, p. 234–241.

  40. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2016, p. 770–778.

  41. Zhou B, Khosla A, Lapedriza A, Oliva A, Torralba A. Learning deep features for discriminative localization. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2016, p. 2921–2929.

Acknowledgements

The authors would like to thank the members of the Intelligent Medical Imaging (iMED) group for their inspiring knowledge sharing, technical discussions, and clinical background input.

Funding

This work was supported in part by Guangdong Provincial Key Laboratory of Brain-inspired Intelligent Computation (Grant No. 2020B121201001).

Author information

Contributions

JSF conceived the topic of this research and did the thoracic disease classification and wrote the manuscript. YWX, YTZ, YGY, and JLL participated in its design and revised it critically for the important intellectual content. JL conceived of the survey and participated in designing it. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Jiang Liu.

Ethics declarations

Ethics approval

NA

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliation.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Fang, J., Xu, Y., Zhao, Y. et al. Weighing features of lung and heart regions for thoracic disease classification. BMC Med Imaging 21, 99 (2021). https://doi.org/10.1186/s12880-021-00627-y

  • DOI: https://doi.org/10.1186/s12880-021-00627-y

Keywords

  • Chest X-rays
  • Thoracic diseases classification
  • Pixel-wise segmentation
  • Lung and heart regions
  • Multi-scale attention