A methodical exploration of imaging modalities from dataset to detection through machine learning paradigms in prominent lung disease diagnosis: a review

Abstract

Background

Lung diseases, both infectious and non-infectious, are among the leading causes of mortality worldwide. Medical research has identified pneumonia, lung cancer, and coronavirus disease 2019 (COVID-19) as prominent lung diseases that are prioritized over others. Imaging modalities, including X-rays, computed tomography (CT) scans, magnetic resonance imaging (MRI), and positron emission tomography (PET) scans, are primarily employed in medical assessments because they provide data that can be used as input datasets for computer-assisted diagnostic systems. Imaging datasets are used to develop and evaluate machine learning (ML) methods that analyze and predict prominent lung diseases.

Objective

This review analyzes ML paradigms, the utilization of imaging modalities, and recent developments for prominent lung diseases. It also explores the publicly available datasets that are being used for these diseases.

Methods

Well-known databases of peer-reviewed academic studies, namely ScienceDirect, arXiv, IEEE Xplore, MDPI, and others, were searched for relevant articles. The search used keywords and keyword combinations covering the primary topics of the review, such as pneumonia, lung cancer, COVID-19, various imaging modalities, ML, convolutional neural networks (CNNs), transfer learning, and ensemble learning.

Results

The findings indicate that X-ray datasets are preferred for detecting pneumonia, while CT scan datasets are predominantly favored for detecting lung cancer. In COVID-19 detection, X-ray datasets are prioritized over CT scan datasets. The analysis reveals that X-rays and CT scans are used more often than all other imaging techniques. CNNs yield a high degree of accuracy and practicability in identifying prominent lung diseases, and transfer learning and ensemble learning are complementary techniques that support CNN-based analysis. Accuracy is the most favored assessment metric.

Introduction

Lung diseases are medically abnormal conditions that impair the functionality of the lungs. Typically, the abnormal status of the lung is accompanied by a few specific signs and symptoms. Some intrinsic malfunction of the lungs stimulates the progression of the diseases. The World Health Organization (WHO) reported the top ten causes of death from 2000 to 2019. Notably, several of these are lung-related: chronic obstructive pulmonary disease (COPD) ranks third, lower respiratory infections rank fourth, and trachea, bronchus, and lung cancers rank sixth [1]. Among the ailments that affect the lower respiratory tract, the most common are pneumonia, bronchitis, and influenza [2]. Chronic respiratory diseases (CRDs) are incurable conditions that disrupt the delicate balance of the lungs. They mainly appear as COPD and asthma.

Surprisingly, most deaths related to COPD occur in people under 70 years old. The impact is striking, with COPD claiming about 3 million lives yearly, or about 6% of all deaths. Asthma is also widespread, affecting around 262 million children and adults [3]. COVID-19, caused by the SARS-CoV-2 virus, is a pandemic lung disease that continues to circulate. As of 2023, the WHO estimates that the virus has infected over 663 million individuals and caused around 7 million deaths [4]. A considerable number of people worldwide die as a result of lung diseases in their various prominent forms.

Traditional diagnostic procedures focus on manual symptom analysis to diagnose lung illnesses, with clinicians basing subsequent prescriptions on the disease features they evaluate [5]. However, the convergence of interdisciplinary fields has coupled technology with manual analysis in computer-aided diagnosis. As a result, the healthcare sector relies on technology such as medical imaging and ML. Medical imaging refers to the techniques and technologies used to produce visual representations of the interior of a body. In recent years, it has been widely applied in healthcare. It plays a significant role in modern medicine and is used in almost every aspect of patient care, such as diagnosis, therapy, and surgery. It helps clinicians identify and pinpoint disease progression more precisely. Numerous imaging modalities have been utilized to detect and analyze lung diseases, including chest X-rays [3], CT scans [6], MRI [7], PET [6], sputum smear microscopy images (SSMI) [8], and molecular imaging [9]. X-rays and CT scans are the most commonly used anatomic imaging modalities for detecting and diagnosing various lung diseases [6].

ML has significantly impacted medical imaging, and there has been substantial progress in applying ML-based detection approaches and algorithms. ML can help diagnose lung disorders using images from medical or radiological procedures [10]. ML, a subfield of artificial intelligence (AI), aims to make computers learn from data [11]. Consequently, ML offers an automated framework that may detect or anticipate lung illnesses at an earlier stage than manual methods [12].

Identifying prominent lung conditions such as pneumonia, lung cancer, and COVID-19 using imaging and ML encounters several impediments:

  • The intricate characteristics of lung structures and the overlapping patterns of diseases might result in misinterpretations.

  • Various imaging methods may lead to differences in the quality and consistency of data.

  • The scarcity of labeled datasets impedes the training of accurate models, particularly for rare illnesses.

  • The progressive characteristics of disorders such as COVID-19 pose difficulties for pre-existing models.

Several solutions can address these impediments:

  • Model generalization may be improved by supplementing datasets with diversified samples and assuring uniform imaging techniques.

  • Continuous model adaptation via real-time data updates is critical, particularly when disease features change.

  • Using ML approaches may improve model interpretability and decision-making. ML systems in lung disease diagnosis benefit from regular validation based on real-world clinical results [10,11,12].

This review analyzes ML approaches for diagnosing lung diseases. The main contributions of the research are:

  • It investigates and addresses prominent lung diseases such as pneumonia, lung cancer, and COVID-19.

  • It investigates and addresses the publicly accessible imaging modalities datasets for each prominent lung disease.

  • It explores and addresses existing challenges and issues in diagnosing prominent lung diseases using ML and its associated novel solutions.

  • It examines ML and its subfield approaches for identifying prominent lung diseases based on radiographic images and their significance.

  • It qualitatively assesses ML approaches, emphasizing their efficiency in identifying, classifying, and forecasting prominent lung diseases while outlining essential considerations for enhancing the diagnosis.

The particularity of the investigation is that it offers a conceptual context for the issues. Furthermore, the analysis emphasizes the techniques and primary methods used in the published findings.

The structure of the review is as follows: Section 2 explains the approach used to conduct this review and addresses the necessity of the study in light of recent research. Section 3 describes lung diseases and their classifications, following the most prevalent and well-researched trends, as well as the challenges in diagnosing them. Section 4 describes the imaging modalities, both conventional and other types. Section 5 discusses machine learning, its trends, prominent subfields, and the initial steps for applying machine learning to diagnosing pulmonary diseases. Section 6 presents the diagnosis of prominent lung diseases using ML and imaging, together with the publicly accessible datasets for each disease and an extensive analysis. Section 7 provides observations and discussions. Section 8 concludes the review.

Necessity

Multiple reviews/surveys/studies were examined, contrasted, and presented in Table 1 because of the tremendous relevance of correctly identifying prominent lung diseases using imaging modalities and ML.

Table 1 Comparative analysis of this review with recent research

As far as we know, previous research has yet to provide a combined, comprehensive examination of identifying prominent lung diseases with ML and imaging modality datasets. This review examines and highlights the methodology, procedures, and techniques of ML and imaging modalities, which reduces the time a reader needs to understand them.

Methodology

The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flowchart in Fig. 1 illustrates the approach taken. Selecting suitable existing research databases was essential for accessing scholarly research articles.

Fig. 1 PRISMA flowchart

Scopus and Web of Science were preferred because of their prominence as widely used research databases for academic, peer-reviewed scientific papers. In addition, other well-known databases of academic studies, namely ScienceDirect [23], arXiv [24], IEEE Xplore [25], and MDPI [26], were searched for articles. Only published articles relevant to the topics of this review were considered.

Identification

Databases were searched using pertinent keywords to find all relevant publications on machine learning-assisted lung disease diagnosis. The keywords and keyword combinations used in the search, covering the primary topics of the review such as lung diseases, imaging modalities, and ML, are presented in Table 2.

Table 2 Applied keywords for searching procedure

Studies were limited to articles written in English. Only studies employing ML and its prominent subfields to diagnose lung diseases with specific imaging modalities are included in this review; studies deemed irrelevant are excluded. In this round, 151 publications from the Scopus database and 92 articles and reports from Google Scholar, websites, and additional databases, including ScienceDirect, MDPI, and IEEE Xplore, were selected.

Screening

The screening process ensured the selection of only relevant research. Screening was based on titles and abstracts and did not require a full-text assessment.

We manually removed 22 duplicate titles, leaving 221 publications. Of these, 40 were excluded during screening as irrelevant. All publications that passed screening proceeded to the eligibility review.

Inclusion

For the eligibility review, we analyzed every publication that passed screening and evaluated each one before considering it for assessment. At the end of this round, manual investigation yielded 181 viable studies and resources.
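The record counts reported across the Identification, Screening, and Inclusion steps can be checked with simple arithmetic. The sketch below assumes that the 22 mentioned in the Screening step is the number of duplicate titles removed, a reading under which the reported totals add up; the variable names are illustrative only.

```python
# Record counts as reported in the Identification, Screening, and Inclusion steps.
scopus_records = 151
other_records = 92            # Google Scholar, websites, and additional databases
duplicates_removed = 22       # assumption: the 22 refers to duplicates removed
excluded_irrelevant = 40

identified = scopus_records + other_records
screened = identified - duplicates_removed
included = screened - excluded_irrelevant

print(f"identified={identified}, screened={screened}, included={included}")
# prints: identified=243, screened=221, included=181
```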

Lung diseases

Humans breathe by expanding and contracting their lungs to take in oxygen and expel carbon dioxide; the oxygen is then circulated through blood vessels deep in the lungs to generate energy for the body [27]. Lung diseases include a variety of ailments that influence lung function. These include obstructive, restrictive, and infectious diseases affecting lung structure and function. Lung diseases can be categorized as depicted in Fig. 2.

  • Airways-Related Lung Diseases: The windpipe, or trachea, splits into bronchi, which branch into smaller tubes that extend throughout the lungs. Conditions that can affect these airways include asthma, COPD, acute bronchitis, chronic bronchitis, emphysema, and cystic fibrosis.

  • Air Sacs-Related Lung Diseases: The airways branch into bronchioles, narrow passageways inside the lungs that terminate in clusters of alveoli, also called air sacs. These air sacs make up most of the lung tissue. Pneumonia, tuberculosis (TB), emphysema, pulmonary edema, COVID-19, and lung cancer are among the diseases that affect the air sacs.

  • Interstitium-Related Lung Diseases: The thin, delicate lining between the lung's alveoli is known as the interstitium. The interstitium contains tiny blood capillaries that facilitate the exchange of gases between the alveoli and the blood. Lung conditions that affect the interstitium include interstitial lung disease (ILD), pneumonia, and pulmonary edema.

  • Blood-Vessels-Related Lung Diseases: The right side of the heart receives low-oxygen blood from the veins and pumps it into the lungs through the pulmonary arteries. These blood vessels can also become diseased. Pulmonary embolism and pulmonary hypertension are two lung disorders that affect blood vessels.

  • Pleura-Related Lung Diseases: The pleura is a thin membrane that surrounds the lungs and lines the inside of the chest wall. A thin layer of fluid lets the pleura on the lung surface slide along the chest wall with each breath. Pleural effusion and pneumothorax are pleural lung disorders.

  • Chest Wall-Related Lung Diseases: The chest wall is essential to the respiratory process. Muscles connect the ribs and enable the chest to expand, and the diaphragm descends with each breath, which allows the lungs to enlarge. Neuromuscular disorders, obesity, and hypoventilation syndrome are all conditions that disrupt the chest wall [28].

Given how many kinds of lung disease fall under these categories, explaining each one in depth is impractical. Our review therefore focuses on the most debilitating and catastrophic prominent lung diseases.

Fig. 2 Types of lung diseases

Prominent lung diseases

As noted in the Introduction, the WHO has reported the top ten diseases responsible for the most fatalities worldwide. Lung illnesses, in all of their many forms, account for a disproportionately high number of deaths across the globe. According to the WHO, lung infections like pneumonia are responsible for an estimated 16% of all deaths of children under the age of 5 worldwide. Pneumonia is also a leading reason for hospitalization of children under 5 in the United States [2]. According to the WHO, about 1.8 million fatalities a year may be attributed to lung cancer, making it the leading cause of cancer mortality globally. It is responsible for more deaths than breast, prostate, and colorectal cancers combined. Most lung cancer cases are caused by tobacco use, with tobacco smoke being the primary risk factor for the disease [1]. COVID-19 is a well-known lung disease caused by a coronavirus. The WHO is closely monitoring the ongoing outbreak of COVID-19, a pandemic that has reached almost every nation. The WHO reports showed that pneumonia, lung cancer, and COVID-19 are the three conditions that account for most fatalities. As long as COVID-19 persists, more investigations are needed.

Pneumonia, lung cancer, and COVID-19 are also the lung conditions most frequently identified using medical imaging, and they are the prominent lung diseases considered in this review. Each is described in depth below:

Pneumonia

Pneumonia is a leading cause of morbidity and mortality worldwide, surpassing other prevalent illnesses such as cancer, diabetes, HIV/AIDS, malaria, and several others. It is a serious lung condition with severe medical consequences and a high fatality rate in the short and long term. It is a common respiratory illness affecting the airways and alveoli. The development of pneumonia also depends on the response of the patient's immune system to the infecting pathogen. Patients who suffer from pneumonia exhibit pulmonary abnormalities [29]. A diverse array of microbes can cause pneumonia, including bacteria, viruses, and fungi. The viruses include coronavirus, rhinovirus, influenza, parainfluenza, and metapneumovirus, and the bacteria include pneumococcus, mycoplasma, legionella, Enterobacteriaceae, Haemophilus, and mycobacteria [30].

Lung cancer

Lung cancer arises from the growth of cancerous cells within lung tissues, exhibiting uncontrolled proliferation that may spread to distant organs or lymph nodes. Lung tumors are divided into three groups from a histopathological perspective: small-cell lung cancer (SCLC), which also includes small-cell carcinoma; non-small-cell lung cancer (NSCLC); and other uncommon forms of tumors, which include sarcoma and lymphoma. Adenocarcinoma, squamous cell carcinoma, and large-cell lung cancer are the three subtypes of NSCLC [31]. Smoking history is an important consideration when identifying lung cancer because smoking plays a critical role in the disease [32].

COVID-19

COVID-19 is a contagious lung disease that spreads rapidly from person to person. Common symptoms include flu-like symptoms, cough, and shortness of breath. Less common symptoms include headache, decreased smell (hyposmia), decreased taste sensation (hypogeusia), throat infection, runny nose (rhinorrhea), muscle cramps, diarrhea, and vomiting. The main complications include acute respiratory distress syndrome (ARDS), multiple organ failure, and death [29]. A real-time reverse transcriptase polymerase chain reaction (RT-PCR) test is the standard laboratory method used to detect COVID-19. COVID-19 cases may be classified as follows.

Mild cases

A COVID-19 infection that is asymptomatic or characterized by mild symptoms such as coughing, fever, and headache.

Moderate cases

Patients experience some shortness of breath as well as pulmonary issues such as hypoxia.

Complex cases

The patient suffers from hypoxia as well as shock. This category accounts for the great majority of life-threatening cases.

COVID-19 brought daily life to a halt worldwide and claimed millions of lives. When a pandemic occurs, healthcare systems can collapse because they cannot meet the demand placed on them. The COVID-19 pandemic has also significantly affected medical microbiology laboratories. "Long COVID-19" or "post COVID-19 syndrome" refers to symptoms that continue to affect a person's health after recovery from the infection. These symptoms have been reported in many patients who have recovered from COVID-19 [33].

Developmental analysis of prominent lung diseases over the internet

Because Google is the most widely used search engine, its query data show how people search online for the most common lung diseases. Google Trends, a popular and publicly available big data analytics tool, has been used extensively in studies to examine public interest. Its tracking of internet search queries may offer helpful insight. Searches for lung diseases from 2019 to 2023 were analyzed for this study (Fig. 3) [34].

Fig. 3 Worldwide lung disease searches on Google Trends

Lung cancer, acute lower respiratory tract infections (which include pneumonia), asthma, COPD, and TB are the five primary lung illnesses highlighted by the Forum of International Respiratory Societies. According to Barbosa et al., pneumonia is the top related search term on Google Trends, and they also noted an increase in COVID-19 pneumonia cases [35]. Because lung cancer is a fatal disease affecting individuals worldwide, it is commonly searched for online, mainly for research purposes. Before 2020, the volume of COVID-19 searches was low, but it rose steeply during the pandemic. Figure 4 compares the searches for the three prominent lung diseases.

Fig. 4 Worldwide pneumonia, lung cancer, and COVID-19 searches on Google Trends

The Y-axis in Fig. 4 shows the Google Trends search-interest value for each query, which indicates the term's relative popularity [34].
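Google Trends reports search interest on a relative scale on which the peak of a series is set to 100, rather than as absolute query counts. The minimal sketch below illustrates that kind of peak scaling; the function name and the input numbers are hypothetical and are not real Google Trends data.

```python
def scale_to_peak(counts):
    """Rescale values so the largest becomes 100 (rounded to integers)."""
    peak = max(counts)
    if peak == 0:
        return [0 for _ in counts]
    return [round(100 * c / peak) for c in counts]

# Hypothetical query counts for one search term over five periods.
period_counts = [2, 4, 10, 20, 5]
print(scale_to_peak(period_counts))  # prints [10, 20, 50, 100, 25]
```

Because every series is scaled to its own peak, values of this kind describe relative popularity over time and cannot be read as search volumes.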

Challenges and issues

Many lung disorders are avoidable but may go untreated due to a lack of diagnosis. Lung illness and other diseases, such as cardiovascular disease, sometimes coexist, yet combined diseases are usually misdiagnosed due to the significant overlap in symptoms [36]. When determining the presence of lung illnesses, there are several challenges to surmount. Some of them are as follows:

  • Selection of Efficient Imaging Modality: Various imaging modalities, including X-ray, CT scan, SSMI, PET, and MRI, have been chosen based on clinical requirements [6,7,8,9]. Medical image analysis requires the selection of an efficient imaging modality for the detection [15, 19].

  • Scarcity of Useful Datasets: To handle and analyze medical images, an environment that supports access to medical data, data analysis, and processing is required [17]. Various imaging modalities datasets are available for public access [6,7,8,9,10,11,12,13,14, 23,24,25,26].

  • Effectiveness of Models Derived from ML: The efficacy of models is crucial for identifying lung illnesses. If ML models are used, real-time diagnosis is essential. Thus, research on model training efficiency is necessary [30,31,32,33].

  • To Address Multiple Pulmonary Disorders Simultaneously: It is expected that the trained ML model would be able to identify multiple lung diseases appropriately, such as COVID-19, pneumonia, and others [19,20,21,22].

  • Medical Experts' Opinions: Although ML algorithms may be effective in classifying lung illnesses, medical expert evaluations and validations are required to confirm that the identification is correct [28,29,30].

Imaging modalities

Diagnostic imaging is widely acknowledged to play a significant role in clinical evaluation. Interpreting diagnostic images requires practitioners with extensive expertise. Different readers may assess the same images differently, which leads to varying findings, and the process is tedious, costly, and error-prone, so healthcare practitioners may benefit from computer-assisted solutions. Manual diagnosis of lung disorders from radiographic scans also often takes a substantial amount of time. Prompt and precise identification of lung disorders is crucial to improving the prognosis and thus the patient's likelihood of survival, and radiographic findings can assist with this [37]. After a radiological image of a patient is produced, it is processed in several phases, including image annotation and segmentation. Once the images are stored in a database, radiologists annotate them with pertinent information that helps the reader interpret them. Image segmentation is one of the most critical aspects of image processing: an image is partitioned so that regions of interest (ROIs) are separated from the rest [38].
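As a rough illustration of segmenting an image around a region of interest, the sketch below applies simple intensity thresholding to a toy grayscale grid and reports the bounding box of the bright region. Thresholding is only one elementary segmentation technique, chosen here for brevity; the image values, the threshold, and the function names are assumptions and do not come from the reviewed studies.

```python
def threshold_mask(image, threshold):
    """Return a binary mask: 1 where the pixel intensity exceeds the threshold."""
    return [[1 if px > threshold else 0 for px in row] for row in image]

def bounding_box(mask):
    """Return (top, left, bottom, right) of the masked region, or None if empty."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return rows[0], cols[0], rows[-1], cols[-1]

# Toy 5x5 grayscale "scan" with a bright 2x2 patch standing in for a lesion.
image = [
    [10, 12, 11, 10, 9],
    [11, 200, 210, 12, 10],
    [10, 205, 215, 11, 12],
    [9, 11, 10, 12, 11],
    [10, 10, 9, 11, 10],
]
mask = threshold_mask(image, threshold=100)  # hypothetical threshold
roi = bounding_box(mask)
print(roi)  # prints (1, 1, 2, 2)
```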

For ethical reasons, a patient's clinical and radiological images must be processed in a way that maintains the subject's privacy. After ethical consent is received, the patient data must be obtained, appropriately de-identified, and stored securely. Pseudonymization is the technique of choice for de-identification because it replaces information that could be used to infer a subject's identity with artificial identifiers. Once images are pseudonymized, this information can no longer be used to determine who the patient is [39].
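One common way to realize pseudonymization is to replace a patient identifier with a keyed hash and to drop direct identifiers from the record. The sketch below assumes this approach; the field names, the key, and the `P-` prefix are hypothetical, and a real deployment would follow the institution's de-identification policy and the relevant imaging metadata standard.

```python
import hashlib
import hmac

def pseudonymize(patient_id: str, secret_key: bytes) -> str:
    """Derive a stable pseudonym from a patient ID using a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256)
    return "P-" + digest.hexdigest()[:16]

def deidentify(record: dict, secret_key: bytes) -> dict:
    """Drop direct identifiers and replace the patient ID with its pseudonym."""
    direct_identifiers = {"patient_name", "birth_date", "address"}  # assumed field names
    cleaned = {k: v for k, v in record.items() if k not in direct_identifiers}
    cleaned["patient_id"] = pseudonymize(record["patient_id"], secret_key)
    return cleaned

key = b"example-key-store-separately-from-the-data"  # hypothetical secret
record = {
    "patient_id": "MRN-00123",
    "patient_name": "Jane Doe",
    "birth_date": "1970-01-01",
    "modality": "CT",
}
safe_record = deidentify(record, key)
print(safe_record)
```

The same patient always maps to the same pseudonym, so scans can still be linked across visits, while reversing the mapping requires the secret key.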

The shortage of labeled imaging data is commonly cited as a challenge for machine learning as medical imaging datasets expand. Therefore, strategies that allow learning with less supervision, or with different kinds of supervision, are necessary [40]. An overview of each imaging modality is given below.

Conventional imaging modalities

X-ray

The chest X-ray (CXR) is the diagnostic imaging method used most often for lung ailments. The availability, mobility, and cost-effectiveness of chest X-rays make them well suited to the initial evaluation of individuals exhibiting lung problems [3]. Historically, medical X-ray images were captured on photographic film, which had to be developed before it could be examined. Digital X-rays remove this step. The most popular medical X-ray examination is the digital chest X-ray for diagnosing lung disorders [41]. The vast majority of the analyzed studies used chest X-rays in their investigations. For instance, X-ray datasets were used for the diagnosis of pneumonia [42,43,44,45,46,47,48,49,50,51,52,53,54,55], lung cancer [44, 46, 47, 52, 56], and COVID-19 [47, 48, 53,54,55, 57,58,59,60]. Figure 5 shows several chest X-ray examples of diverse lung diseases collected from publicly accessible datasets.

Fig. 5 Instances of chest X-rays for prominent lung diseases

CT scan

In patients with severe lung disorders, a chest CT is frequently recommended. CT imaging is more precise than CXR imaging and is employed when radiography reveals anything unclear [3]. By rotating the X-ray tube around the chest, CT combines several X-ray projections recorded from various angles to generate cross-sectional images of regions within the chest [6]. Chest CT scans were used in many of the studies reviewed. For instance, the diagnosis of pneumonia [63], lung cancer [64,65,66,67,68,69,70,71,72,73], and COVID-19 [57, 59, 60, 74,75,76,77,78] relied on datasets acquired from CT scans. Figure 6 shows several chest CT scan examples of diverse lung diseases collected from distinct publicly accessible datasets.

Fig. 6 Instances of CT scans for prominent lung diseases

Positron emission tomography

Nuclear imaging technology, such as PET, enables the monitoring of metabolic activity. The patient is injected with radiolabeled tracers, and the scanner then tracks where the tracers accumulate.

The most commonly used PET tracer is 18F-fluorodeoxyglucose (FDG). The lack of recognizable anatomical features is a defining characteristic of the PET imaging technique [6]. Lung disorders and nodules may be effectively evaluated with PET, which has an outstanding capacity for detecting metastases [81].

Figure 7 displays a chest CT scan of a lung nodule alongside a PET image, which provides a clearer view. The image was obtained from the Openi website, which provides access to publicly available images.

Fig. 7 (A) A CT scan reveals a nodule in the anterior portion of the right lung's upper pole. (B) On 18F-FDG PET/CT, the lung nodule exhibited enhanced focused uptake, indicating a malignancy [82]

Magnetic resonance imaging

Compared with other imaging modalities such as CT and PET, MRI has had little clinical use for patients with lung illnesses. MRI generates images of a selected region and presents them as thin slices that together cover the entire volume of the area. It works because atomic nuclei absorb radio-frequency energy when a powerful magnetic field is present. MRI employs a magnetic field and radio waves to obtain numerous images of the lung region from various angles. Combining these images can produce crisp and accurate depictions of the area [81]. Lung MRI is an excellent technique for sequential follow-ups [7]. MRI procedures such as three-dimensional gradient sequences and acceleration techniques have improved MRI's ability to detect small lesions [83]. Research has also shown that MRI might be a better way to screen for lung cancer than low-dose CT [84].

Figure 8 displays the chest radiograph of a lung nodule compared to an MRI image. The image was obtained from the Openi website, which provides access to publicly available images.

Fig. 8 Chest X-rays and MRI. (A) A lesion in the right hilus pulmonis with a clear edge is seen on a chest X-ray. (B) An MRI shows a nodule in the right hilum. (C) A chest X-ray shows no mass but a tangled network of blood vessels. (D) A normal chest X-ray [85]

Sputum smear microscopy images

Sputum is a viscous fluid produced in the lungs and air passages, and it is a crucial factor in the progression of certain lung disorders. Sputum smear microscopy has generally been considered the most effective approach for diagnosing lung diseases like TB. Specimens of sputum expectorated by patients with symptoms are smeared onto plain glass microscope slides and chemically stained [8]. They are then analyzed by laboratory procedures that identify acid-fast bacteria (AFB), such as Mycobacterium tuberculosis [86]. The images from a sputum smear test are usually obtained via fluorescence microscopy or conventional microscopy, using a digital microscope and a digital camera. The captured images have a specific size and resolution depending on the magnification. The "pixel pitch," which refers to the physical size of each image pixel, is measured in micrometers [87]. Figure 9 displays SSMI examples obtained from an open-access dataset [88].

Fig. 9 Instances of SSMI for tuberculosis
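The relation between magnification and the physical size covered by each pixel can be made concrete with a short calculation. The sketch below assumes the usual approximation that the specimen-plane pixel size equals the camera sensor's pixel pitch divided by the total optical magnification; the sensor pitch, magnification, and image dimensions are hypothetical values, not figures from the cited dataset.

```python
def specimen_pixel_size_um(sensor_pixel_pitch_um: float, total_magnification: float) -> float:
    """Physical size on the slide covered by one image pixel, in micrometers."""
    if total_magnification <= 0:
        raise ValueError("magnification must be positive")
    return sensor_pixel_pitch_um / total_magnification

def field_of_view_um(width_px: int, height_px: int, pixel_size_um: float):
    """Width and height of the imaged slide area, in micrometers."""
    return width_px * pixel_size_um, height_px * pixel_size_um

# Hypothetical setup: 5.0 um sensor pixels viewed at 40x total magnification.
pixel_size = specimen_pixel_size_um(5.0, 40)
fov_width, fov_height = field_of_view_um(1920, 1080, pixel_size)
print(pixel_size, fov_width, fov_height)
```

Higher magnification gives a smaller specimen-plane pixel size and therefore a smaller field of view for the same image dimensions.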

Molecular imaging

Molecular imaging methods that have not been used previously are also being studied to learn more about lung diseases. Molecular imaging is a specific type of imaging technique that combines molecular biology and medical imaging. Recent research has examined several molecular imaging methods that have the potential to differentiate between the cellular and molecular components of respiratory illnesses. Alternative imaging techniques like single photon emission computed tomography (SPECT) can offer pertinent data at the molecular level because of their remarkable sensitivity and resolution. For diagnostic accuracy, disease staging, or post-treatment monitoring, molecular imaging may be a valuable addition to traditional imaging methods [9].

At-bedside imaging modalities

Evolving methods can assess, monitor, or measure lung disorders at the bedside. Bedside methods, including lung ultrasonography (LUS) and electrical impedance tomography (EIT), are gaining prominence alongside conventional imaging modalities. Because they do not require ionizing radiation and are simple to use, these approaches are being intensively explored as an addition to traditional procedures and, for specific lung problems, as a substitute for them [89].

The preceding subsections give an overview of the numerous imaging modalities. Each has characteristics that set it apart from the others. Every imaging modality produces its own specific kind of image, enabling radiologists to identify a variety of lung illnesses more accurately.

Machine learning

ML is a crucial component that can add resilience to medical decision-support systems. To better understand ML-based lung disease diagnosis, we provide a new analytical viewpoint on the different machine learning strategies. These strategies include supervised, unsupervised, and semi-supervised learning. Each has benefits and drawbacks, and the choice of strategy hinges on the nature of the need [90]. Their virtues and limitations are listed in Table 3.

Table 3 Virtues and limitations of the various ML strategies

In supervised learning, the model is trained on input-output pairs, that is, inputs together with their labels [91]. In unsupervised learning, the model has only the input data, without labels, and receives no feedback on its results. This strategy extracts features and clusters the input data into groups, and it can also reveal unusual patterns in the input data [93]. Semi-supervised learning, in contrast, works with both labeled and unlabeled data [11]. Because it can exploit unlabeled data, it can operate on massive datasets even when labeled data are limited.

The general assumption is that models trained on labeled data will perform better than those trained on unlabeled data. This assumption, however, holds only some of the time, since researchers have demonstrated that unlabeled data may also yield remarkable performance [94].

Machine learning developmental analysis on the internet

Over the past decade, people worldwide have increasingly searched the internet for the term "machine learning." The Y-axis in Fig. 10 displays the Google Trends search interest for this term from 2012 to 2023, which illustrates the term's level of popularity [95]. Such statistics motivate the study of machine learning in the context of lung disease detection. The popularity of ML is growing rapidly.

Fig. 10

Worldwide internet search interest in "machine learning"

Introductory steps for employing machine learning to diagnose lung diseases

ML has the potential to diagnose and prognosticate lung illnesses. To make a diagnosis using imaging modalities, ML executes a series of actions, including acquiring an image dataset, preprocessing the image data contained within the dataset, performing feature extraction and selection, training an ML model using specific ML algorithms, and evaluating performance metrics and classification [96]. The lung disease diagnostic process using ML is shown in Fig. 11.

Fig. 11

Lung disease diagnostic pathway with ML

The above-described introductory steps for employing ML to diagnose lung diseases act as the training phase of the ML model, which develops an ML diagnostic model. However, this ML diagnostic model must be validated using new or test data that the model has never seen before. Machine learning advances the lung disease diagnostic pathway. The fundamental framework of an ML-based diagnostic model is shown in Fig. 12, in which the model is trained using a training dataset and evaluated using new test data.

Fig. 12

ML diagnostic model from the viewpoint of the training and testing phases

Many imaging modalities make it possible to record data about a patient's lungs from various angles and viewpoints, which may then be annotated and stored for later use [97].

Collecting these images produces an image dataset that can be preprocessed and employed as an input for the ML model to operate on [98]. The necessary features must be extracted and selected, manually or automatically, from the preprocessed image dataset to train the model using a particular machine learning algorithm [99]. The trained model can then be used for prediction or classification [100]. This is the conventional approach to ML for diagnosing lung diseases using imaging modalities.

Publicly accessible datasets

In the modern world, data is extremely valuable. One study of digital health records found that around 25 million images were subject to cyberattacks [101]. For instance, the European Union (EU) has enacted special regulations for data protection. The General Data Protection Regulation (GDPR) is legislation that updates and unifies data privacy rules across the EU and its associated businesses. Due to GDPR in the EU, hospitals and other healthcare organizations cannot share data [102]. Data sharing for research and other specific purposes is limited, encouraging private or commercial data use.

In contrast to private or commercially supplied datasets, which are not openly accessible to the research community, publicly available datasets are preferable since they are accessible to all researchers and can be used for their studies. The imaging modality appropriate for the particular lung disease must be ascertained first. Certain lung disorders are diagnosed using imaging techniques such as X-rays, CT scans, SSMI, PET scans, MRIs, and others as specified earlier [103]. A dataset must be compiled based on specific images, which may be either public or private. A researcher may collect or create private datasets depending on the research demands. However, a researcher or organization may also provide publicly available datasets if they wish to make their findings public. Researchers developing ML models must access such a vast dataset of these modalities [104].

Preprocessing

Preprocessing the dataset is essential after choosing a particular image dataset. An image dataset's description, visualization, and other attributes can all be used for analysis. Relevant image data must be collected for the ML model of lung illness, because the model heavily depends on image quality for training. Dealing with real-world imaging data requires a more in-depth examination of the data collection process. The obtained image dataset may contain problematic images, including incomplete annotations, anomalies, and nonsensical image data. It is challenging to clean and preprocess image data received from databases correctly. Hence, adapting or implementing appropriate preprocessing techniques is necessary [105].

Image enhancement and optimization may be done using ML-based image processing [106]. AI-based approaches to image processing can reduce the time needed for the process while improving image processing techniques. When preprocessing an image, it can be converted to grayscale and cleaned up with Gaussian blur, median filters, morphological smoothing, and numerous other methods [107]. Contrast Limited Adaptive Histogram Equalization (CLAHE) is a well-known technique that can be employed to improve the image's contrast [108]. Image processing techniques like lung segmentation, which necessitates the exclusion of bone, might be used to locate the region of interest, after which lung disease detection could be carried out in the region of interest [109].
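As a minimal illustration of the grayscale conversion and median filtering mentioned above, the following Python sketch operates on a tiny image stored as nested lists. The function names `to_grayscale` and `median_filter`, the ITU-R BT.601 luma weights, and the 3 × 3 window are illustrative assumptions rather than the procedure of any cited study; CLAHE and lung segmentation are not shown.

```python
from statistics import median_low


def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) tuples to gray levels with BT.601 luma weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb_image]


def median_filter(image, size=3):
    """Replace each pixel by the median of its size x size neighborhood.

    Border pixels use only the neighbors that exist (assumed border policy).
    median_low is used so the result is always an existing gray level.
    """
    height, width = len(image), len(image[0])
    half = size // 2
    filtered = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [image[j][i]
                      for j in range(max(0, y - half), min(height, y + half + 1))
                      for i in range(max(0, x - half), min(width, x + half + 1))]
            row.append(median_low(window))
        filtered.append(row)
    return filtered


# Hypothetical 3 x 3 patch with one bright "salt" noise pixel in the center.
noisy_patch = [[10, 10, 10],
               [10, 255, 10],
               [10, 10, 10]]
clean_patch = median_filter(noisy_patch)
```

In this toy patch the isolated bright pixel is replaced by the surrounding gray level, which is the behavior that makes median filters useful against impulse noise.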

Feature extraction and relevant feature selection

Certain extracted features may be valuable, while others will not, so the relevant features must be identified. ML algorithms or classifiers then process the features selected for analysis. The feature engineering method consists of two segments: the first aims to extract features from an existing image dataset, and the second involves picking features among the extracted ones. Methods like Gabor, Zernike, Haralick, and Tamura were used to extract features [110]. Features may be selected using techniques like the gray level co-occurrence matrices (GLCM), local binary pattern (LBP), and CNN. Bio-inspired algorithms such as the improvised crow search algorithm (ICSA), the improvised grey wolf algorithm (IGWA), and the improvised cuttlefish algorithm (ICFA) are examples of feature selection algorithms that can be used to narrow down a large number of acquired features to only the most desirable ones. Genetic algorithms can also choose diagnostic imaging features [111].
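To make the GLCM concrete, the sketch below counts gray-level co-occurrences for one pixel offset and derives a single Haralick-style texture statistic (contrast). The function names, the offset of one pixel to the right, and the three-level toy image are assumptions chosen for illustration; real pipelines typically use several offsets and more statistics.

```python
def glcm(image, levels, dx=1, dy=0):
    """Count how often gray level a is followed by gray level b at offset (dx, dy)."""
    matrix = [[0] * levels for _ in range(levels)]
    height, width = len(image), len(image[0])
    for y in range(height):
        for x in range(width):
            ny, nx = y + dy, x + dx
            if 0 <= ny < height and 0 <= nx < width:
                matrix[image[y][x]][image[ny][nx]] += 1
    return matrix


def glcm_contrast(matrix):
    """Contrast: sum over (a, b) of p(a, b) * (a - b)^2 on the normalized matrix.

    Assumes the matrix holds at least one counted pair.
    """
    total = sum(sum(row) for row in matrix)
    return sum(count / total * (a - b) ** 2
               for a, row in enumerate(matrix)
               for b, count in enumerate(row))


# Hypothetical image quantized to three gray levels (0, 1, 2).
toy_image = [[0, 0, 1],
             [0, 0, 1],
             [2, 2, 2]]
toy_glcm = glcm(toy_image, levels=3)
toy_contrast = glcm_contrast(toy_glcm)
```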

Training of the machine learning model

ML model training is the primary process of the ML pathway, providing an effective model for assessment, verification, and distribution. Once the ML model has been trained with the relevant available data, it can be used to analyze newly collected data and provide predictions [10].

Following the partitioning of the image database, one segment is set aside for the training phase of the ML model and another for the testing phase. The test data consists of unseen data that is later used to assess the effectiveness of the ML model. Understanding the significance of training in ML makes it possible to collect the appropriate volume and quality of training data for the model. Once it is clear how the training data affects model prediction and why it is essential, the optimal algorithm can be chosen based on the availability and suitability of the training dataset [112].
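The partitioning step can be sketched as a seeded shuffle followed by a cut. The helper name `train_test_split`, the 80/20 ratio, and the seed below are assumptions for illustration, not values prescribed by the reviewed studies.

```python
import random


def train_test_split(samples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the samples and return (training list, test list)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical image identifiers standing in for an image database.
image_ids = list(range(10))
train_ids, test_ids = train_test_split(image_ids, test_fraction=0.2)
```

Keeping the test images strictly separate from the training images is what allows the later evaluation to reflect performance on unseen data.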

Machine learning and its algorithms

The ML algorithm enables the ML model to perceive the input data in a particular manner. The training process is the sole method that interoperates with ML algorithms so that ML models can extract meaningful information from learning data. It might take time to find an algorithm that works well and is set up to meet the needs of the intended use in a particular domain. Distinct learning algorithms have different objectives, and their results may vary based on data features. So, it's essential to know about machine learning algorithms and how they work in the real world, such as in medicine and other fields [113].

There are many different kinds of ML algorithms. Some are based on regression, decision trees, the Bayesian method, the kernel method, the clustering method, the ensemble method, and artificial neural networks (ANNs) [105].

  • Regression is a common technique for reducing model-based uncertainty by iteratively adjusting the model in response to the errors it produces. Some types are linear, logistic, stepwise, and multivariate adaptive regression splines (MARS).

  • To predict the target variable based on the input variables, an algorithm in the form of a decision tree is utilized. Some examples are random forest, classification and regression tree (CART).

  • Algorithms based on the Bayesian technique use Bayes' theorem and make it easier to use subjective probability in model development. The significant algorithms used for classification and regression problems are Naive Bayes and Bayesian Belief Network.

  • Pattern analysis is the basis of the kernel approach, which incorporates a wide range of mapping methods. Support vector machines (SVM) and linear discriminant analysis (LDA) are essential kernel approaches in ML modeling.

  • By grouping data points according to their similarities, clustering is the most widely used unsupervised learning approach. K-Means, partitioning-based, hierarchical, and density-based clustering are just a few examples of clustering techniques that may be classified in various ways.

  • Ensemble methods are strategies that work on several models and unite them to obtain more accurate outcomes. Compared to relying on a single model, the results of ensemble techniques are often more reliable. Bagging, boosting, AdaBoost, gradient boosting machine, and random forest are prominent ensemble techniques.

  • ANNs are computer simulations based on biological principles and are used for various purposes, including clustering and classification. Well-known ANN approaches include the perceptron, the Hopfield network, and backpropagation.

Performance metrics

Building an ML model is not sufficient; the built model must also be evaluated to ensure its reliability and forecasting ability. Performance metrics are a set of statistics used to assess an ML model's overall efficacy and efficiency. These metrics can be quantitative or qualitative, and they can evaluate many aspects of performance. Typically, they track improvement and progression over time [114]. Most researchers make use of a range of vital metrics in their studies, some of which are as follows:

  • Accuracy: The accuracy of an ML model is measured as the proportion of correctly classified samples to the total samples. It is the most common metric used to measure the performance of an ML model. It can be expressed as (Eq. 1):

    $$\mathrm{Accuracy}=\frac{\mathrm{Correctly\,classified\,samples}}{\mathrm{Total\,samples}}$$
    (1)

The correctly classified samples can be expressed as follows:

$$\mathrm{Correctly\,classified\,samples}=\mathrm{True\,Positive}\,(TP)+\mathrm{True\,Negative}\,(TN)$$

The total samples can be expressed as follows:

$$\mathrm{Total\,samples}=TP+\mathrm{False\,Positive}\,(FP)+TN+\mathrm{False\,Negative}\,(FN)$$

  • Sensitivity: This metric measures how many relevant samples an ML model can identify by calculating the proportion of true positives to all actual positives; it is presented through Eq. 2. It is often called the "true positive rate" or "recall."

  • Precision: This metric measures how accurate a model's predictions are by calculating the ratio of true positives to all positive predictions made by the model. It is often referred to as "positive predictive value" and is presented through Eq. 3.

  • Specificity: It measures how well a model can correctly identify negative samples. It is the ratio of correctly identified true negatives to all actual negatives and is presented through Eq. 4. An ML model with high specificity has a low false-positive rate, meaning it will rarely incorrectly classify negative examples as positive.

  • F1 Score: This amalgamation of precision and recall scores provides an overall score for model evaluation. The F1 score is presented in Eq. 5.

  • AUC: AUC stands for Area Under the Receiver Operating Characteristic (ROC) Curve. The ROC curve plots the true positive rate against the false positive rate at varied thresholds, and the area under it is used to evaluate a model's ability to separate classes. The AUC represents the degree of discrimination between classes [115]. Some of the performance metrics are presented in Table 4.
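The metrics listed above can be computed directly from the four confusion-matrix counts. The sketch below uses the standard textbook definitions; the function names and the example counts are hypothetical, and every denominator is assumed to be non-zero. The AUC is computed through its rank interpretation, i.e., the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity, precision, specificity, and F1 score."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # recall, true positive rate
    precision = tp / (tp + fp)     # positive predictive value
    specificity = tn / (tn + fp)   # true negative rate
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "precision": precision, "specificity": specificity, "f1": f1}


def auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical counts: 40 TP, 10 FP, 45 TN, 5 FN.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
example_auc = auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
```

With these hypothetical counts the accuracy is 0.85 while the sensitivity is about 0.89 and the specificity about 0.82, which illustrates why accuracy alone can hide differences between error types.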

Classification of lung diseases

Classification identifies, comprehends, and groups objects and concepts into predetermined categories. The act of classifying something is a form of pattern recognition. Specifically, classification predicts a class label for a given sample.

Table 4 Performance metrics

It approximates a mapping function from input variables to an output variable, which is a target, label, or class. "Binary classification" describes classification tasks with just two possible class labels. Classification problems with more than two categories are called "multiclass classification." Some of the algorithms developed for binary classification can also address multiclass problems [105].

ML sub-fields

Numerous prominent sub-fields of ML, such as deep learning (DL), CNNs, ensemble techniques, and transfer learning, may be utilized to diagnose lung diseases. Many more sub-fields of ML can also be employed, but the focus here is on elaborating on a few particularly notable ones.

Deep learning

A popular and rapidly developing area of ML is DL. Learning from massive datasets is the focus of DL, a subfield of ML that employs neural networks. DL enables the creation of diagnostic models by performing all the processing steps typically associated with the construction of standard ML models, such as feature extraction and selection, in an automated manner. The word "deep" signifies that many hidden layers comprise the neural network. Each processing layer of a deep neural network contains a particular set of neurons. The first layer in a network is known as the input layer, the final layer is known as the output layer, and the layers in between are known as the hidden layers [116]. DL has been influential in diagnostic imaging for feature engineering and image classification [117] and can resolve data-related problems with minimal supervision. This has prompted researchers to study DL approaches in greater depth. DL algorithms do exceptionally well compared to conventional differential diagnosis screening processes that rely solely on radiologists [118].

Consequently, DL offers novel models for classification tasks and medical image diagnostics [119], which achieve excellent results. In particular, DL approaches are anticipated to aid physicians in the examination and diagnosis processes [120]. DL leverages ANNs to examine raw data directly. Multilayer perceptrons (MLPs) are also among the most prevalent deep learning algorithms.

The three primary groups of DL approaches are supervised, unsupervised, and semi-supervised. Supervised learning approaches include CNNs, deep neural networks (DNNs), and recurrent neural networks (RNNs). In unsupervised learning, DL has excelled in non-linear dimensionality reduction and clustering problems; its approaches comprise restricted Boltzmann machines, auto-encoders, and generative adversarial networks (GANs). Semi-supervised deep learning also includes GANs. In addition, RNNs, which include gated recurrent unit (GRU) and long short-term memory (LSTM) techniques, can be applied across ML strategies, such as supervised and unsupervised learning [121].

Figure 13 depicts a decade-long comparison of the Google Trends search volumes for "Machine Learning" vs. "Deep Learning" between 2012 and 2023. Results indicate that ML searches predominate over DL searches due to the use of ML as an umbrella term [122].

Fig. 13

Worldwide internet search interest in "machine learning" and "deep learning"

Convolutional neural network

CNNs have been implemented in several domains, including computer vision and medical imaging. In particular, CNNs have been effective at producing outputs in previously unattainable settings [123]. This is because CNNs can detect and learn crucial traits that radiologists cannot readily observe with visual inspection [124]. CNN's primary advantage over its predecessors is that it automatically recognizes pertinent features. There are many advantages to utilizing CNNs, including weight sharing, simultaneous learning of both feature extraction and classification, and the capability to create large-scale networks [121]. The basic architecture of CNN is represented in Fig. 14.

Fig. 14

Basic architecture of CNN

Convolutional layer

The convolutional layer applies a specific filter repeatedly across the whole image. The input image (i) of every layer in the CNN model has three dimensions: height, width, and depth, represented as a × a × b, in which the height (a) is the same as the width (a). The depth (b) is also called the channel number.

Filters may have a variety of sizes, including 3 × 3, 5 × 5, 11 × 11, etc. Filters convolve the preceding layer's outputs to produce the corresponding layer's output. A feature map is produced as a result of this convolution procedure.

Every convolutional layer contains k kernels, also known as filters, which have three dimensions like the input image, represented as c × c × d, with the following conditions: c < a and d ≤ b. A dot product is computed between the inputs of the convolution layer and the weights of that layer. To generate k feature maps (h^k) as presented in Eq. 6, the input is convolved with these kernels, each of which shares its bias (b^k) and weight (w^k) across all positions of the input [121, 125].

$${h}^{k}=f\left({w}^{k}*i+{b}^{k}\right)$$
(6)
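Equation 6 can be traced on a small example. The sketch below computes one feature map for a single-channel input and a single kernel, with ReLU assumed as the activation f, no padding, and a stride of one. As in most deep learning libraries, the kernel is applied without flipping (cross-correlation). The function names and the toy values are hypothetical.

```python
def relu(x):
    """Rectified linear unit, used here as the activation f in Eq. 6."""
    return max(0.0, x)


def conv2d_single(image, kernel, bias=0.0, activation=relu):
    """Compute h = f(w * i + b) for one channel and one kernel (valid mode)."""
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    feature_map = []
    for y in range(out_h):
        row = []
        for x in range(out_w):
            # Dot product between the kernel weights and the image patch.
            total = sum(kernel[j][i] * image[y + j][x + i]
                        for j in range(kernel_h) for i in range(kernel_w))
            row.append(activation(total + bias))
        feature_map.append(row)
    return feature_map


# Hypothetical 3 x 3 input and 2 x 2 kernel; the same weights and bias are
# reused at every position, which is the weight sharing described above.
toy_input = [[1, 2, 3],
             [4, 5, 6],
             [7, 8, 9]]
toy_kernel = [[1, 0],
              [0, 1]]
toy_feature_map = conv2d_single(toy_input, toy_kernel, bias=1.0)
```

A 3 × 3 input convolved with a 2 × 2 kernel in this way yields a 2 × 2 feature map, since the output side length is a − c + 1.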

Activation functions

Activation functions in neural networks introduce non-linearity when mapping input to output. The input value is calculated by weighting the neuron input and adding the bias. CNNs and other types of deep neural networks often use the ReLU, Leaky ReLU, and Noisy ReLU, as well as the Sigmoid and Tanh activation functions. The rectified linear unit (ReLU) is an activation function that may prevent vanishing gradients; it keeps only the positive part of its argument [121]. Some of the prominent activation functions that are widely used are presented in Table 5.

Table 5 Prominent activation function
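For reference, four of the activation functions named above can be written in a few lines using their standard definitions. The leak slope `alpha=0.01` is a commonly used default and is an assumption here, and the sigmoid is written in a numerically stable form to avoid overflow for large negative inputs.

```python
import math


def sigmoid(x):
    """Logistic function 1 / (1 + e^(-x)), evaluated without overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    exp_x = math.exp(x)
    return exp_x / (1.0 + exp_x)


def tanh(x):
    """Hyperbolic tangent, mapping inputs to the range (-1, 1)."""
    return math.tanh(x)


def relu(x):
    """Rectified linear unit: keeps positive inputs and zeroes the rest."""
    return max(0.0, x)


def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: like ReLU, but negative inputs are scaled by alpha."""
    return x if x > 0 else alpha * x
```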

Pooling layer

A pooling or subsampling layer performs a down-sampling operation on each feature map. A pooling layer preserves the image features and information while simultaneously reducing the image size. It applies a pooling function, such as maximum, global, or average, with a predefined kernel size or pool size to each of the feature maps [125].
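A minimal sketch of maximum and average pooling is given below. It assumes non-overlapping windows (stride equal to the pool size) and drops trailing rows or columns that do not fill a complete window; the function name `pool2d` and the toy feature map are hypothetical.

```python
def pool2d(feature_map, pool_size=2, mode="max"):
    """Down-sample a 2D feature map with non-overlapping square windows."""
    if mode not in ("max", "average"):
        raise ValueError("mode must be 'max' or 'average'")
    pooled = []
    for y in range(0, len(feature_map) - pool_size + 1, pool_size):
        row = []
        for x in range(0, len(feature_map[0]) - pool_size + 1, pool_size):
            window = [feature_map[y + j][x + i]
                      for j in range(pool_size) for i in range(pool_size)]
            if mode == "max":
                row.append(max(window))
            else:
                row.append(sum(window) / len(window))
        pooled.append(row)
    return pooled


# Hypothetical 4 x 4 feature map reduced to 2 x 2.
toy_map = [[1, 3, 2, 4],
           [5, 6, 7, 8],
           [9, 2, 1, 0],
           [3, 4, 5, 6]]
max_pooled = pool2d(toy_map, pool_size=2, mode="max")
avg_pooled = pool2d(toy_map, pool_size=2, mode="average")
```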

Optimizers

Updating the weights in the CNN architecture requires employing optimization algorithms iteratively until the network learns as well as possible. Each optimizer carries out the updating procedure using its own algorithm. Some of the best-known optimizers are Gradient Descent, Stochastic Gradient Descent, and Adam [125].
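The weight update at the core of these optimizers can be illustrated with plain gradient descent on a one-parameter quadratic loss. The loss L(w) = (w − 3)², the learning rate, and the number of steps are assumptions chosen so that the example converges; stochastic gradient descent and Adam differ in how the gradient is estimated and scaled and are not shown.

```python
def gradient_descent(gradient, start, learning_rate=0.1, steps=100):
    """Repeatedly move the weight against the gradient of the loss."""
    weight = start
    for _ in range(steps):
        weight = weight - learning_rate * gradient(weight)
    return weight


def loss_gradient(weight):
    """Derivative of the hypothetical loss L(w) = (w - 3)^2."""
    return 2.0 * (weight - 3.0)


learned_weight = gradient_descent(loss_gradient, start=0.0)
```

With these settings each step shrinks the distance to the minimum at w = 3 by a constant factor of 0.8, so the weight approaches 3 geometrically.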

Fully connected layer

The fully connected layer is a layer in which every input node is coupled to every output node, so each neuron of the preceding layer is connected to each neuron of the current layer. It is utilized to make predictions at the network's end. The previous layer's output is flattened and delivered to a fully connected layer that linearly transforms the data before sending it to a nonlinear activation function [128].
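The flatten-then-transform step can be sketched as follows. The function names `flatten` and `dense`, the ReLU activation, and the toy weights are hypothetical; `weights[j][i]` is assumed to connect input node i to output node j.

```python
def flatten(feature_maps):
    """Turn a list of 2D feature maps into one flat list of values."""
    return [value for fmap in feature_maps for row in fmap for value in row]


def dense(inputs, weights, biases, activation=lambda z: max(0.0, z)):
    """Fully connected layer: weighted sum plus bias, then an activation."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


# Hypothetical single 2 x 2 feature map feeding two output neurons.
flat_inputs = flatten([[[1, 2], [3, 4]]])
layer_output = dense(flat_inputs,
                     weights=[[1, 0, 0, 0], [0, -1, 0, 0]],
                     biases=[0.5, 0.0])
```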

CNN architectures

Various CNN architectures carry out classification tasks, including ResNet, VGG Net, Inception, Xception, DenseNet, EfficientNet, MobileNetV2, and many more. On the other hand, segmentation tasks are carried out by U-Net, V-Net, FCN, SegNet, DRUNET, and many other architectures [129]. With the aid of CNNs, the number of parameters can be significantly reduced, overfitting can be mitigated, and the information gleaned from an image may be preserved.

Ensemble learning

Ensemble learning aims to improve general performance by integrating different models into a single one. It was initially proposed for classification tasks. The benefits of both deep learning and ensemble learning are combined in deep ensemble learning models to provide a model with enhanced performance [130]. An ensemble of learned models may be created by taking the training data, deriving many training sets from it, learning a model from each, and then combining them. Bagging, boosting, and stacking are all well-known ensemble learning methods. The result of combining model outputs is a single prediction: a weighted vote is used for classification, whereas a weighted average is used for numerical prediction. This approach is used by both bagging and boosting; however, they generate their respective models differently [131]. Stacking enables the combination of different base learning algorithms. Diversified base models allow the stacked ensemble to learn from various perspectives, producing heterogeneous features. The super learner approach is also called "layered ensemble learning" [132].
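The two combination rules described above, a weighted vote for class labels and a weighted average for numerical outputs, can be sketched as below. The function names, class labels, and weights are hypothetical, and ties in the vote are resolved in favor of the label that appears first.

```python
from collections import defaultdict


def weighted_vote(predictions, weights=None):
    """Return the class label with the largest total weight."""
    if weights is None:
        weights = [1.0] * len(predictions)  # plain majority vote
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


def weighted_average(predictions, weights=None):
    """Return the weighted mean of numerical model outputs."""
    if weights is None:
        weights = [1.0] * len(predictions)
    return sum(p * w for p, w in zip(predictions, weights)) / sum(weights)


# Three hypothetical base models voting on one chest X-ray.
majority_label = weighted_vote(["pneumonia", "normal", "pneumonia"])
trusted_label = weighted_vote(["pneumonia", "normal", "normal"],
                              weights=[0.7, 0.1, 0.1])
mean_score = weighted_average([0.2, 0.4, 0.9], weights=[1, 1, 2])
```

In the second call a single heavily weighted model outvotes two lightly weighted ones, which is the mechanism boosting relies on when it assigns larger weights to more accurate models.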

Transfer learning

Many ML approaches function well only when testing and training data come from the same feature space and distribution. Statistical models must be rebuilt with fresh training data when the distribution changes. In many real-world instances, collecting new training data and rebuilding models is either impractical or too expensive, so it would be helpful to reduce the training data collection work. In such circumstances, transfer learning across task domains is advantageous. Whenever there is inadequate training data for a given task, one solution is to use transfer learning methods to bring the knowledge acquired from previously learned tasks to the target task [133]. Inductive [134] and transductive kinds of transfer learning are preferred for classification or regression studies. On the other hand, unsupervised types of transfer learning are selected when it comes to tasks involving clustering and dimensionality reduction [135]. Transfer learning can make a DL model even more accurate by fine-tuning it with additional training data and adjusting the parameters.

Detection of prominent lung diseases using machine learning and imaging

The backbone of ML models is input data, which comes in the form of datasets, together with ML diagnostic methods. Therefore, this review first focuses on the datasets available for the prominent lung diseases, and the subsequent sections discuss the ML approaches for diagnosis in more depth.

Publicly accessible datasets

Pneumonia

To address the issue of accessing image data, public datasets are preferred and presented here since virtually everyone can access them, which makes them ideal for conducting research. This section summarizes the publicly available pneumonia datasets used in the reviewed studies. The datasets for the diagnosis of pneumonia that are publicly available are listed in Table 6.

Table 6 Available pneumonic datasets

Access to private databases, which are often commercial and need authorization, is restricted. Publicly available datasets for prominent lung illnesses are presented [136]. Images of both pneumonia and healthy lungs can be found in the LDOCTCXR (http://data.mendeley.com/datasets/rscbjbr9sj/3) [42, 137] and RSNA pneumonia databases (https://www.kaggle.com/competitions/rsna-pneumonia-detection-challenge/data) [43].

The ChestX-ray8 dataset (https://www.kaggle.com/datasets/nih-chest-xrays/data) classifies eight lung diseases, such as pneumonia [44, 45], while the ChestX-ray14 dataset (https://www.v7labs.com/open-datasets/chestx-ray14) classifies 14 lung diseases using the same X-rays [46]. Researchers conducted a retrospective analysis on 155 patients with COVID-19 pneumonia treated with chest computed tomography in Rio de Janeiro, Brazil, between March and May 2020 (https://doi.org/10.1590/0100-3984.2020.0133) [63].

According to the study, COVID-19 induces a distinct type of pneumonia in patients (https://doi.org/10.17632/9xkhgts2s6.3) [47]. A specific kind of pneumonia known as viral pneumonia is identified and recorded in this dataset (https://www.kaggle.com/datasets/tr1gg3rtrash/balanced-augmented-covid-cxr-dataset) [48].

Radiologist-labeled reference standard assessment sets and uncertainty labels are characteristics of CheXpert. The researchers evaluated various ways of addressing uncertainty and verified them on the assessment sets. The dataset includes 224,316 chest radiographs of 65,240 patients that have been annotated for the presence of 14 chest radiographic findings. It has a labeler that can extract observations from free-text radiology reports and use an uncertainty label to capture any uncertainties (https://doi.org/10.48550/arXiv.1901.07031) [49]. The MIMIC-CXR dataset contains 377,110 X-ray images from 65,379 patients. It comprises 253,714 frontal and 123,246 lateral view images (https://doi.org/10.1038/s41597-019-0322-0) [50]. VinDr-CXR is an open dataset of chest X-rays with radiologist annotations [52].

VinDr-CXR is a large labeled dataset of more than 18,000 chest X-ray scans made accessible to the public in DICOM format. All data, including images and findings, has been de-identified to safeguard patient privacy. It comprises 715 pneumonia samples, accounting for about 3.97% of the dataset. Radiologists assigned labels to this dataset (https://doi.org/10.1038/s41597-022-01498-w) [138]. Ground-truth lung segmentation masks are included with the complete COVID-QU-Ex dataset (https://www.kaggle.com/datasets/anasmohammedtahir/covidqu) [53].

Lung cancer

The reviewed study used databases for lung cancer that were open to the public to provide readers with pertinent information. The datasets for the diagnosis of lung cancer that are publicly available are listed in Table 7.

Table 7 Available lung cancerous datasets

While using the same X-ray instances, the ChestX-ray8 dataset classifies eight lung diseases, including lung nodules [44], while the ChestX-ray14 dataset classifies 14 lung disorders [46]. VinDr-CXR comprises 586 lung nodule samples, accounting for about 3.25% of the dataset [138]. The LUNGx Challenge provided participants with an opportunity to compare their diagnostic classification methods for 73 benign and malignant lung nodules (https://doi.org/10.7937/K9/TCIA.2015.UZLSU3FL) [64]. The Japanese Society of Radiological Technology has generated a dataset for lung nodule image classification (http://imgcom.jsrt.or.jp/minijsrtdb/) [56].

The NLST CT scan image collection comprises over 200,000 image series from 75,000 CT exams of more than 25,000 individuals. The cancer data access system (CDAS) provided access to a subset of lung cancer images that contained around 28,000 images from approximately 3,700 individuals (https://cdas.cancer.gov/learn/nlst/images/) [65]. The Lung1 collection features images of 422 individuals with NSCLC; a prognostic radiomic signature was created using this dataset (https://doi.org/10.7937/K9/TCIA.2015.PF0M9REI) [66]. Imaging data from the cancer moonshot biobank (CMB) is being made accessible in conjunction with the release of clinical and genetic data from the CMB initiative. CMB is a program of the National Cancer Institute that supports ongoing and upcoming cancer research studies (https://doi.org/10.7937/3CX3-S132) [67]. In another collection, lung cancer patients underwent a variety of diagnostic procedures, including an exhale or inhale breath-hold CT (BHCT), free-breathing four-dimensional CT (4DCT), and Galligas PET ventilation (https://doi.org/10.7937/3ppx-7s22) [68]. Another database includes lung cancer patients' CT and PET-CT DICOM images along with XML annotation records (https://doi.org/10.7937/TCIA.2020.NNC2-0461) [69]. CT scans of patients who had NSCLC with a mix of stages and histology were collected at the H. Lee Moffitt Cancer Center and Research Institute, and the QIN associates received the data for research objectives (https://doi.org/10.7937/K9/TCIA.2015.NPGZYZBZ) [70]. Images collected during chemoradiotherapy for 20 patients with locally advanced NSCLC are also available (https://doi.org/10.7937/K9/TCIA.2016.ELN8YGLE) [71].

A collection of lung CT scans called the reference image database to evaluate therapy response (RIDER) was produced to assess a variety of malignant, unidimensional, bidimensional, and volumetric parameters with CT scans in lung cancer patients (https://doi.org/10.7937/k9/tcia.2015.u1x8a5nr) [72]. To simplify the operations of the RIDER PET/CT subgroup, the RIDER lung PET/CT collection was shared (https://doi.org/10.7937/K9/TCIA.2015.OFIP7TVM) [73].

COVID-19

The datasets for the diagnosis of COVID-19 that are publicly available are listed in Table 8. The creators integrated 15 publicly available COVID-19 chest X-ray image datasets to build the curated COVID-19 posterior-anterior lung radiography imaging database [47]. The balanced augmented COVID CXR dataset has four categories: COVID-19, viral pneumonia, lung opacity, and normal.

Table 8 Available COVID-19 datasets

It is a publicly available, significantly imbalanced chest X-ray dataset [48]. COVID-QU-Ex is reported to be the most comprehensive lung mask dataset created to date [53]. The COVID-19 detection dataset was created by combining multiple publicly accessible datasets [54, 55]. Another collection includes images from various organ locations and modalities (i.e., CXRs, CT scans, MRIs). For each patient, this collection comprises clinical information, including diagnoses, procedures, laboratory testing, and COVID-19-specific data values [57]. In a further collection, each subject had a positive RT-PCR for SARS-CoV-2 from a sample taken within a day after the initial CT; the CT scans were conducted without contrast, and the DICOM images were converted to NIfTI format (https://doi.org/10.7937/TCIA.2020.GQRY-NC81) [74]. Across all COVID-positive thoracic CT imaging studies, pixel-level volumetric segmentation, including diagnostic captions by thoracic radiography general practitioners, was performed. This system of labels was put together with the help of other global consensus panels and COVID data annotation efforts (https://doi.org/10.7937/VTW4-X588) [75]. The RSNA international COVID-19 open annotated radiology database (RICORD) version 1b consists of 120 CT images of COVID-negative patients from four global sites. It gives access to a particular class of COVID-negative image collections (https://doi.org/10.7937/31V8-4A40) [76]. Radiology subspecialists clinically annotated all COVID-positive X-ray studies using a labeling system based on COVID-19 reporting rules (https://doi.org/10.7937/91ah-v663) [58]. The COVID-19-AR dataset has genome data and CT scans to better understand COVID-19 (https://doi.org/10.7937/tcia.2020.py71-5978) [59, 139].

The COVID-XRay-5K dataset was produced using data gathered from two sources: the CheXpert dataset is used for non-COVID or COVID-19 negative X-ray samples, whereas the Covid-Chestxray-Dataset is used for COVID-19 positive X-ray samples (https://github.com/shervinmin/DeepCovid) [60]. The COVID-CT collection contains 463 CT scans from non-COVID-19 cases and 349 CT scans from COVID-19 patients (https://doi.org/10.48550/arXiv.2003.13865) [78].

Machine learning in pneumonia detection

An investigation of the several methodologies presently used for diagnosis and forecasting using a combination of ML and imaging methods is presented. Researchers from many areas, including ML and the medical sector, have looked at diagnosing and forecasting pneumonia.

The information was compiled from the final collection of articles describing the many sorts of ML techniques used and their findings, which are presented in Table 9.

Table 9 Machine learning and sub-fields in pneumonia diagnosis

The dropout convolutional network proposed by Szepesi et al. was trained and evaluated on 5856 tagged images. A convolutional layer with a unique dropout was part of the proposed architecture, along with a batch normalization layer, an activation layer, and a pooling layer. The researchers evaluated the test performance of the proposed model at several different dropout rates, including 10%, 20%, 30%, 40%, and 50%, and the results showed that the 40% dropout rate was the most successful. Their retrospective analysis included one-to-five-year-old children with anterior–posterior (AP) X-rays [140].

Twelve pre-trained models (AlexNet, DenseNet, GoogleNet, MnasNet, MobileNetv2, MobileNetv3, ResNet50, ResNeXt, ShuffleNet, SqueezeNet, VGG16, and Wide ResNet50) were modified and used to classify X-rays of healthy people and of people with pneumonia symptoms that could be caused by either a virus or bacteria. To provide an informative analysis of model classification, the authors ran additional experiments that evaluated the resilience of each model using 50%, 20%, and 10% of the training data. The approach gave an average F1-score of 84.46% when distinguishing between the four classes [141].

Multi-branch fusion auxiliary learning (MBFAL) is a suggested approach for analyzing CXR images to diagnose pneumonia. The proposed MBFAL approach is comprised of ResNet34 and ResNet18, which were previously trained on the ImageNet dataset. The training was conducted using the ResNet18 and ResNet34 networks, the auxiliary learning method, the prior-attention residual learning (PARL) network, and the MBFAL technique. This technique is based on supplementary learning and verifying fit sets using an auxiliary database. This is performed in combination with the PARL structure and feature fusion approach. A multi-branch CNN achieved classification, and the fusion of losses during network training involved using an MLP [142].

Based on Condorcet’s Jury Theorem (CJT), the unique method calculated classifier voting ensemble scores. The studies showed, with the assistance of CJT, that including a model in the pool of voters would increase the chance that the majority vote would be correct if the model in question were more accurate than the other models in the pool. In addition to this, a different unique domain extended transfer learning (DETL) ensemble classifier was constructed as a soft voting ensemble technique. This model has been compared against a CJT-based ensemble classifier to determine which is superior. Because of the large number of classifier votes in ensemble learning, it is necessary to consider each vote and significant voting. The winning class in majority voting is the one with the most votes. However, a higher number of votes does not necessarily increase the chances that the final verdict will be correct [143].
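The effect described by Condorcet's Jury Theorem can be illustrated numerically. The following is a minimal sketch and not the method of [143]: it assumes the simplest textbook setting, in which all voters are independent and share the same accuracy `p`, and the function name `majority_correct_probability` is hypothetical.

```python
from math import comb


def majority_correct_probability(n_voters: int, p: float) -> float:
    """Probability that a strict majority of n independent voters is correct.

    Assumption: every voter is correct with the same probability p,
    and n_voters is odd so that ties cannot occur.
    """
    if n_voters % 2 == 0:
        raise ValueError("use an odd number of voters to avoid ties")
    k_min = n_voters // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n_voters, k) * p**k * (1 - p) ** (n_voters - k)
        for k in range(k_min, n_voters + 1)
    )


for p in (0.7, 0.4):
    for n in (1, 3, 5):
        print(f"p={p}, voters={n}: {majority_correct_probability(n, p):.3f}")
```

Under these assumptions, voters with accuracy 0.7 reach a majority accuracy of about 0.784 with three voters, whereas voters with accuracy 0.4 drop to about 0.352. This matches the observation that more votes help only when the added models are accurate enough.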

A portable, rapid thermal imaging system with image-processing algorithms and ML analysis was proposed for pneumonia diagnosis. A smartphone-attached portable thermal imager recorded RGB and infrared images of each subject's back. The skin temperature over the lung regions of the back increased substantially in pneumonia patients, which may help diagnose them. The images were then automatically processed to extract several spatial and structural attributes that can differentiate between normal individuals and patients with pneumonia. The detection procedure is as follows: (1) determining the highest temperature in each thermal image and indicating the pulmonary area on the accompanying RGB image; (2) identifying the corresponding spot on the thermal image and obtaining the temperature in the area of overlap; (3) calculating the high-temperature indices; and (4) analyzing the indices with principal component analysis (PCA). In addition, thermal imaging was used for the diagnosis and treatment evaluation of pneumonia in this investigation [144].

The Hybrid Social Group Optimization (HSGO) method extracted relevant and critical features from CXR images. Several classifiers categorized CXR images. The social group optimization (SGO) approach with enhancements, HSGO, chooses the optimal features from a feature collection. A wrapper-based method enables HSGO to locate the optimal feature set more efficiently [145].

In conjunction with image augmentation, transfer learning is employed in training and validating multiple pre-trained deep CNNs. The neural networks were trained for two distinct tasks, binary classification and multi-class classification, each with and without image augmentation. Deep networks outperformed shallow networks when both types were trained using image augmentation. With image augmentation, DenseNet201 outperformed the other CNNs. Without image augmentation, the DenseNet-based CheXNet outperformed the other networks. The deeper DenseNet thus overtakes CheXNet on a large augmented dataset [146].

The multi-scale attention network (MSANet) approach may automatically prioritize unique statistical features and multi-scale characteristics of pneumonia detection to enhance classification. Four modules—lung segmentation, spatial pyramid decomposition, multi-scale feature extraction, and classification—make up this approach. Community-acquired pneumonia (CCAP) dataset is a public, multiclass CT scan dataset that includes four different types of pneumonia [147].

Combining the capabilities of Ensemble CNN with the Transformer Encoder method produces the proposed fusion methodology. Ensemble A hybridizes DenseNet201, VGG16, and GoogleNet, whereas Ensemble B is a hybridization of DenseNet201, InceptionResNetV2, and Xception. The ensemble backbone retrieves significant features from the input X-ray images using two independent ensemble methods. On the other hand, the MLP self-attention mechanism is used to make the Transformer Encoder for accurate diagnosis [148].

The specified research aimed to develop and assess CNNs for identifying pneumonia based on CXR images with varying image noise levels. Six classification tasks were designed for five levels of Gaussian noise. The images had Gaussian noise added to them with a zero mean, and there were five different levels of image noise variance, which corresponded to reducing exposure levels. CNN's analysis of the various datasets found no significant loss in performance when comparing the original input dataset to the five datasets with varying noise levels [149].
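The noise protocol can be sketched as follows. This is an illustrative example only: the five variance values, the 8×8 constant test image, and the helper name `add_gaussian_noise` are assumptions, since the source does not state the values used in [149]. Pixel intensities are assumed to lie in [0, 1].

```python
import random


def add_gaussian_noise(image, variance, rng):
    """Add zero-mean Gaussian noise with the given variance to a 2D image.

    Pixel values are assumed to be floats in [0, 1] and are clipped back
    into that range after the noise is added.
    """
    sigma = variance ** 0.5
    return [
        [min(1.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row]
        for row in image
    ]


rng = random.Random(0)                       # fixed seed for repeatability
image = [[0.5] * 8 for _ in range(8)]        # hypothetical stand-in for a CXR
variances = [0.001, 0.005, 0.01, 0.05, 0.1]  # hypothetical noise levels

# One noisy copy per variance level; with the original image this gives
# six datasets, matching the six classification tasks described above.
noisy_versions = {v: add_gaussian_noise(image, v, rng) for v in variances}
```

Higher variance stands in for a lower simulated exposure level, so the same classifier can be evaluated on each copy in turn.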

Li and Li created a new voting technique that combines 17 CNNs, using data fitness optimization to build their AI models, and showed that the 17-CNN approach is better than any individual CNN. Classifier A separates patients with pneumonia from those without; classifier B separates viral from bacterial pneumonia; classifier C differentiates between COVID-19 and other viruses; classifier D does the same for COVID-19 and bacteria; and classifier E separates COVID-19 from healthy individuals. For transfer learning, the pre-trained CNN layers are kept frozen during the first training on the secondary domain, and only the layers that come after them are updated [150].

The proposed model combines a CNN with explainable AI. Grad-CAM, LIME, and SHAP are used to analyze and describe the model's decisions. Convolutional feature extraction gathers high-level, object-based data. Next, the black-box behavior of the CNN model is assessed using Shapley values from SHAP, predictive explanations from LIME, and a heat map from Grad-CAM [151].

A two-step ML-based diagnostic and predictive model was designed. Lungs were segmented using DL-based segmentation. One hundred seven features were retrieved, including contour, histograms, and high-order texture features, and accompanied by various methods for selecting features, which were also utilized. GLCM, GLRLM, GLDM, GLSZM, and NGTDM were used to compute the features. The classifications of pneumonia, COVID-19, and healthy and severe, moderate, and mild score indices were calculated using random forest and meta-voting [152].
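To show what a GLCM texture feature measures, here is a minimal sketch under stated assumptions: a non-symmetric gray-level co-occurrence matrix for a single horizontal offset, from which the contrast feature is derived. The 4×4 test image and the function names are hypothetical, and this is not the feature pipeline of [152].

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalised gray-level co-occurrence matrix for the offset (dx, dy).

    image is a 2D list of integer gray levels in range(levels). Each pixel
    is paired with its neighbour dx columns and dy rows away.
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))
    return [[n / total for n in row] for row in counts]


def contrast(p):
    """GLCM contrast: co-occurrence probability weighted by (i - j)^2."""
    size = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(size) for j in range(size))


image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
print(contrast(glcm(image, levels=4)))  # 7/12: few large gray-level jumps
```

A perfectly uniform region has a contrast of zero, while regions with many abrupt intensity changes score higher.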

Five deep learning architectures, ResNet-50, ResNet-50r, DenseNet-121, MobileNet-v3, and the CaiT-24-XXS-224 (CaiT) transformer, are used for transfer learning. The researchers conducted twenty examinations with ten repeats each. The bootstrapping method was used to construct confidence intervals, and the Friedman–Nemenyi paired post hoc test was then used to compare the classifiers. The results indicated that the ResNet-50 architectures were statistically robust for diagnosing pneumonia in a multiclass setting [153].
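The bootstrapping step can be sketched as a percentile bootstrap over per-image outcomes. This is a generic illustration rather than the exact procedure of [153]; the function name, the 2000 resamples, and the 85-out-of-100 example are assumptions.

```python
import random


def bootstrap_accuracy_ci(correct, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for classification accuracy.

    correct is a list of 1 (correct prediction) and 0 (wrong prediction),
    one entry per test image. The test set is resampled with replacement.
    """
    rng = random.Random(seed)
    n = len(correct)
    accuracies = sorted(
        sum(rng.choices(correct, k=n)) / n for _ in range(n_resamples)
    )
    lower = accuracies[round(alpha / 2 * n_resamples)]
    upper = accuracies[round((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


outcomes = [1] * 85 + [0] * 15  # hypothetical: 85 of 100 images correct
lower, upper = bootstrap_accuracy_ci(outcomes)
print(f"accuracy 0.85, 95% CI roughly [{lower:.2f}, {upper:.2f}]")
```

Intervals of this kind make it easier to see whether the gap between two architectures exceeds the sampling noise of a small test set.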

Machine learning in lung cancer detection 

This section examines the techniques and procedures currently employed for identifying lung cancer. The findings of research studies on the identification and prediction of lung cancer are summarized in Table 10.

Table 10 Machine learning and sub-fields in lung cancer

Researchers constructed three distinct hierarchical deep-fusion learning models to identify lung nodules from CT scans. The completed model includes MPF, SFMPF, and MFMPF, which stand for multi-perspective, single-feature, and multi-feature, respectively. The MPF model has three hierarchical classification levels based on multi-perspective deep fusion. SFMPF is a model for image-feature-based hierarchical deep fusion learning. Using bilateral, trilateral, Gabor, and LOG-filtered images, four distinct feature-image-based model architectures are investigated. Combining the outputs of the four SFMPF models yields the MFMPF [154].

Images from CT scans are preprocessed to improve quality. Next, the lung nodule regions are segmented using a random walker algorithm based on user-provided seeds. Then, the LBP and the Riesz wavelet transform are used to collect the intensities and texture features. The improved gradient boost classification model was developed and evaluated to identify nodules as malignant or benign using the managed features [155].
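For readers unfamiliar with LBP, the sketch below computes the basic local binary pattern code of one 3×3 patch. It is a generic illustration and not the implementation of [155]; the clockwise bit ordering starting at the top-left neighbour is an assumed convention, and the patch values are hypothetical.

```python
def lbp_code(patch):
    """Basic 8-neighbour local binary pattern code for a 3x3 patch.

    Each neighbour contributes a 1-bit if it is at least as bright as the
    centre pixel. Neighbours are read clockwise from the top-left corner,
    and the first neighbour is the least significant bit (assumed order).
    """
    center = patch[1][1]
    neighbours = [
        patch[0][0], patch[0][1], patch[0][2],
        patch[1][2],
        patch[2][2], patch[2][1], patch[2][0],
        patch[1][0],
    ]
    return sum(1 << i for i, value in enumerate(neighbours) if value >= center)


patch = [
    [5, 9, 1],
    [4, 6, 7],
    [2, 8, 3],
]
print(lbp_code(patch))  # 42: neighbours 9, 7 and 8 are >= the centre 6
```

A histogram of these codes over a segmented nodule region is a common way to summarise its texture as a fixed-length feature vector.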

Lung nodules in CT images were identified using statistical and shape-based parameters. Lung segmentation was achieved using a histogram-based threshold approximation approach. Nodule features were extracted with statistical and shape-based techniques, together with an algorithm that detects round or nearly round shapes to identify circular nodules. For processing purposes, DICOM images were converted to PNG format. DICOM is a storage and transmission standard for medical images, and the conversion may degrade image quality. In the testing phase, the SVM classifier produced superior results [156].

DenseNet-121, a 121-layer CNN, was combined with a transfer learning scheme as a potential classification method. Transfer learning was adopted because the JSRT dataset is small. The first classification step determines whether or not a nodule is present. The next step determines whether or not the nodule in question is malignant [157].

The CT scan was manually segmented and then analyzed using a convolutional neural network. Even though the segmentation results based on DeepLab v3 and VGG-19 are better than those of the artificial segmentation, the testing revealed that both SegNet and the artificial segmentation findings are the nearest to the benchmark and almost overlap. Pathological evaluation revealed that 120 patients had benign lung nodules, whereas the same number of patients had benign lung nodules discovered by SegNet within the same period [158].

The suggested Block-PP employed morphological processes in conjunction with fuzzy logic to complete the lung segmentation. The SURF approach and the genetic algorithm are used in conjunction with the suggested Block FE–O to carry out the processes of feature extraction and optimization, respectively. The optimized or chosen feature set was then transmitted to the proposed Block-HB using the SVM and the feed-forward-back-propagation-neural-network (FFBPNN) [159].

Using the DL architecture for multiclass classification that was created, several illnesses, including pneumonia, were categorized. For classification, a VGG19 model that had already been trained was used. After that, three blocks of CNN were used to pull out features, and a fully connected layer was used for classification [160].

CT scan images were employed in the training process of a lung cancer prediction CNN (LCP-CNN) that had been developed to assign a malignancy score to each pulmonary nodule. Training for the LCP-CNN was carried out with the assistance of the NLST dataset. The LCP-CNN rule-out test was developed to determine benign nodules while keeping a high degree of sensitivity intact. This was accomplished by using malignancy score thresholds. During the procedure of defining the rule-out criteria, an eight-fold cross-validation method was employed [161].
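The idea of a rule-out threshold can be sketched as follows. This is a simplified illustration, not the LCP-CNN procedure of [161]: the scores and labels are made up, the function names are hypothetical, and the cross-validation used in the study is not reproduced.

```python
import math


def rule_out_threshold(scores, labels, min_sensitivity):
    """Largest score threshold that keeps sensitivity >= min_sensitivity.

    Nodules scoring below the threshold are ruled out as benign.
    labels: 1 = malignant, 0 = benign.
    """
    malignant = sorted(s for s, y in zip(scores, labels) if y == 1)
    # Number of malignant nodules we may miss (small epsilon absorbs
    # floating-point error in 1 - min_sensitivity).
    allowed_misses = math.floor(len(malignant) * (1 - min_sensitivity) + 1e-9)
    allowed_misses = min(allowed_misses, len(malignant) - 1)
    return malignant[allowed_misses]


def sensitivity_at(scores, labels, threshold):
    """Fraction of malignant nodules that are NOT ruled out."""
    malignant = [s for s, y in zip(scores, labels) if y == 1]
    return sum(s >= threshold for s in malignant) / len(malignant)


# Hypothetical malignancy scores for ten nodules.
scores = [0.02, 0.05, 0.10, 0.30, 0.40, 0.55, 0.70, 0.80, 0.90, 0.95]
labels = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

t = rule_out_threshold(scores, labels, min_sensitivity=1.0)
ruled_out = sum(s < t for s in scores)
print(t, ruled_out, sensitivity_at(scores, labels, t))
```

With these made-up scores, a threshold of 0.40 rules out four benign nodules without missing any malignant one; lowering the required sensitivity raises the threshold and rules out more nodules at the cost of missed cancers.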

The presented method consists of four stages. First, images were preprocessed using the Gabor and Kuwahara filters. Secondly, image segmentation was accomplished using Chan-Vese active contour modeling to exclude minor perturbations to previously discovered nodules, like small fragments wrongly identified as nodules. In this instance, little nodules were found by segmenting the lung region using a region-growing algorithm. The third step was feature extraction, which generated features using the DWT at one, two, and three decomposition levels. Finally, following a comparison of the output features, the polynomial neural network (PNN) categorization algorithm is trained to differentiate benign from malignant nodules based on the output feature that was determined to be the most accurate [162].

A hybrid method was proposed that used CNN models, the transfer learning approach, gray wolf optimization (GWO), and genetic algorithms (GA). A weighted filter was used to minimize the image noise, and an enhanced version of the Gray Wolf Optimization approach was carried out before the segmentation process, along with watershed modification and dilation procedures. The combination of improved Gray Wolf optimization and Inception-V3 (IGWO-IV3) increased overall performance. The IGWO uses GA to locate the most advantageous starting sites for the GWO [163].

A hybrid strategy was proposed for characterizing nodules in CT images by combining the features used to identify them with an extension of feed-forward networks. The researchers developed a nodule embedding based on the statistical relevance of features for malignancy identification, which also reduced the amount of training data required. Using self-defined diagnostic performance measurements, the feed-forward network also optimizes its structure and hyper-parameters [164].

The research sought to enhance the quality of lung cancer images by applying various image-processing techniques, such as image correction, gamma correction, contrast stretching, thresholding, and histogram equalization. Features were then extracted from the enhanced images with the GLCM and used to train and refine several robust machine learning classifiers, such as SVM with Gaussian, RBF, and polynomial kernels, decision trees, and naive Bayes [165].
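Two of the enhancement operations named above, gamma correction and contrast stretching, can be sketched on 8-bit pixel values. This is a generic illustration with hypothetical function names and sample values, not the pipeline of [165].

```python
def gamma_correct(pixels, gamma):
    """Apply out = 255 * (in / 255) ** gamma to 8-bit pixel values.

    gamma < 1 brightens dark regions; gamma > 1 darkens them.
    """
    return [round(255 * (p / 255) ** gamma) for p in pixels]


def contrast_stretch(pixels):
    """Linearly rescale pixel values so they span the full 0..255 range."""
    low, high = min(pixels), max(pixels)
    if high == low:  # flat image: nothing to stretch
        return list(pixels)
    return [round(255 * (p - low) / (high - low)) for p in pixels]


row = [50, 75, 150]                        # hypothetical low-contrast pixels
print(contrast_stretch(row))               # [0, 64, 255]
print(gamma_correct([0, 64, 255], 0.5))    # [0, 128, 255]
```

Both operations change only the intensity mapping, so they can be applied before texture features are computed without altering the image geometry.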

An automated approach to identifying lung nodules using CT image processing methods is presented. The oval or circular form of the lung nodules' two-dimensional shape is used as the basis for the detection approach for the lung nodules. It is feasible to identify a lung nodule using four 2-dimensional features and then classify it using eleven 3-dimensional features. Nodule enhancement is the process of increasing the gray level of nodules. The method was applied to an image, which resulted in the lower brightness level of the image being amplified while the upper brightness level of the image remained unaltered [166].

Image preprocessing techniques such as denoising, thresholding, and morphology are presented. The provided image is converted to grayscale and denoised using Gaussian blur. After that, Otsu's technique and adaptive Gaussian thresholding are applied to the grayscale image. Shape-based morphological operations are then performed on the image. The authors also proposed a novel algorithm and image-processing approach. Texture features are extracted from the segmented, quality-enhanced images using statistical parameters and the GLCM. A performance evaluation of seven ML-based classifiers for detection and classification is presented [167].
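Otsu's technique chooses the threshold that maximises the between-class variance of the gray-level histogram. The sketch below is a plain-Python illustration under that standard definition; the function name and the eight sample pixels are hypothetical, and it does not reproduce the implementation used in [167].

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t that maximises between-class variance.

    Pixels with value <= t form the background class and the rest form
    the foreground class. pixels is a flat list of ints in range(levels).
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the background class so far
    sum0 = 0  # sum of gray levels in the background class so far
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


pixels = [10, 12, 11, 13, 200, 205, 198, 202]  # dark and bright clusters
t = otsu_threshold(pixels)
binary = [1 if p > t else 0 for p in pixels]
print(t, binary)
```

Because the threshold is derived from the histogram alone, the method needs no manual tuning, which is one reason it is often paired with adaptive thresholding in preprocessing pipelines.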

Machine learning in COVID-19 detection

This section examines various COVID-19 diagnostic techniques and approaches presently in use. The information shown in Table 11 was derived from a compilation of publications describing the different ML approaches and their results.

Table 11 Machine learning and sub-fields in COVID-19

COVID-Net is a deep CNN designed to detect COVID-19 in chest X-rays. Its authors created the COVIDx dataset, which combines five datasets that are accessible online. The proposed COVID-Net was pre-trained on ImageNet and then trained on the COVIDx dataset. Training settings included a learning rate of 2e-4, 22 epochs, a batch size of 64, a learning-rate reduction factor of 0.7, and a patience of 5. The COVID-Net architecture is distinctive in its use of a compact projection-expansion-projection-extension (PEPX) design pattern, which improves representational capacity while significantly reducing computational complexity [168].
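The "factor" and "patience" settings describe a reduce-on-plateau learning-rate policy. The sketch below simulates such a policy with the values reported for COVID-Net (initial rate 2e-4, factor 0.7, patience 5, 22 epochs). The loss curve is made up, the function name is hypothetical, and the exact trigger rule (reduce after more than `patience` epochs without improvement) is an assumption, since conventions differ between frameworks.

```python
def reduce_on_plateau(losses, lr=2e-4, factor=0.7, patience=5):
    """Return the learning rate in effect after each epoch.

    The rate is multiplied by `factor` once the monitored loss has failed
    to improve for more than `patience` consecutive epochs (assumed rule).
    """
    best = float("inf")
    bad_epochs = 0
    schedule = []
    for loss in losses:
        if loss < best:
            best, bad_epochs = loss, 0
        else:
            bad_epochs += 1
            if bad_epochs > patience:
                lr *= factor
                bad_epochs = 0
        schedule.append(lr)
    return schedule


# Hypothetical validation loss that stalls after the second epoch.
losses = [1.0, 0.8] + [0.8] * 20  # 22 epochs in total
schedule = reduce_on_plateau(losses)
print(schedule[0], schedule[-1])
```

With this stalled loss curve the rate is cut three times over the 22 epochs, ending at 2e-4 × 0.7³.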

Two diagnostic inference engines, COV19-CNNet and COV19-ResNet, are employed for COVID-19 diagnosis. In contrast to earlier research in the area, both were built from scratch using novel deep neural networks rather than pre-trained DL models. Such AI-based inference engines can turn X-ray equipment into valuable testing equipment for diagnosing COVID-19. COV19-CNNet employs a CNN architecture, whereas COV19-ResNet employs a ResNet structure. The study focused on the complexity of classifying COVID-19 into multiple groups [169].

Transfer learning and classification utilizing a linear SVM classifier and MobileNet architecture accomplish automatic X-ray image detection. Images of healthy individuals were used for datasets A and B, but COVID-19 images remained unchanged. Multiple CNN architectures can extract features from X-ray images due to subsequent training on ImageNet. CNNs are combined with MLP, KNN, and Naïve Bayes [170].

The "COVIDScreen" model for classifying lung X-rays combines image enhancement, segmentation, and a customized stacking ensemble with CNN base-learners (DenseNet-121, DenseNet-169, VGG-16, VGG-19, and ResNet-50) and Naive Bayes as a meta-learner. Preprocessing with CLAHE histogram equalization and U-Net image segmentation made the model 6% more accurate [171].

The researchers conducted four class classifications (Normal, COVID-19, Pneumonia Bacterial, and Pneumonia Viral) on various prepared datasets by using the suggested CoroNet model. Additionally, they did three class classifications of "normal," "COVID-19," and "pneumonia" on these datasets. The "CoroNet" suggested model was built on top of the Xception CNN architecture as its primary building block. The Inception design was extended to 71 layers to create the Xception architecture [172].

CNN was used to perform a two-phase X-ray image analysis process known as "XCOVNet" for COVID-19 detection. During the first step, the collection of X-ray pictures, of which fifty percent are positive for COVID-19 and the other fifty percent are normal, was preprocessed. In the second step, the neural network model was trained and fine-tuned to attain a classification accuracy of 98.44 percent. In this investigation, researchers used two chest X-ray imaging collections: Dataset-1 consists of 950 CXR images annotated with more than fifteen various types of illness discoveries with 196 COVID-19 CXRs. In contrast, Dataset-2 consists of 5856 CXR images with 1,583 COVID-19 CXRs classified as bacterial, viral, and normal pneumonia [173].

The researchers classified COVID-19 using a graphical user interface (GUI) tool they designed. They used several CNN models, including DenseNet 201, ResNet 50 V2, and Inception V3. Each model was trained individually so that it could provide accurate predictions. An ensemble technique was then employed to combine the models [174].

The authors' proposed method, CoroDet, is built around an original 22-layer CNN model (9 Conv2d layers, 9 Maxpool2d layers, one flatten layer, two dense layers, and one leaky ReLU layer). Classification was performed with two, three, and four classes. The COVID-R dataset they built contains 7390 images [175].

The COVQU dataset consisted of 18479 CXRs of patients with normal lungs, lung opacity associated with COVID-19, and lung opacity unrelated to COVID-19. The authors introduced a modified version of the U-Net network for lung segmentation, with classification performed by seven different CNN models: six deep CNN models (ChexNet, DenseNet201, InceptionV3, ResNet101, ResNet50, and ResNet18) and one shallow CNN model [176].

Five distinct CNN models were employed for three binary classifications as part of a deep transfer learning-based strategy. According to the research, the primary advantage of using transfer learning for data training is that it requires fewer data points. ResNet had the most remarkable accuracy of all the trained models in the research. For their investigation, they built multiple datasets using CXR images from several publicly available datasets [177].

The CovidDWNet approach uses a structure built on feature reuse residual blocks and depth-wise dilated convolutional components. The feature maps produced by the CovidDWNet architecture were then classified with the gradient boosting (GB) method. The CovidDWNet + GB architecture yielded an improvement of almost 7% on CT scans and approximately 4% on X-ray images [178].

For patient-specific, per-slice CT scan analysis, the researchers recommended 2D processing in six steps. Step one segments the lungs using 2D ROI segmentation. Step two evaluates the condition of each segment using a 2D ROI classifier. Step three uses Grad-CAM with a multi-scale model to create a localization map. Step four integrates all segment localization maps into a 3D concatenated volume. Step five introduces the Corona-score biomarker and 3D volumetric scoring. Step six determines the severity of the illness. When a case is positive, the system provides a Corona score, which is used to assess severity [179].

A detection system was developed using transfer learning. To achieve higher accuracy, the authors suggested a staged detection strategy: the first stage augmented the data, the second stage used a pre-trained CNN model, and the third stage localized anomalies in CT scan images [180].

A voting-based system was proposed in which images are assigned to their respective categories through a voting process. The robustness of the models can be assessed with a cross-dataset evaluation that uses data from several different distributions [181].
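Two common voting schemes, hard (majority) voting and soft (probability-averaging) voting, can be sketched as follows. The source does not specify which scheme [181] uses, so this is a generic illustration; the three model outputs and the function names are hypothetical.

```python
from collections import Counter


def hard_vote(predicted_labels):
    """Return the label predicted by the largest number of models."""
    return Counter(predicted_labels).most_common(1)[0][0]


def soft_vote(class_probabilities):
    """Average per-class probabilities over models and return the top class."""
    classes = class_probabilities[0].keys()
    n_models = len(class_probabilities)
    mean = {
        c: sum(p[c] for p in class_probabilities) / n_models for c in classes
    }
    return max(mean, key=mean.get)


# Hypothetical outputs of three classifiers for one chest image.
model_outputs = [
    {"COVID-19": 0.9, "normal": 0.1},
    {"COVID-19": 0.4, "normal": 0.6},
    {"COVID-19": 0.4, "normal": 0.6},
]
labels = [max(p, key=p.get) for p in model_outputs]

print(hard_vote(labels))         # normal   (two of three votes)
print(soft_vote(model_outputs))  # COVID-19 (mean probability about 0.57)
```

The two schemes can disagree, as in this example, because soft voting lets one highly confident model outweigh two weakly confident ones.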

Methodical exploration

The significant concerns still in consideration:

  • Image Dataset Availability: Since imaging samples and datasets are in limited supply, it can be challenging to acquire all of the information necessary to diagnose lung illness accurately.

  • Imbalanced Datasets: Imbalance in the dataset can lead to inaccurate diagnosis, as DL solutions may overfit the majority or minority classes and fail to classify accurately.

  • Quality of Images: Low-resolution or poor-quality images can yield inaccurate results when using ML solutions for lung disease diagnosis.

  • Unreliable data: ML models rely highly on high-quality, consistent data, which can be hard to come by. Poor quality, incomplete, or inconsistent data can lead to an incorrect diagnosis.

  • Bias in data: Healthcare providers must recognize that bias may exist in the data they provide to train the ML models, and they must ensure that these biases are corrected to prevent any false positives or misdiagnoses.

  • Uncontrolled data sources: The image dataset used for ML models may come from multiple sources, which may be difficult to control for quality and accuracy.

  • Limited flexibility: ML models have limited flexibility due to the heavy dependence on training data. The model's performance may suffer when contextual images are added to the diagnostic process.

  • Overfitting: Overfitting occurs when an ML model is too complex and captures patterns that may not generalize, leading to inaccurate predictions on unseen data. It can lead to erroneous diagnoses when ML models are trained and tested on limited datasets.

  • Lack of Interpretability: Because ML models aren't easy to understand, it's hard to know why a particular prediction was made. It makes it hard to trust the results and could raise ethical concerns.

  • Computational cost: Training an ML model is computationally expensive, requiring significant computing power and time depending on the model's complexity and the dataset used for training. These costs can be too high for systems that cannot afford or do not have access to the resources needed to train these models.

  • False positives or negatives: ML models can produce false-positive results, meaning they incorrectly identify a healthy person as having lung disease. In the case of a false negative, a patient with lung disease is classified as healthy. Either error can arise from imperfect training data that does not accurately reflect the behavior of the disease or from mislabeled examples in the dataset being used.

  • Unreliable model performance metrics: Due to the complexity and variability of features, it is hard to accurately assess or measure how well an ML model works when diagnosing a disease.
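The concerns above about imbalanced datasets, false positives and negatives, and performance metrics can be made concrete with a small calculation. The counts below are hypothetical and the function name is an assumption. The sketch shows a classifier that labels every image "healthy" on a dataset with 950 healthy and 50 diseased cases.

```python
def binary_metrics(tp, fp, fn, tn):
    """Common diagnostic metrics from confusion-matrix counts.

    tp: diseased, predicted diseased    fp: healthy, predicted diseased
    fn: diseased, predicted healthy     tn: healthy, predicted healthy
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (
        2 * precision * sensitivity / (precision + sensitivity)
        if precision + sensitivity
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical classifier that always predicts "healthy".
metrics = binary_metrics(tp=0, fp=0, fn=50, tn=950)
print(metrics)
```

Accuracy is 0.95 even though every diseased patient is missed (sensitivity 0), which is why accuracy alone can be misleading on imbalanced medical datasets.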

Observed concerns about imaging modalities

The researchers investigated a variety of imaging modalities; Table 12 provides an overview of those examined. Table 12 shows that X-rays and CT scans are used far more often than other imaging modalities such as PET and MRI. The diagnosis of prominent lung diseases through the primary imaging modalities is presented below:

Table 12 Machine learning and sub-fields

Pneumonia

Pneumonia can be detected through various imaging modalities, including X-ray, CT, PET, and MRI. X-rays can detect the presence of pneumonia by looking for areas of increased density in the lungs. These areas are caused by fluid or inflammation and can be seen as white patches on the X-ray. X-rays are the most commonly used imaging modality for diagnosing pneumonia. CT scans provide a more detailed view of the lungs than X-rays and can detect subtler signs of pneumonia, such as small pockets of fluid or inflammation. PET can be used to detect the presence of pneumonia. It works by injecting a radioactive tracer into the body and scanning it with a special camera. The images produced can help doctors identify areas of inflammation and fluid accumulation in the lungs, which are common pneumonia symptoms. PET scans are beneficial for diagnosing complicated cases where other imaging techniques, such as X-rays or CT scans, may be inconclusive. PET scans can also help to differentiate between bacterial and viral forms of pneumonia. MRI is used less often to detect pneumonia, but it can provide a detailed image of the lungs and other organs in the chest.

Lung cancer

Lung cancer can be detected using X-ray images. An X-ray can reveal abnormal masses or nodules that may indicate a tumor or other abnormality. Further testing, such as a CT scan, may be preferred to confirm the diagnosis if an anomaly is found. CT scans are the most commonly used imaging modality. They can provide detailed images of the lungs, which can be used to identify tumors due to their ability to detect large and small nodules, enlarged lymph nodes, and other suspicious areas. PET scans are also used to detect cancer by detecting changes in cellular metabolism that occur with certain cancers. PET scans are often used along with CT scans to provide more detailed information about a tumor's size, shape, and location. MRI is often used to assess cancer's spread, or metastasis, from its primary site.

COVID-19

COVID-19 can be detected using X-rays, CT scans, PET-CT, and MRI scans. X-ray is the most commonly used imaging modality for COVID-19 detection, as it provides image quality sufficient to detect pneumonia, one of the most common conditions associated with COVID-19. CT scans provide more detailed images of the lungs than X-rays and can help detect other lung abnormalities associated with COVID-19, such as ground glass opacities or consolidations. PET-CT images can also reveal COVID-19 by showing areas of increased metabolic activity that could indicate an infection. MRI scans are not commonly used for COVID-19 detection because they produce lower-resolution images than CT scans. In conclusion, the chest X-ray is the most accessible and most common imaging modality used to diagnose lung diseases. A CT scan can provide more detailed images of the lungs than a chest X-ray and help identify subtler signs, such as small areas of infection or inflammation. These two modalities are the ones that researchers prefer to employ.

Observed concerns about datasets

Image datasets are necessary for the development of computer vision and ML models. They provide a source of input data to train, validate, and test an ML model. Access to large datasets is necessary to develop ML models that accurately identify lung disease in images. Image datasets are the backbone of any ML model and play a significant role in its success. In addition, publicly accessible image datasets provide insights, helping researchers develop automated ML models. An overview of the numerous imaging datasets on lung diseases is presented in Table 13.

Table 13 Numerous imaging datasets explored relevant to prominent lung diseases

Most of the imaging datasets employed by researchers in their investigations were proposed or constructed by the researchers themselves, and some were given names such as COVIDX [168], COVID-R [175], and COVQU [176]. Researchers also utilized and prioritized publicly available datasets, such as LIDC/IDRI [154,155,156], JSRT [157], NLST [161], and several others, in their research.

This overview indicates that X-ray and CT scan datasets are used far more often than datasets from other imaging modalities. In the detection of pneumonia, X-ray datasets are preferred most of the time; in the detection of lung cancer, CT scan datasets are primarily selected; and in the detection of COVID-19, X-ray datasets are preferred first, followed by CT scan datasets.

Observed concerns about ML

Table 14 shows that standard ML, DL, CNN, transfer learning, and ensemble learning algorithms can be used to evaluate lung imaging modalities such as X-ray, CT scan, MRI, and infrared thermal imaging to detect pneumonia, lung cancer, and COVID-19. For pneumonia, Table 14 shows that automatic detection and classification in chest X-rays is mostly accurate and attainable with DL-based approaches such as CNNs. Compared to traditional ML procedures, DL-based approaches are more reliable and give a faster and more precise diagnosis. Transfer learning also contributes to reliable diagnosis, and, in combination with CNNs, transfer learning and ensemble learning support the analysis of X-rays. CT scans are also used for pneumonia diagnosis with ML and its sub-fields; however, they are less recommended than X-rays since an X-ray is adequate for diagnosing pneumonia.

Table 14 Numerous machine learning and sub-field in lung disease diagnosis

CNNs applied to CT images can successfully identify and categorize lung nodules, which are small growths that may signify lung cancer. CNNs can be trained on large volumes of CT scan data to learn the features associated with various lung nodules, allowing for reliable identification and classification. CNNs have been used in many studies to accurately identify lung nodules, making them a viable technique for the early identification of lung cancer. Conventional ML is also frequently used in tandem with CT scans. Because lung cancer diagnosis requires the sharper imaging that a CT scan provides, X-rays are a less likely option. Transfer learning and ensemble learning are less preferred in diagnosing lung cancer, as can be observed in Table 14.
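
To make the idea of a CNN learning nodule-related features more concrete, the following minimal Python sketch shows the convolution and ReLU steps that a single CNN filter performs on a small image patch. The function names (`conv2d_valid`, `relu`), the toy 5 by 5 patch, and the hand-written kernel are illustrative assumptions and are not taken from any reviewed study; in a real CNN the kernel weights are learned from labeled CT data rather than fixed by hand.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the response map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            acc = 0.0
            # Weighted sum of the window whose top-left corner is (r, c).
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out


def relu(feature_map):
    """Keep positive responses and set negative ones to zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


# Hypothetical toy patch: dark background (0) with a small bright blob (1).
patch = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]

# Hand-written kernel (an assumption): responds where a bright pixel
# is surrounded by darker neighbors.
kernel = [
    [-1, -1, -1],
    [-1, 8, -1],
    [-1, -1, -1],
]

response = relu(conv2d_valid(patch, kernel))
for row in response:
    print(row)
```

The response map is largest where the patch contains bright pixels with darker surroundings, which is the kind of local pattern that stacked, learned filters combine into higher-level nodule features.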

Training a CNN on X-ray images makes it feasible to identify the pattern of pulmonary infiltrates typical of COVID-19. Multiple research studies have demonstrated that this method is effective, indicating that CNNs can accurately identify COVID-19. With DL-based techniques like CNNs, X-ray is the preferred imaging modality, and CNNs have been found to be more accurate than conventional ML approaches. Transfer learning and ensemble learning are also utilized with ML and CNNs. For CT scans, CNNs are preferred over all other ML methods.

It is also observed that the novel methods introduced by researchers are generally reported to outperform the existing ML and DL methods they are compared against.
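
Ensemble learning, noted above as a complement to CNNs, can be illustrated with a simple hard-voting scheme in which several classifiers each predict a label and the most frequent label wins. The sketch below is a minimal illustration under stated assumptions: the function name `majority_vote`, the three hypothetical models, and their predictions are invented for the example, and the reviewed studies may combine models differently (for instance by averaging predicted probabilities).

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists into one label per sample by majority vote.

    Ties are resolved in favor of the label that was encountered first,
    that is, the label predicted by the earliest model in the list.
    """
    n_samples = len(predictions_per_model[0])
    if any(len(preds) != n_samples for preds in predictions_per_model):
        raise ValueError("every model must predict the same number of samples")

    combined = []
    for i in range(n_samples):
        votes = Counter(preds[i] for preds in predictions_per_model)
        combined.append(votes.most_common(1)[0][0])
    return combined


# Hypothetical predictions from three classifiers on four chest images.
model_a = ["covid", "normal", "pneumonia", "normal"]
model_b = ["covid", "pneumonia", "pneumonia", "normal"]
model_c = ["normal", "pneumonia", "covid", "normal"]

ensemble = majority_vote([model_a, model_b, model_c])
print(ensemble)
```

Hard voting needs only the predicted labels, so it can combine heterogeneous models such as a CNN and a conventional ML classifier without access to their internals.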

ML pathway

ML methods can spot patterns in medical imaging that may indicate the presence of lung disease. Prominent lung diseases can be diagnosed using ML models, with the classification being based on the features. ML-based methods are increasingly being used to detect and diagnose significant lung diseases. Large datasets of images are used to train ML algorithms to detect lung abnormalities. The algorithm is then evaluated on new images, where it can recognize and categorize various forms of lung irregularities. In particular, DL models based on CNNs have been developed and employed for detecting various lung abnormalities through medical imaging.

The concerns discussed above were addressed with explanations and observations made throughout the review. It is observed that most of the research follows a common ML pathway:

  • Image Acquisition: Researchers amassed large and varied collections of images from chest X-rays, CT scans, and other imaging modalities associated with certain lung diseases [6,7,8,9]. Most of these images have been labeled for identification purposes. Most researchers preferred publicly accessible datasets to private datasets [42,43,44,45,46,47,48,49,50,51,52,53,54,55, 63, 137, 138].

  • Image Preprocessing: Researchers preprocessed the image dataset to reduce noise and outliers and to normalize the data for better results. Significant preprocessing operations were carried out, such as the selection and modification of attributes, the imputation of missing values, the normalization of features, and the elimination of noise. Images were also preprocessed to reduce their dimensionality and converted into numerical data, pixel value by pixel value, so that they could be input into the ML model. Once preprocessing is completed, the dataset is generally split into training and test datasets so that each portion adequately represents the relevant cases [19, 140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167].

  • Feature Extraction and Relevant Feature Selection: Researchers extracted image features, such as edges, shapes, and textures, and selected relevant features so that ML algorithms could assess them [151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181].

  • Training of the ML Model: Researchers trained the ML model using labeled datasets with known outcomes to detect patterns associated with the specified disease class in supervised learning. In the case of unsupervised learning, the ML model can also draw a pattern and identify the disease with the unlabeled data. They chose an appropriate model and algorithm to learn from the input dataset. With CNN, they trained the model on processed data with different learning rates and weights or different architectures to find the best performance [121,122,123,124,125, 128].

  • Performance Metrics: Researchers evaluated the ML model using particular performance metrics. Performance is first measured on how well the model learned from the training data. After training, the model is evaluated using metrics such as accuracy, recall, precision, and F1 score, which measure how well it performs on unseen data samples. In DL and CNNs, accuracy and other metrics such as sensitivity and specificity are monitored after each training epoch to ensure that the parameters are fine-tuned and that training ends with an acceptable performance score and desirable precision and recall [140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181].

  • Evaluation: Researchers applied the trained ML model to fresh datasets to predict outcomes for their research studies or to identify cases of lung disease [140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181].
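
Two of the preprocessing steps in the pathway above, intensity normalization and a class-aware split into training and test portions, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the helper names (`min_max_normalize`, `stratified_split`), the 20% test fraction, and the toy dataset are hypothetical and are not drawn from the reviewed studies, which typically rely on established libraries for these steps.

```python
import random


def min_max_normalize(pixels):
    """Rescale a flat list of pixel intensities to the range [0, 1]."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:  # constant image: avoid division by zero
        return [0.0 for _ in pixels]
    return [(p - lo) / (hi - lo) for p in pixels]


def stratified_split(images, labels, test_fraction=0.2, seed=0):
    """Split (image, label) pairs class by class so both portions keep the class mix."""
    rng = random.Random(seed)
    indices_by_class = {}
    for idx, label in enumerate(labels):
        indices_by_class.setdefault(label, []).append(idx)

    train, test = [], []
    for label, indices in indices_by_class.items():
        rng.shuffle(indices)
        # Hold out at least one sample per class for testing; a class with
        # a single sample therefore ends up only in the test portion.
        n_test = max(1, round(len(indices) * test_fraction))
        test += [(images[i], label) for i in indices[:n_test]]
        train += [(images[i], label) for i in indices[n_test:]]
    return train, test


# Hypothetical toy dataset: 15 tiny "images" of four 8-bit pixels each.
raw_images = [[0, 64, 128, 255] for _ in range(15)]
labels = ["pneumonia"] * 10 + ["normal"] * 5

normalized = [min_max_normalize(img) for img in raw_images]
train_set, test_set = stratified_split(normalized, labels, test_fraction=0.2)
print(len(train_set), len(test_set))
```

Splitting within each class, rather than over the whole dataset at once, is one way to keep a rare class from being left out of the test portion.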

Observed concerns about performance metric

Researchers chose accuracy as the primary performance metric, treating it as more important than the other metrics used to evaluate the model. For this reason, this review focuses on accuracy and gives an overview of it for each prominent lung disease. Accuracy is the most notable performance metric since it summarizes performance across all classes in a single value. Since every misclassified sample carries the same weight, accuracy makes slight performance discrepancies between models easy to compare.
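
For reference, accuracy and the other metrics named in this review (precision, recall or sensitivity, specificity, and F1 score) can all be derived from the four confusion-matrix counts of a binary classifier. The sketch below uses their standard definitions; the function name `binary_metrics` and the ten example labels are hypothetical and serve only to illustrate the arithmetic.

```python
def binary_metrics(y_true, y_pred, positive):
    """Compute common metrics for one positive class from true and predicted labels."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1  # true positive
            else:
                fp += 1  # false positive
        else:
            if truth == positive:
                fn += 1  # false negative
            else:
                tn += 1  # true negative

    def ratio(num, den):
        # Return 0.0 when a metric is undefined (empty denominator).
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # also called sensitivity
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# Hypothetical labels for ten chest X-rays (4 pneumonia, 6 normal).
y_true = ["pneumonia"] * 4 + ["normal"] * 6
y_pred = ["pneumonia", "pneumonia", "pneumonia", "normal",
          "normal", "normal", "normal", "normal", "normal", "pneumonia"]

metrics = binary_metrics(y_true, y_pred, positive="pneumonia")
print(metrics)
```

In this toy case three of the four pneumonia images and five of the six normal images are classified correctly, so accuracy is 0.8 while precision, recall, and F1 are each 0.75.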

Analysis of performance metrics for pneumonia diagnosis

For the diagnosis of pneumonia, most researchers calculated several types of performance metrics; nonetheless, accuracy was the most favored metric, as presented in Table 15. Only one study [141] did not report it, because the work was not executed as intended; all other investigations did.

Table 15 Accuracy of pneumonia diagnosis research

Analysis of performance metrics for lung cancer diagnosis

In lung cancer diagnosis, most researchers computed different kinds of performance metrics, but accuracy was the most preferred metric, as presented in Table 16. The investigations [161] and [166] were the only ones that did not favor accuracy, since other metrics were more relevant to them.

Table 16 Accuracy of lung cancer diagnosis research

Analysis of performance metrics for COVID-19 diagnosis

As observed in the trend analysis of COVID-19, in which we analyzed the meteoric increase in searches for COVID-19, research on COVID-19 has grown tremendously. The investigators in COVID-19 studies generally prioritized accuracy as a critical performance criterion, except for [179]. Table 17 presents the accuracy of COVID-19 diagnosis research.

Table 17 Accuracy of COVID-19 diagnosis research

Conclusion

The investigation highlights the intricacy of identifying prevalent pulmonary conditions, including COVID-19, pneumonia, and lung cancer, and emphasizes the critical importance of advanced ML and imaging diagnostic techniques. The publicly available imaging datasets underscore the significance of segregating data according to disease, because each prominent lung disease has symptoms that specific imaging modalities can detect owing to their unique properties. The research demonstrates the inclination towards X-rays as the prevailing imaging modality, owing to their widespread availability and usage. CT scans are considered a secondary option, offering improved detail. ML techniques, particularly CNNs, transfer learning, and ensemble learning, have been crucial in speeding up diagnoses and enhancing their accuracy. These approaches use computed imaging parameters to classify data automatically. The research contributes by examining significant lung disorders, analyzing relevant datasets, and thoroughly evaluating ML methods. It also highlights the difficulties involved and suggests some solutions. The methodical exploration focuses on methodologies used in published results and provides significant perspectives for researchers in this field.

Although the observations contribute significantly, it is crucial to recognize critical limitations. Publicly available datasets may carry biases, and the ability of ML models to generalize to various populations has to be further investigated. The research focuses on specific imaging techniques and does not incorporate upcoming technology. Furthermore, it is crucial to focus on the comprehensibility of ML models when applied to clinical decision-making. To further advance the study, investigating the incorporation of multi-modal datasets and real-time ML applications in healthcare environments might be advantageous. Furthermore, considering imaging techniques other than the ones now being investigated might enhance the comprehensiveness. Moreover, adopting ML-based diagnostic tools might facilitate the appropriate use of these technologies in the healthcare sector.

Availability of data and materials

The images, data, and datasets presented and analyzed during the review are available in public repositories and require no permissions. They came from public domain sources and are cited and referenced in the manuscript.

Abbreviations

AFB: Acid-Fast Bacteria

ANN: Artificial Neural Network

ARDs: Acute Respiratory Distress Syndrome

AI: Artificial Intelligence

AP: Anterior-Posterior

CDR: Chronic Respiratory Disease

CART: Classification and Regression Tree

CT: Computed Tomography

CLAHE: Contrast Limited Adaptive Histogram Equalization

CNN: Convolutional Neural Network

CXR: Chest X-Ray

DL: Deep Learning

DNN: Deep Neural Network

EIT: Electrical Impedance Tomography

EU: European Union

FDG: Fluorodeoxyglucose

GAN: Generative Adversarial Network

GDPR: General Data Protection Regulation

GA: Genetic Algorithm

GUI: Graphical User Interface

GLCM: Gray Level Co-Occurrence Matrices

GWO: Gray Wolf Optimization

ICFA: Improvised Cuttlefish Algorithm

ICSA: Improvised Crow Search Algorithm

IGWA: Improvised Grey Wolf Algorithm

ILD: Interstitial Lung Disease

LDA: Linear Discriminant Analysis

LBP: Local Binary Pattern

LUS: Lung Ultrasonography

ML: Machine Learning

MARS: Multivariate Adaptive Regression Splines

MRI: Magnetic Resonance Imaging

MLP: Multilayer Perceptron

NSCLC: Non-Small-Cell Lung Cancer

PET: Positron Emission Tomography

PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses

PCA: Principal Component Analysis

RT-PCR: Real-Time Reverse Transcriptase Polymerase Chain Reaction

RELU: Rectified Linear Unit

RNN: Recurrent Neural Network

ROI: Regions of Interest

SCLC: Small-Cell Lung Cancer

SPECT: Single Photon Emission Computed Tomography

SSMI: Sputum Smear Microscopy Image

SVM: Support Vector Machine

WHO: World Health Organization

References

  1. The top 10 causes of death. 2020. Available from: https://www.who.int/news-room/fact-sheets/detail/the-top-10-causes-of-death.

  2. Pietrangelo A. The Top 10 Deadliest Diseases. Healthline. 2023. Available from: https://www.healthline.com/health/top-10-deadliest-diseases.

  3. Chronic respiratory diseases. 2019. Available from: https://www.who.int/health-topics/chronic-respiratory-diseases#tab=tab_1.

  4. WHO Coronavirus (COVID-19) Dashboard. WHO Coronavirus (COVID-19) Dashboard with Vaccination Data. Available from: https://covid19.who.int/?mapFilter=deaths.

  5. Sharma P, Nayak DR, Balabantaray BK, Tanveer M, Nayak R. A survey on cancer detection via convolutional neural networks: current challenges and future directions. Neural Networks. Elsevier BV. 2024;169:637–59. https://doi.org/10.1016/j.neunet.2023.11.006.

  6. Domingues I, Pereira G, Martins P, Duarte H, Santos J, Abreu PH. Using deep learning techniques in medical imaging: a systematic review of applications on CT and PET. Artificial intelligence review. Springer Science and Business Media LLC. 2019;53(6):4093–160.

  7. Batouty NM, Saleh GA, Sharafeldeen A, Kandil H, Mahmoud A, Shalaby A, et al. State of the art: lung cancer staging using updated imaging modalities. Bioengineering. 2022;9(10):493. https://doi.org/10.3390/bioengineering9100493. MDPI AG.

  8. Pearlman SI, Tang EM, Tao YK, Haselton FR. Controlling droplet marangoni flows to improve microscopy-based TB diagnosis. Diagnostics. 2021;11(11):2155. https://doi.org/10.3390/diagnostics11112155. MDPI AG.

  9. Dimastromatteo J, Charles EJ, Laubach VE. Molecular imaging of pulmonary diseases. Respiratory Research. Springer Science and Business Media LLC; 2018;19(1). https://doi.org/10.1186/s12931-018-0716-0.

  10. Liu D, Fox K, Weber G, Miller T. Confederated learning in healthcare: Training machine learning models using disconnected data separated by individual, data type and identity for Large-Scale health system Intelligence. J Biomed Inform. 2022;134:104151. https://doi.org/10.1016/j.jbi.2022.104151. Elsevier BV.

  11. Chowdhury D, Das A, Dey A, Banerjee S, Golec M, Kollias D, et al. CoviDetector: a transfer learning-based semi supervised approach to detect Covid-19 using CXR images. BenchCouncil Transactions on Benchmarks, Standards and Evaluations. 2023;3(2):100119. https://doi.org/10.1016/j.tbench.2023.100119. Elsevier BV.

  12. Medeiros EP, Machado MR, de Freitas EDG, da Silva DS, de Souza RWR. Applications of machine learning algorithms to support COVID-19 diagnosis using X-rays data information. Expert Syst Appl. 2024;238:122029. https://doi.org/10.1016/j.eswa.2023.122029. Elsevier BV.

  13. Alapat DJ. A Review on Detection of Pneumonia in Chest X-ray Images Using Neural Networks. J Biomed Phys Eng. Salvia Medical Sciences Ltd; 2022;12(6). https://doi.org/10.31661/jbpe.v0i0.2202-1461.

  14. Stokes K, Castaldo R, Federici C, Pagliara S, Maccaro A, Cappuccio F, et al. The use of artificial intelligence systems in diagnosis of pneumonia via signs and symptoms: a systematic review. Biomed Signal Process Control. 2022;72:103325. https://doi.org/10.1016/j.bspc.2021.103325. Elsevier BV.

  15. Althenayan AS, AlSalamah SA, Aly S, Nouh T, Mirza AA. Detection and classification of COVID-19 by radiological imaging modalities using deep learning techniques: a literature review. Appl Sci. 2022;12(20):10535. https://doi.org/10.3390/app122010535. MDPI AG.

  16. Khanna VV, Chadaga K, Sampathila N, Prabhu S, Chadaga R, Umakanth S. Diagnosing COVID-19 using artificial intelligence: a comprehensive review. Network Modeling Analysis in Health Informatics and Bioinformatics. Springer Science and Business Media LLC. 2022;11(1). https://doi.org/10.1007/s13721-022-00367-1.

  17. Panday A, Kabir MA, Chowdhury NK. A survey of machine learning techniques for detecting and diagnosing COVID‐19 from imaging. Quantitative Biology. Wiley. 2022;10(2):188–207. https://doi.org/10.15302/j-qb-021-0274.

  18. Alsaaidah B, Al-Hadidi MR, Al-Nsour H, Masadeh R, AlZubi N. Comprehensive Survey of Machine Learning Systems for COVID-19 Detection. Journal of Imaging. MDPI AG. 2022;8(10):267. https://doi.org/10.3390/jimaging8100267.

  19. Aggarwal P, Mishra NK, Fatimah B, Singh P, Gupta A, Joshi SD. COVID-19 image classification using deep learning: Advances, challenges and opportunities. Computers in Biology and Medicine. Elsevier BV. 2022;144:105350. https://doi.org/10.1016/j.compbiomed.2022.105350.

  20. Lee JH, Hwang EJ, Kim H, Park CM. A narrative review of deep learning applications in lung cancer research: from screening to prognostication. Translational Lung Cancer Research. AME Publishing Company. 2022;11(6):1217–29. Available from: https://doi.org/10.21037/tlcr-21-1012.

  21. Tomassini S, Falcionelli N, Sernani P, Burattini L, Dragoni AF. Lung nodule diagnosis and cancer histology classification from computed tomography data by convolutional neural networks: a survey. Computers in Biology and Medicine. Elsevier BV. 2022;146:105691. https://doi.org/10.1016/j.compbiomed.2022.105691.

  22. De Margerie-Mellon C, Chassagnon G. Artificial intelligence: a critical review of applications for lung nodule and lung cancer. Diagnostic and Interventional Imaging. Elsevier BV. 2023;104(1):11–7. https://doi.org/10.1016/j.diii.2022.11.007.

  23. ScienceDirect.com | Science, health and medical journals, full text articles and books. Available from: https://www.sciencedirect.com.

  24. arXiv.org e-Print archive. Available from: https://arxiv.org.

  25. IEEE Xplore. Available from: https://ieeexplore.ieee.org/Xplore/guesthome.jsp.

  26. MDPI - Publisher of Open Access Journals. Available from: https://www.mdpi.com.

  27. Normal Lung Function » Pediatric Pulmonary Division » College of Medicine » University of Florida. Available from: https://pulmonary.pediatrics.med.ufl.edu/centers-programs/asthma-program/normal-lung-function.

  28. Hoffman M. Lung Diseases Overview. WebMD. 2023. Available from: https://www.webmd.com/lung/lung-diseases-overview.

  29. Torres A, Cilloniz C, Niederman MS, Menéndez R, Chalmers JD, Wunderink RG, et al. Pneumonia. Nature Reviews Disease Primers. Springer Science and Business Media LLC. 2021;7(1). https://doi.org/10.1038/s41572-021-00259-0.

  30. Quinton LJ, Walkey AJ, Mizgerd JP. Integrative Physiology of Pneumonia. Physiological Reviews. American Physiological Society. 2018;98(3):1417–64. Available from: https://doi.org/10.1152/physrev.00032.2017.

  31. Kumar S, Awasthi V, Yadav AP, Tripathi S, Chhabra P. An Analytical Comparison of the Identification of Non-Small Cell Lung Cancer Nodules Using CT Scans and Prominent Deep Learning Models. Artificial Intelligence and Machine Learning. Boca Raton: CRC Press. 2023;91–100. https://doi.org/10.1201/9781003388319-9.

  32. Lung Cancer Prevention. National Cancer Institute. 2023. Available from: https://www.cancer.gov/types/lung/patient/lung-prevention-pdq#4.

  33. Kumar S, Dwivedi A, Verma S, Mishra AK. An Improved Convolutional Neural Network-Based Detection Framework for COVID-19 Omicron and Delta Variants Employing CT Scans. Artificial Intelligence and Machine Learning. Boca Raton: CRC Press; 2023;125–35. https://doi.org/10.1201/9781003388319-12.

  34. Lung Disease. Google Trends. Available from: https://trends.google.com/trends/explore?date=2017-10-11%202021-12-31&q=lung%20disease.

  35. Barbosa MT, Morais-Almeida M, Sousa CS, Bousquet J. The “Big Five” Lung Diseases in CoViD-19 Pandemic – a Google Trends analysis. Pulmonology. Elsevier BV; 2021;27(1):71–2. https://doi.org/10.1016/j.pulmoe.2020.06.008.

  36. Leong P, Macdonald MI, Ko BS, Bardin PG. Coexisting chronic obstructive pulmonary disease and cardiovascular disease in clinical practice: a diagnostic and therapeutic challenge. Medical Journal of Australia. Wiley; 2019;210(9):417–23. https://doi.org/10.5694/mja2.50120.

  37. Varkey B, Maier LA. Chronic respiratory diseases. Current Opinion in Pulmonary Medicine. Ovid Technologies (Wolters Kluwer Health); 2015;1. https://doi.org/10.1097/mcp.0000000000000146.

  38. Laino ME, Ammirabile A, Posa A, Cancian P, Shalaby S, Savevski V, et al. The Applications of Artificial Intelligence in Chest Imaging of COVID-19 Patients: A Literature Review. Diagnostics. MDPI AG; 2021;11(8):1317. https://doi.org/10.3390/diagnostics11081317.

  39. MIRC CTP - MircWiki. Available from: https://mircwiki.rsna.org/index.php?title=CTP-The_RSNA_Clinical_Trial_Processor.

  40. Cheplygina V, de Bruijne M, Pluim JPW. Not-so-supervised: A survey of semi-supervised, multi-instance, and transfer learning in medical image analysis. Medical Image Analysis. Elsevier BV. 2019;54:280–96. https://doi.org/10.1016/j.media.2019.03.009.

  41. Ng K-H, Rehani MM. X ray imaging goes digital. BMJ. 2006;333(7572):765–6. https://doi.org/10.1136/bmj.38977.669769.2c.

  42. Kermany D, Zhang K, Goldbaum MH. Large Dataset of Labeled Optical Coherence Tomography (OCT) and Chest X-Ray Images. 2018. Available from: http://data.mendeley.com/datasets/rscbjbr9sj/3.

  43. RSNA Pneumonia Detection Challenge | Kaggle. Available from: https://www.kaggle.com/competitions/rsna-pneumonia-detection-challenge/data.

  44. NIH Chest X-rays. Kaggle. 2018. Available from: https://www.kaggle.com/datasets/nih-chest-xrays/data.

  45. Wang X, Peng Y, Lu L, Lu Z, Bagheri M, Summers RM. ChestX-Ray8: Hospital-Scale Chest X-Ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE; 2017. Available from: https://doi.org/10.1109/cvpr.2017.369.

  46. ChestX-ray14 - V7 Open Datasets. Available from: https://www.v7labs.com/open-datasets/chestx-ray14.

  47. Sait U. Curated Dataset for COVID-19 Posterior-Anterior Chest Radiography Images (X-Rays). Mendeley Data. 2021. Available from: https://doi.org/10.17632/9xkhgts2s6.3.

  48. Balanced Augmented Covid CXR Dataset. Kaggle. 2022. Available from: https://www.kaggle.com/datasets/tr1gg3rtrash/balanced-augmented-covid-cxr-dataset.

  49. Irvin J. CheXpert: A Large Chest Radiograph Dataset with Uncertainty Labels and Expert Comparison. arXiv.org. 2019. Available from: https://doi.org/10.48550/arXiv.1901.07031.

  50. Johnson AEW, Pollard T, Berkowitz SA, Greenbaum NR, Lungren MP, Deng C-Y, et al. MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports. Scientific Data. 2019. Available from: https://doi.org/10.1038/s41597-019-0322-0.

  51. Kumar S. Covid19-Pneumonia-Normal Chest X-Ray Images. Mendeley Data. 2022. Available from: https://doi.org/10.17632/dvntn9yhd2.1.

  52. VinDr-CXR: An open dataset of chest X-rays with radiologist annotations v1.0.0. 2021. Available from: https://doi.org/10.13026/3akn-b287.

  53. COVID-QU-Ex Dataset. Kaggle. 2022. Available from: https://www.kaggle.com/datasets/anasmohammedtahir/covidqu.

  54. Covid19 Detection. Kaggle. 2021. Available from: https://www.kaggle.com/datasets/donjon00/covid19-detection.

  55. Chest X-ray (Covid-19 & Pneumonia). Kaggle. 2020. Available from: https://www.kaggle.com/datasets/prashant268/chest-xray-covid19-pneumonia.

  56. miniJSRT_database. http://imgcom.jsrt.or.jp/minijsrtdb/.

  57. COVID-19-NY-SBU - The Cancer Imaging Archive (TCIA). The Cancer Imaging Archive (TCIA). 2023. Available from: https://doi.org/10.7937/TCIA.BBAG-2923.

  58. MIDRC-RICORD-1C - The Cancer Imaging Archive (TCIA). The Cancer Imaging Archive (TCIA). 2023. Available from: https://doi.org/10.7937/91ah-v663.

  59. COVID-19-AR - The Cancer Imaging Archive (TCIA). The Cancer Imaging Archive (TCIA). 2023. Available from: https://doi.org/10.7937/tcia.2020.py71-5978.

  60. Shervinmin. GitHub - shervinmin/DeepCovid. GitHub. Available from: https://github.com/shervinmin/DeepCovid.

  61. Kong L, Cheng J. Based on improved deep convolutional neural network model pneumonia image classification. Damaševičius R, editor. PLOS ONE. Public Library of Science (PLoS); 2021;16(11):e0258804. https://doi.org/10.1371/journal.pone.0258804.

  62. Do Monte Alves M, Pipolo Milan E, da Silva-Rocha WP, Soares de Sena da Costa A, Araújo Maciel B, Cavalcante Vale PH, et al. Fatal pulmonary sporotrichosis caused by Sporothrix brasiliensis in Northeast Brazil. Samy AM, editor. PLOS Neglected Tropical Diseases. Public Library of Science (PLoS); 2020;14(5): e0008141. https://doi.org/10.1371/journal.pntd.0008141.

  63. Mogami R, Lopes AJ, Filho RCA, Almeida FC, Da Costa Messeder AM, Koifman ACB, et al. Chest computed tomography in COVID-19 pneumonia: a retrospective study of 155 patients at a university hospital in Rio de Janeiro. Radiologia Brasileira. 2021. Available from: https://doi.org/10.1590/0100-3984.2020.0133.

  64. SPIE-AAPM-LUNG-CT-CHALLENGE - The Cancer Imaging Archive (TCIA). The Cancer Imaging Archive (TCIA). 2023. Available from: https://doi.org/10.7937/K9/TCIA.2015.UZLSU3FL.

  65. Images - Learn - NLST - The Cancer Data Access System. Available from: https://cdas.cancer.gov/learn/nlst/images/.

  66. NSCLC-RADIOMICS - The Cancer Imaging Archive (TCIA). The Cancer Imaging Archive (TCIA). 2023. Available from: https://doi.org/10.7937/K9/TCIA.2015.PF0M9REI.

  67. CMB-LCA - The Cancer Imaging Archive (TCIA). The Cancer Imaging Archive (TCIA). 2023. Available from: https://doi.org/10.7937/3CX3-S132.

  68. CT-VS-PET-VENTILATION-IMAGING - The Cancer Imaging Archive (TCIA). The Cancer Imaging Archive (TCIA). 2023. Available from: https://doi.org/10.7937/3ppx-7s22.

  69. LUNG-PET-CT-DX - The Cancer Imaging Archive (TCIA). The Cancer Imaging Archive (TCIA). 2023. Available from: https://doi.org/10.7937/TCIA.2020.NNC2-0461.

  70. QIN-LUNG-CT - The Cancer Imaging Archive (TCIA). The Cancer Imaging Archive (TCIA). 2023. Available from: https://doi.org/10.7937/K9/TCIA.2015.NPGZYZBZ.

  71. 4D-LUNG - The Cancer Imaging Archive (TCIA). The Cancer Imaging Archive (TCIA). 2023. Available from: https://doi.org/10.7937/K9/TCIA.2016.ELN8YGLE.

  72. RIDER-LUNG-CT - The Cancer Imaging Archive (TCIA). The Cancer Imaging Archive (TCIA). 2023. Available from: https://doi.org/10.7937/k9/tcia.2015.u1x8a5nr.

  73. RIDER-LUNG-PET-CT - The Cancer Imaging Archive (TCIA). The Cancer Imaging Archive (TCIA). 2023. Available from: https://doi.org/10.7937/K9/TCIA.2015.OFIP7TVM.

  74. CT-IMAGES-IN-COVID-19 - The Cancer Imaging Archive (TCIA). The Cancer Imaging Archive (TCIA). 2023. Available from: https://doi.org/10.7937/TCIA.2020.GQRY-NC81.

  75. MIDRC-RICORD-1A - The Cancer Imaging Archive (TCIA). The Cancer Imaging Archive (TCIA). 2023. Available from: https://doi.org/10.7937/VTW4-X588.

  76. MIDRC-RICORD-1B - The Cancer Imaging Archive (TCIA). The Cancer Imaging Archive (TCIA). 2023. Available from: https://doi.org/10.7937/31V8-4A40.

  77. Angelov P, Soares E. EXPLAINABLE-BY-DESIGN APPROACH FOR COVID-19 CLASSIFICATION VIA CT-SCAN. Cold Spring Harbor Laboratory. 2020. Available from: https://doi.org/10.1101/2020.04.24.20078584.

  78. Yang X. COVID-CT-Dataset: A CT Scan Dataset about COVID-19. arXiv.org. 2020. Available from: https://doi.org/10.48550/arXiv.2003.13865.

  79. CT scan images Covid_Pneumonia _Normal. Kaggle. 2021. https://www.kaggle.com/datasets/anaselmasry/ct-scan-images-covid-pneumonia-normal.

  80. CT Scan of chest showing one of the lung nodules. figshare. Available from: https://doi.org/10.6084/m9.figshare.16069.v1.

  81. Madani M, Behzadi MM, Nabavi S. The Role of Deep Learning in Advancing Breast Cancer Detection Using Different Imaging Modalities: A Systematic Review. Cancers. MDPI AG; 2022;14(21):5334. https://doi.org/10.3390/cancers14215334.

  82. Pulmonary arteriovenous malformation mimicking a pulmonary tumour on (18) F-fluorodeoxyglucose positron-emission tomography/computed tomography. Openi. https://openi.nlm.nih.gov/detailedresult?img=PMC4424266_AMS-11-25031-001&query=PET%20Lung&it=xg&req=4&npos=72.

  83. Jiang W, Ong F, Johnson KM, Nagle SK, Hope TA, Lustig M, et al. Motion robust high resolution 3D free‐breathing pulmonary MRI using dynamic 3D image self‐navigator. Magnetic Resonance in Medicine. Wiley; 2017;79(6):2954–67. Available from: https://doi.org/10.1002/mrm.26958.

  84. Meier-Schroers M, Homsi R, Gieseke J, Schild HH, Thomas D. Lung cancer screening with MRI: Evaluation of MRI for lung cancer screening by comparison of LDCT- and MRI-derived Lung-RADS categories in the first two screening rounds. European Radiology. Springer Science and Business Media LLC; 2018;29(2):898–905. https://doi.org/10.1007/s00330-018-5607-8.

  85. Complete regression of advanced prostate cancer for ten years: A case report and review of the literature. Openi. [cited 2023 Oct 1]. Available from: https://openi.nlm.nih.gov/detailedresult?img=PMC3789058_OL-06-02-0590-g02&query=MRI%20OF%20LUNG&it=xg&req=4&npos=64.

  86. Zachariou M, Arandjelović O, Sabiiti W, Mtafya B, Sloan D. Tuberculosis Bacteria Detection and Counting in Fluorescence Microscopy Images Using a Multi-Stage Deep Learning Pipeline. Information. MDPI AG. 2022;13(2):96. Available from: https://doi.org/10.3390/info13020096.

  87. Shah MI, Mishra S, Yadav VK, Chauhan A, Sarkar M, Sharma SK, et al. Ziehl–Neelsen sputum smear microscopy image database: a resource to facilitate automated bacilli detection for tuberculosis diagnosis. J Med Imaging. SPIE-Intl Soc Optical Eng; 2017;4(2):027503. Available from: https://doi.org/10.1117/1.jmi.4.2.027503.

  88. Delgado LG. Dataset from Remote analysis of Sputum Smears for Mycobacterium Tuberculosis Quantification using Digital Crowdsourcing. Zenodo. 2022.

  89. Ball L, Vercesi V, Costantino F, Chandrapatham K, Pelosi P. Lung imaging: how to get better look inside the lung. Ann Transl Med. AME Publishing Company; 2017;5(14):294–294. Available from: https://doi.org/10.21037/atm.2017.07.20.

  90. Singh M, Pujar GV, Kumar SA, Bhagyalalitha M, Akshatha HS, Abuhaija B, et al. Evolution of machine learning in tuberculosis diagnosis: a review of deep learning-based medical applications. Electronics. 2022;11(17):2634. https://doi.org/10.3390/electronics11172634. MDPI AG.

  91. Takhkik Y, Susila IP, Supriyono. An identification of pneumonia diseases using supervised learning on digital X-ray image. Proceedings of International Conference on Nuclear Science, Technology, and Application 2020 (ICONSTA 2020). AIP Publishing; 2021; Available from: https://doi.org/10.1063/5.0067605.

  92. Wang S, Liu Z, Chen X, Zhu Y, Zhou H, Tang Z, et al. Unsupervised Deep Learning Features for Lung Cancer Overall Survival Analysis. 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). IEEE; 2018; Available from: https://doi.org/10.1109/embc.2018.8512833.

  93. Meier NR, Sutter TM, Jacobsen M, Ottenhoff THM, Vogt JE, Ritz N. Machine Learning Algorithms Evaluate Immune Response to Novel Mycobacterium Tuberculosis Antigens for Diagnosis of Tuberculosis. Front Cell Infect Microbiol. Frontiers Media SA; 2021;10. Available from: https://doi.org/10.3389/fcimb.2020.5940300.

  94. Kim TK, Yi PH, Hager GD, Lin CT. Refining dataset curation methods for deep learning-based automated tuberculosis screening. Journal of Thoracic Disease. AME Publishing Company; 2020;12(9):5078–85. Available from: https://doi.org/10.21037/jtd.2019.08.34.

  95. Machine Learning. Google Trends. Google. [cited 2023 Oct 1]. Available from: https://trends.google.com/trends/explore?date=2012-01-01%202022-12-31&q=Machine%20Learning.

  96. Kieu STH, Bade A, Hijazi MHA, Kolivand H. A survey of deep learning for lung disease detection on medical images: state-of-the-art, taxonomy, issues and future directions. J Imaging. MDPI AG. 2020;6(12):131. Available from: https://doi.org/10.3390/jimaging6120131.

  97. Li X, Wang Y, Cai Y. Automatic annotation algorithm of medical radiological images using convolutional neural network. Pattern Recognition Letters. Elsevier BV; 2021;152:158–65. Available from: https://doi.org/10.1016/j.patrec.2021.09.011.

  98. Dimaridis I, Sridharan P, Ntziachristos V, Karlas A, Hadjileontiadis L. Image quality improvement techniques and assessment adequacy in clinical optoacoustic imaging: a systematic review. Biosensors. MDPI AG. 2022;12(10):901. Available from: https://doi.org/10.3390/bios12100901.

  99. Vajda S, Karargyris A, Jaeger S, Santosh KC, Candemir S, Xue Z, et al. Feature Selection for Automatic Tuberculosis Screening in Frontal Chest Radiographs. Journal of Medical Systems. Springer Science and Business Media LLC; 2018;42(8). Available from: https://doi.org/10.1007/s10916-018-0991-9.

  100. Shi F, Xia L, Shan F, Song B, Wu D, Wei Y, et al. Large-scale screening to distinguish between COVID-19 and community-acquired pneumonia using infection size-aware classification. Phys Med Biol. IOP Publishing; 2021;66(6):065031. Available from: https://doi.org/10.1088/1361-6560/abe838.

  101. Downey A. Thousands of NHS medical images found. Digital Health. 2019. Available from: https://www.digitalhealth.net/2019/09/thousands-nhs-medical-images-unprotected-web.

  102. Data Protection Act 2018. Available from: https://www.legislation.gov.uk/ukpga/2018/12/contents/enacted.

  103. Ayshath Thabsheera AP, Thasleema TM, Rajesh R. Lung cancer detection using CT scan images: a review on various image processing techniques. Data analytics and learning. Singapore: Springer Singapore; 2018;413–9. Available from: https://doi.org/10.1007/978-981-13-2514-4_34.

  104. Yusuf M, Atal I, Li J, Smith P, Ravaud P, Fergie M, et al. Reporting quality of studies using machine learning models for medical diagnosis: a systematic review. BMJ Open. 2020;10(3):e034568. https://doi.org/10.1136/bmjopen-2019-034568.

  105. Sarker IH. Machine Learning: Algorithms, Real-World Applications and Research Directions. SN Computer Science. Springer Science and Business Media LLC; 2021;2(3). Available from: https://doi.org/10.1007/s42979-021-00592-x.

  106. Jamshidi M, Lalbakhsh A, Talla J, Peroutka Z, Hadjilooei F, Lalbakhsh P, et al. Artificial Intelligence and COVID-19: Deep Learning Approaches for Diagnosis and Treatment. IEEE Access. Institute of Electrical and Electronics Engineers (IEEE); 2020;8:109581–95. https://doi.org/10.1109/access.2020.3001973.

  107. Kumar S, Kumar H. Lungcov: A diagnostic framework using machine learning and Imaging Modality. International Journal on “Technical and Physical Problems of Engineering” (IJTPE). 2022 Jun;14(51):190–9. http://mail.iotpe.com/IJTPE/IJTPE-2022/IJTPE-Issue51-Vol14-No2-Jun2022/23-IJTPE-Issue51-Vol14-No2-Jun2022-pp190-199.pdf.

  108. Rajaraman S, Candemir S, Xue Z, Alderson PO, Kohli M, Abuya J, et al. A novel stacked generalization of models for improved TB detection in chest radiographs. 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). IEEE; 2018 Jul; Available from: https://doi.org/10.1109/embc.2018.8512337.

  109. Kumar S, Kumar H. Lung Cancer Diagnosis Using X-Ray and CT Scan Images Based on Machine Learning Approaches. Proceedings of Fourth International Conference on Computing, Communications, and Cyber-Security. Singapore: Springer Nature Singapore; 2023;399–412. https://doi.org/10.1007/978-981-99-1479-1_30.

  110. Tesař L, Shimizu A, Smutek D, Kobatake H, Nawano S. Medical image analysis of 3D CT images based on extension of Haralick texture features. Computerized Medical Imaging and Graphics. Elsevier BV; 2008;32(6):513–20. Available from: https://doi.org/10.1016/j.compmedimag.2008.05.005.

  111. Gupta N, Gupta D, Khanna A, Rebouças Filho PP, de Albuquerque VHC. Evolutionary algorithms for automatic lung disease detection. Measurement. Elsevier BV; 2019;140:590–608. https://doi.org/10.1016/j.measurement.2019.02.042.

  112. Mafanya M, Tsele P, Zengeya T, Ramoelo A. An assessment of image classifiers for generating machine-learning training samples for mapping the invasive Campuloclinium macrocephalum (Less.) DC (pompom weed) using DESIS hyperspectral imagery. ISPRS Journal of Photogrammetry and Remote Sensing. Elsevier BV; 2022;185:188–200. https://doi.org/10.1016/j.isprsjprs.2022.01.015.

  113. Sarker IH, Kayes ASM, Watters P. Effectiveness analysis of machine learning classification models for predicting personalized context-aware smartphone usage. Journal of Big Data. Springer Science and Business Media LLC; 2019;6(1). Available from: https://doi.org/10.1186/s40537-019-0219-y.

  114. Agrawal U, Etingov P, Huang R. Advanced Performance Metrics and Their Application to the Sensitivity Analysis for Model Validation and Calibration. IEEE Transactions on Power Systems. Institute of Electrical and Electronics Engineers (IEEE); 2021;36(5):4503–12. Available from: https://doi.org/10.1109/tpwrs.2021.3066911.

  115. Qavidel Fard Z, Zomorodian ZS, Korsavi SS. Application of machine learning in thermal comfort studies: A review of methods, performance and challenges. Energy and Buildings. Elsevier BV; 2022;256:111771. Available from: https://doi.org/10.1016/j.enbuild.2021.111771.

  116. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. Springer Science and Business Media LLC; 2015;521(7553):436–44. Available from: https://doi.org/10.1038/nature14539.

  117. Lee J-G, Jun S, Cho Y-W, Lee H, Kim GB, Seo JB, et al. Deep Learning in Medical Imaging: General Overview. Korean Journal of Radiology. The Korean Society of Radiology. 2017;18(4):570. https://doi.org/10.3348/kjr.2017.18.4.570.

  118. Kumar S, Kumar H. Classification of COVID-19 X-ray images using transfer learning with visual geometrical groups and novel sequential convolutional neural networks. MethodsX. Elsevier BV; 2023;11:102295. https://doi.org/10.1016/j.mex.2023.102295.

  119. Razzak M, Naz S, Zaib A. Deep Learning for Medical Image Processing: Overview, Challenges and the Future. Lecture Notes in Computational Vision and Biomechanics, Springer, Berlin/Heidelberg, Germany, 2017; pp. 323–350. https://arxiv.org/ftp/arxiv/papers/1704/1704.06825.pdf.

  120. Althenayan AS, AlSalamah SA, Aly S, Nouh T, Mirza AA. Detection and classification of COVID-19 by radiological imaging modalities using deep learning techniques: a literature review. Appl Sci. MDPI AG; 2022;12(20):10535. https://doi.org/10.3390/app122010535.

  121. Alzubaidi L, Zhang J, Humaidi AJ, Al-Dujaili A, Duan Y, Al-Shamma O, et al. Review of deep learning: concepts, CNN architectures, challenges, applications, future directions. Journal of Big Data. Springer Science and Business Media LLC; 2021;8(1). Available from: https://doi.org/10.1186/s40537-021-00444-8.

  122. Machine Learning and Deep Learning. Google Trends, Google. [cited 2023 Oct 2]. Available from: https://trends.google.com/trends/explore?date=2012-01-01%202022-12-31&q=Machine%20Learning,Deep%20Learning.

  123. Handbook of Medical Image Computing and Computer Assisted Intervention - 1st Edition. 2019. Available from: https://shop.elsevier.com/books/handbook-of-medical-image-computing-and-computer-assisted-intervention/zhou/978-0-12-816176-0.

  124. Kiryu S, Yasaka K, Akai H, Nakata Y, Sugomori Y, Hara S, et al. Deep learning to differentiate parkinsonian disorders separately using single midsagittal MR imaging: a proof of concept study. European Radiology. Springer Science and Business Media LLC; 2019;29(12):6891–9. Available from: https://doi.org/10.1007/s00330-019-06327-0.

  125. Toğaçar M, Ergen B, Cömert Z. Detection of lung cancer on chest CT images using minimum redundancy maximum relevance feature selection method with convolutional neural networks. Biocybernetics and Biomedical Engineering. Elsevier BV; 2020;40(1):23–39. Available from: https://doi.org/10.1016/j.bbe.2019.11.004.

  126. Han J, Moraga C. The influence of the sigmoid function parameters on the speed of backpropagation learning. Lecture Notes in Computer. Berlin, Heidelberg: Springer Berlin Heidelberg; 1995;195–201. Available from: https://doi.org/10.1007/3-540-59497-3_175.

  127. Feng J, Lu S. Performance Analysis of Various Activation Functions in Artificial Neural Networks. Journal of Physics: Conference Series. IOP Publishing; 2019;1237(2):022030. Available from: https://doi.org/10.1088/1742-6596/1237/2/022030.

  128. Basha SHS, Dubey SR, Pulabaigari V, Mukherjee S. Impact of fully connected layers on performance of convolutional neural networks for image classification. Neurocomputing. Elsevier BV; 2020;378:112–9. Available from: https://doi.org/10.1016/j.neucom.2019.10.008.

  129. Gomes R, Kamrowski C, Langlois J, Rozario P, Dircks I, Grottodden K, et al. A Comprehensive Review of Machine Learning Used to Combat COVID-19. Diagnostics. MDPI AG; 2022;12(8):1853. Available from: https://doi.org/10.3390/diagnostics12081853.

  130. Ganaie MA, Hu M, Malik AK, Tanveer M, Suganthan PN. Ensemble deep learning: a review. Engineering Applications of Artificial Intelligence. Elsevier BV; 2022;115:105151. Available from: https://doi.org/10.1016/j.engappai.2022.105151.

  131. Witten IH, Frank E, Hall MA, Pal CJ. Ensemble learning. Data Mining. Elsevier; 2017;479–501. Available from: https://doi.org/10.1016/b978-0-12-804291-5.00012-x.

  132. Nguyen D, Nguyen H, Ong H, Le H, Ha H, Duc NT, et al. Ensemble learning using traditional machine learning and deep neural network for diagnosis of Alzheimer’s disease. IBRO Neuroscience Reports. Elsevier BV; 2022;13:255–63. Available from: https://doi.org/10.1016/j.ibneur.2022.08.010.

  133. Yi Z, Wang Y. Transfer Learning on Interstitial Lung Disease Classification. 2021 International Conference on Signal Processing and Machine Learning (CONF-SPML). IEEE; 2021; Available from: https://doi.org/10.1109/conf-spml54095.2021.00046.

  134. Sahu HK, Kumar S, Alsamhi SH, Chaube MK, Curry E. Novel Framework for Alzheimer Early Diagnosis using Inductive Transfer Learning Techniques. 2022 2nd International Conference on Emerging Smart Technologies and Applications (eSmarTA). IEEE; 2022; Available from: https://doi.org/10.1109/esmarta56775.2022.9935379.

  135. Weber M, Auch M, Doblander C, Mandl P, Jacobsen H-A. Transfer Learning with Time Series Data: A Systematic Mapping Study. IEEE Access. Institute of Electrical and Electronics Engineers (IEEE); 2021;9:165409–32. Available from: https://doi.org/10.1109/access.2021.3134628.

  136. Piwowar HA, Chapman WW. Public sharing of research datasets: A pilot study of associations. Journal of Informetrics. Elsevier BV; 2010;4(2):148–56. Available from: https://doi.org/10.1016/j.joi.2009.11.010.

  137. Kermany DS, Goldbaum M, Cai W, Valentim CCS, Liang H, Baxter SL, et al. Identifying Medical Diagnoses and Treatable Diseases by Image-Based Deep Learning. Cell. Elsevier BV; 2018;172(5):1122–1131.e9. Available from: https://doi.org/10.1016/j.cell.2018.02.010.

  138. Nguyen HQ, Lam K, Le L, Pham HH, Tran DQ, Nguyen D, et al. VinDr-CXR: An open dataset of chest X-rays with radiologist’s annotations. Scientific Data. 2022. Available from: https://doi.org/10.1038/s41597-022-01498-w.

  139. Jenjaroenpun P, Wanchai V, Ono-Moore KD, Laudadio J, James LP, Adams SH, et al. Two SARS-CoV-2 Genome Sequences of Isolates from Rural U.S. Patients Harboring the D614G Mutation, Obtained Using Nanopore Sequencing. Roux S, editor. Microbiology Resource Announcements. American Society for Microbiology; 2021;10(1). Available from: https://doi.org/10.1128/mra.01109-20.

  140. Szepesi P, Szilágyi L. Detection of pneumonia using convolutional neural networks and deep learning. Biocybernetics and Biomedical Engineering. Elsevier BV. 2022;42(3):1012–22. Available from: https://doi.org/10.1016/j.bbe.2022.08.001.

  141. Avola D, Bacciu A, Cinque L, Fagioli A, Marini MR, Taiello R. Study on transfer learning capabilities for pneumonia classification in chest-x-rays images. Comput Methods Program Biomed. Elsevier BV; 2022;221:106833. Available from: https://doi.org/10.1016/j.cmpb.2022.106833.

  142. Liu J, Qi J, Chen W, Nian Y. Multi-branch fusion auxiliary learning for the detection of pneumonia from chest X-ray images. Comput Biol Med. Elsevier BV. 2022;147:105732. Available from: https://doi.org/10.1016/j.compbiomed.2022.105732.

  143. Srivastava G, Pradhan N, Saini Y. Ensemble of Deep Neural Networks based on Condorcet’s Jury Theorem for screening Covid-19 and Pneumonia from radiograph images. Computers in Biology and Medicine. Elsevier BV. 2022;149:105979. Available from: https://doi.org/10.1016/j.compbiomed.2022.105979.

  144. Qu Y, Meng Y, Fan H, Xu RX. Low-cost thermal imaging with machine learning for non-invasive diagnosis and therapeutic monitoring of pneumonia. Infrared Physics & Technology. Elsevier BV. 2022;123:104201. Available from: https://doi.org/10.1016/j.infrared.2022.104201.

  145. Singh AK, Kumar A, Mahmud M, Kaiser MS, Kishore A. COVID-19 infection detection from chest x-ray images using hybrid social group optimization and support vector classifier. Cognitive Computation. Springer Science and Business Media LLC. 2021. Available from: https://doi.org/10.1007/s12559-021-09848-3.

  146. Chowdhury MEH, Rahman T, Khandakar A, Mazhar R, Kadir MA, Mahbub ZB, et al. Can AI Help in Screening Viral and COVID-19 Pneumonia? IEEE Access. Institute of Electrical and Electronics Engineers (IEEE); 2020;8:132665–76. Available from: https://doi.org/10.1109/access.2020.3010287.

  147. Wong PK, Yan T, Wang H, Chan IN, Wang J, Li Y, et al. Automatic detection of multiple types of pneumonia: Open dataset and a multi-scale attention network. Biomedical Signal Processing and Control. Elsevier BV. 2022;73:103415. Available from: https://doi.org/10.1016/j.bspc.2021.103415.

  148. Ukwuoma CC, Qin Z, Belal Bin Heyat M, Akhtar F, Bamisile O, Muaad AY, et al. A hybrid explainable ensemble transformer encoder for pneumonia identification from chest X-ray images. Journal of Advanced Research. Elsevier BV; 2023;48:191–211. Available from: https://doi.org/10.1016/j.jare.2022.08.021.

  149. Kusk MW, Lysdahlgaard S. The effect of Gaussian noise on pneumonia detection on chest radiographs, using convolutional neural networks. Radiography. Elsevier BV. 2023;29(1):38–43. Available from: https://doi.org/10.1016/j.radi.2022.09.011.

  150. Li D, Li S. An artificial intelligence deep learning platform achieves high diagnostic accuracy for Covid-19 pneumonia by reading chest X-ray images. iScience. Elsevier BV. 2022;25(4):104031. Available from: https://doi.org/10.1016/j.isci.2022.104031.

  151. Bhandari M, Shahi TB, Siku B, Neupane A. Explanatory classification of CXR images into COVID-19, Pneumonia and Tuberculosis using deep learning and XAI. Comput Biol Med. Elsevier BV. 2022;150:106156. Available from: https://doi.org/10.1016/j.compbiomed.2022.106156.

  152. Moradi Khaniabadi P, Bouchareb Y, Al-Dhuhli H, Shiri I, Al-Kindi F, Moradi Khaniabadi B, et al. Two-step machine learning to diagnose and predict involvement of lungs in COVID-19 and pneumonia using CT radiomics. Comput Biol Med. Elsevier BV. 2022;150:106165. Available from: https://doi.org/10.1016/j.compbiomed.2022.106165.

  153. Ascencio-Cabral A, Reyes-Aldasoro CC. Comparison of convolutional neural networks and transformers for the classification of images of COVID-19, pneumonia and healthy individuals as observed with computed tomography. J Imaging. MDPI AG; 2022;8(9):237. Available from: https://doi.org/10.3390/jimaging8090237.

  154. Sekeroglu K, Soysal ÖM. Multi-perspective hierarchical deep-fusion learning framework for lung nodule classification. Sensors. MDPI AG. 2022;22(22):8949. Available from: https://doi.org/10.3390/s22228949.

  155. Donga HV, Karlapati JSAN, Desineedi HSS, Periasamy P, TR S. Effective framework for pulmonary nodule classification from CT images using the modified gradient boosting method. Appl Sci. MDPI AG; 2022;12(16):8264. Available from: https://doi.org/10.3390/app12168264.

  156. Khehrah N, Farid MS, Bilal S, Khan MH. Lung nodule detection in CT images using statistical and shape-based features. Journal of Imaging. MDPI AG. 2020;6(2):6. Available from: https://doi.org/10.3390/jimaging6020006.

  157. Ausawalaithong W, Thirach A, Marukatat S, Wilaiprasitporn T. Automatic Lung Cancer Prediction from Chest X-ray Images Using the Deep Learning Approach. 2018 11th Biomedical Engineering International Conference (BMEiCON). IEEE; 2018. Available from: https://doi.org/10.1109/bmeicon.2018.8609997.

  158. Chen X, Duan Q, Wu R, Yang Z. Segmentation of lung computed tomography images based on SegNet in the diagnosis of lung cancer. Journal of Radiation Research and Applied Sciences. Elsevier BV. 2021;14(1):396–403. Available from: https://doi.org/10.1080/16878507.2021.1981753.

  159. Nanglia P, Kumar S, Mahajan AN, Singh P, Rathee D. A hybrid algorithm for lung cancer classification using SVM and neural networks. ICT Express. Elsevier BV. 2021;7(3):335–41. Available from: https://doi.org/10.1016/j.icte.2020.06.007.

  160. Alshmrani GMM, Ni Q, Jiang R, Pervaiz H, Elshennawy NM. A deep learning architecture for multi-class lung diseases classification using chest X-ray (CXR) images. Alexandria Eng J. Elsevier BV. 2023;64:923–35. Available from: https://doi.org/10.1016/j.aej.2022.10.053.

  161. Heuvelmans MA, van Ooijen PMA, Ather S, Silva CF, Han D, Heussel CP, et al. Lung cancer prediction by Deep Learning to identify benign lung nodules. Lung Cancer. Elsevier BV. 2021;154:1–4. Available from: https://doi.org/10.1016/j.lungcan.2021.01.027.

  162. Rahouma KH, Mabrouk SM, Aouf M. Lung Cancer Diagnosis Based on Chan-Vese Active Contour and Polynomial Neural Network. Procedia Computer Science. Elsevier BV. 2021;194:22–31. Available from: https://doi.org/10.1016/j.procs.2021.10.056.

  163. Bilal A, Shafiq M, Fang F, Waqar M, Ullah I, Ghadi YY, et al. IGWO-IVNet3: DL-Based Automatic Diagnosis of Lung Nodules Using an Improved Gray Wolf Optimization and InceptionNet-V3. Sensors. MDPI AG. 2022;22(24):9603. Available from: https://doi.org/10.3390/s22249603.

  164. Torres G, Baeza S, Sanchez C, Guasch I, Rosell A, Gil D. An intelligent radiomic approach for lung cancer screening. Appl Sci. MDPI AG. 2022;12(3):1568. Available from: https://doi.org/10.3390/app12031568.

  165. Hussain L, Alsolai H, Hassine SBH, Nour MK, Duhayyim MA, Hilal AM, et al. Lung cancer prediction using robust machine learning and image enhancement methods on extracted gray-level co-occurrence matrix features. Appl Sci. MDPI AG. 2022;12(13):6517. Available from: https://doi.org/10.3390/app12136517.

  166. Kuo C-FJ, Huang C-C, Siao J-J, Hsieh C-W, Huy VQ, Ko K-H, et al. Automatic lung nodule detection system using image processing techniques in computed tomography. Biomed Signal Process Control. Elsevier BV. 2020;56:101659. Available from: https://doi.org/10.1016/j.bspc.2019.101659.

  167. Singh GAP, Gupta PK. Performance analysis of various machine learning-based approaches for detection and classification of lung cancer in humans. Neural Computing and Applications. Springer Science and Business Media LLC; 2018;31(10):6863–77. Available from: https://doi.org/10.1007/s00521-018-3518-x.

  168. Wang L, Lin ZQ, Wong A. COVID-Net: a tailored deep convolutional neural network design for detection of COVID-19 cases from chest X-ray images. Sci Rep. Springer Science and Business Media LLC; 2020;10(1). Available from: https://doi.org/10.1038/s41598-020-76550-z.

  169. Keles A, Keles MB, Keles A. COV19-CNNet and COV19-ResNet: diagnostic inference engines for early detection of COVID-19. Cognitive Computation. Springer Science and Business Media LLC. 2021. Available from: https://doi.org/10.1007/s12559-020-09795-5.

  170. Ohata EF, Bezerra GM, Chagas JVS das, Lira Neto AV, Albuquerque AB, Albuquerque VHC de, et al. Automatic detection of COVID-19 infection using chest X-ray images through transfer learning. IEEE/CAA Journal of Automatica Sinica. Institute of Electrical and Electronics Engineers (IEEE); 2021;8(1):239–48. Available from: https://doi.org/10.1109/jas.2020.1003393.

  171. Singh RK, Pandey R, Babu RN. COVIDScreen: explainable deep learning framework for differential diagnosis of COVID-19 using chest X-rays. Neural Computing and Applications. Springer Science and Business Media LLC. 2021;33(14):8871–92. Available from: https://doi.org/10.1007/s00521-020-05636-6.

  172. Khan AI, Shah JL, Bhat MM. CoroNet: A deep neural network for detection and diagnosis of COVID-19 from chest x-ray images. Computer Methods and Programs in Biomedicine. Elsevier BV. 2020;196:105581. Available from: https://doi.org/10.1016/j.cmpb.2020.105581.

  173. Madaan V, Roy A, Gupta C, Agrawal P, Sharma A, Bologa C, et al. XCOVNet: Chest X-ray image classification for COVID-19 early detection using convolutional neural networks. New Generation Computing. Springer Science and Business Media LLC; 2021;39(3–4):583–97. Available from: https://doi.org/10.1007/s00354-021-00121-7.

  174. Das AK, Ghosh S, Thunder S, Dutta R, Agarwal S, Chakrabarti A. Automatic COVID-19 detection from X-ray images using ensemble learning with convolutional neural network. Pattern Analysis and Applications. Springer Science and Business Media LLC. 2021;24(3):1111–24. Available from: https://doi.org/10.1007/s10044-021-00970-4.

  175. Hussain E, Hasan M, Rahman MA, Lee I, Tamanna T, Parvez MZ. CoroDet: a deep learning based classification for COVID-19 detection using chest X-ray images. Chaos, Solitons & Fractals. Elsevier BV; 2021;142:110495. Available from: https://doi.org/10.1016/j.chaos.2020.110495.

  176. Rahman T, Khandakar A, Qiblawey Y, Tahir A, Kiranyaz S, Abul Kashem SB, et al. Exploring the effect of image enhancement techniques on COVID-19 detection using chest X-ray images. Comput Biol Med. Elsevier BV. 2021;132:104319. Available from: https://doi.org/10.1016/j.compbiomed.2021.104319.

  177. Narin A, Kaya C, Pamuk Z. Automatic detection of coronavirus disease (COVID-19) using X-ray images and deep convolutional neural networks. Pattern Analysis and Applications. Springer Science and Business Media LLC. 2021;24(3):1207–20. Available from: https://doi.org/10.1007/s10044-021-00984-y.

  178. Celik G. Detection of Covid-19 and other pneumonia cases from CT and X-ray chest images using deep learning based on feature reuse residual block and depthwise dilated convolutions neural network. Appl Soft Comput. Elsevier BV. 2023;133:109906. Available from: https://doi.org/10.1016/j.asoc.2022.109906.

  179. Gozes O, Frid-Adar M, Sagie N, Kabakovitch A, Amran D, Amer R, et al. A Weakly Supervised Deep Learning Framework for COVID-19 CT Detection and Analysis. Thoracic Image Analysis. Cham: Springer International Publishing; 2020;84–93. Available from: https://doi.org/10.1007/978-3-030-62469-9_8.

  180. Ahuja S, Panigrahi BK, Dey N, Rajinikanth V, Gandhi TK. Deep transfer learning-based automated detection of COVID-19 from lung CT scan slices. Appl Intell. Springer Science and Business Media LLC; 2020;51(1):571–85. Available from: https://doi.org/10.1007/s10489-020-01826-w.

  181. Silva P, Luz E, Silva G, Moreira G, Silva R, Lucio D, et al. COVID-19 detection in CT images with deep learning: a voting-based scheme and cross-datasets analysis. Informatics in Medicine Unlocked. Elsevier BV. 2020;20:100427. Available from: https://doi.org/10.1016/j.imu.2020.100427.

Acknowledgements

Not applicable.

Funding

No external funding was associated with this research study.

Author information

Contributions

S. K. contributed to the conception, design, data selection, analysis, and drafting. H. K. contributed to the review and critical revision of the manuscript. G. K. contributed to the data extraction and drafting. S. P. S. contributed to the literature search. A. B. contributed to the framework of the study. M. D. contributed to the data analysis. All authors gave their final approval and accepted accountability for all aspects of the work.

Corresponding author

Correspondence to Anchit Bijalwan.

Ethics declarations

Ethics approval and consent to participate

This manuscript has not been submitted to, nor is it under review at, another journal or other publishing venue. We declare that this paper is original and has been read and approved by all named authors. We further confirm that all authors have approved the order of authors listed in the paper. The study reported in this manuscript did not involve human participants as subjects.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Kumar, S., Kumar, H., Kumar, G. et al. A methodical exploration of imaging modalities from dataset to detection through machine learning paradigms in prominent lung disease diagnosis: a review. BMC Med Imaging 24, 30 (2024). https://doi.org/10.1186/s12880-024-01192-w

  • DOI: https://doi.org/10.1186/s12880-024-01192-w

Keywords