| Quality heading | Quality criteria | Definition |
|---|---|---|
| Data source | (1) Sampling | Reported details of the sampling strategy for radiology reports, including whether they are from consecutive patients |
| | (2) Consistent imaging acquisition | Reported whether radiology reports were from images taken on one imaging machine or more and, if more, whether these machines were of comparable specification |
| Dataset criteria | (3) Dataset size | Reported a dataset size of > 200 |
| | (4) Training dataset | Reported training dataset size: the part of the initial dataset used to develop an NLP algorithm |
| | (5) Test dataset | Reported test dataset size: the part of the initial dataset used to evaluate an NLP algorithm |
| | (6) Validation dataset | Reported validation dataset size: a separate dataset used to evaluate the performance of an NLP algorithm in a clinical setting (may be internal or external to the initial dataset) |
| Ground truth criteria | (7) Annotated dataset | Reported annotated dataset size: data which has been marked up by humans to provide the ground truth |
| | (8) Domain expert for annotation | Reported use of a domain expert for annotation: annotation carried out by a radiologist or specialist clinician |
| | (9) Number of annotators | Reported the number of annotators |
| | (10) Inter-annotator agreement | Reported the agreement between annotators (if more than one annotator was used) |
| Outcome criteria | (11) Precision | Reported precision (positive predictive value) |
| | (12) Recall | Reported recall (sensitivity) |
| Reproducibility criteria | (13) External validation | Reported whether the NLP algorithm was tested on external data from another setting (a separate healthcare system, hospital, or institution) |
| | (14) Availability of data | Reported whether the dataset is available for use (preferably with a link provided in the paper) |
| | (15) Availability of NLP code | Reported whether the NLP code is available for use (preferably with a link provided in the paper) |
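Criteria (11) and (12) correspond to the standard precision and recall metrics. As a minimal sketch of how these are computed (the function names and the example counts below are hypothetical, not drawn from any study in the review), both follow from true-positive, false-positive, and false-negative counts:

```python
def precision(tp: int, fp: int) -> float:
    # Positive predictive value: fraction of predicted positives that are correct
    return tp / (tp + fp)

def recall(tp: int, fn: int) -> float:
    # Sensitivity: fraction of actual positives that are detected
    return tp / (tp + fn)

# Hypothetical counts for an NLP classifier applied to radiology reports
tp, fp, fn = 90, 10, 30
print(f"precision = {precision(tp, fp):.2f}")  # → precision = 0.90
print(f"recall    = {recall(tp, fn):.2f}")     # → recall    = 0.75
```

Reporting both metrics matters because either one alone can be inflated at the expense of the other (e.g. labelling every report positive gives perfect recall but poor precision).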