| Quality heading | Quality criteria | Definition |
|---|---|---|
| Data source | (1) Sampling | Reported details of the sampling strategy for radiology reports, including whether they are from consecutive patients |
| | (2) Consistent imaging acquisition | Reported whether radiology reports were from images taken on one imaging machine or more and, if more, whether these machines were of comparable specification |
| Dataset criteria | (3) Dataset size | Reported a dataset size of > 200 |
| | (4) Training dataset | Reported training dataset size: the part of the initial dataset used to develop an NLP algorithm |
| | (5) Test dataset | Reported test dataset size: the part of the initial dataset used to evaluate an NLP algorithm |
| | (6) Validation dataset | Reported validation dataset size: a separate dataset used to evaluate the performance of an NLP algorithm in a clinical setting (may be internal or external to the initial dataset) |
| Ground truth criteria | (7) Annotated dataset | Reported annotated dataset size: data which has been marked up by humans to provide the ground truth |
| | (8) Domain expert for annotation | Reported use of a domain expert for annotation: annotation carried out by a radiologist or specialist clinician |
| | (9) Number of annotators | Reported the number of annotators |
| | (10) Inter-annotator agreement | Reported the agreement between annotators (if more than one annotator was used) |
| Outcome criteria | (11) Precision | Reported precision (positive predictive value) |
| | (12) Recall | Reported recall (sensitivity) |
| Reproducibility criteria | (13) External validation | Reported whether the NLP algorithm was tested on external data from another setting (a separate healthcare system, hospital, or institution) |
| | (14) Availability of data | Reported whether the dataset is available for use (preferably with a link provided in the paper) |
| | (15) Availability of NLP code | Reported whether the NLP code is available for use (preferably with a link provided in the paper) |
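Criteria (11) and (12) correspond to the standard precision and recall metrics. As a minimal sketch of how these are computed (the function names and the example counts below are hypothetical, not drawn from any study in the review), both follow from true-positive, false-positive, and false-negative counts:

```python
def precision(tp: int, fp: int) -> float:
    # Positive predictive value: fraction of predicted positives that are correct
    return tp / (tp + fp)

def recall(tp: int, fn: int) -> float:
    # Sensitivity: fraction of actual positives that are detected
    return tp / (tp + fn)

# Hypothetical counts for an NLP classifier applied to radiology reports
tp, fp, fn = 90, 10, 30
print(f"precision = {precision(tp, fp):.2f}")  # → precision = 0.90
print(f"recall    = {recall(tp, fn):.2f}")     # → recall    = 0.75
```

Reporting both metrics matters because either one alone can be inflated at the expense of the other (e.g. labelling every report positive gives perfect recall but poor precision).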