This CMR study shows that initial teaching of complete beginners increases the accuracy of left ventricular volume measurements, although bias remains high and precision low even after teaching. Although the intermediate beginner showed quite high accuracy already before teaching, precision generally did not improve after standardised teaching of the kind offered in most basic CMR courses. Thus, a two-hour teaching session is insufficient to allow complete and intermediate beginners to adequately evaluate basic CMR studies for left ventricular volumes and function.
When comparing accuracy and precision in the current study with previous studies, some important points on how to plan teaching sessions need to be considered. In the previous teaching study by Karamitsos et al., the beginner observers had up to two years of previous training and had also been taught how to delineate CMR images before their two-month study training period. As expected, we found generally lower accuracy and precision for the complete beginners, but partly similar results for the intermediate beginner in the current study. In their study, however, the most basal and apical slices were excluded from analysis. Excluding these more challenging slices may inherently increase accuracy and precision relative to experienced observers, although at the cost of accuracy and precision relative to true volumes. Compared with the study by Groth et al., accuracy was lower but precision similar for EDV and ESV for the complete beginners in the current study, even though the beginners in the study by Groth et al. all had a minimum of six months of training before participating in the teaching study. This is also indicated by the intermediate beginner in the current study, who showed similar accuracy and precision for EDV but measured ESV more accurately than the beginners in the previous study. Both the current study and the study by Groth et al. show improvement in accuracy but not in precision after teaching. Altogether, this shows the importance of not relying on training time and number of cases alone when deciding whether a beginner is sufficiently trained.
In general, beginners’ underestimation of EDV and ESV and overestimation of LVM, as in the current study, can be related to systematic errors in defining which basal slice to include for measurements and to incorrect inclusion of papillary muscle and trabeculation in mid-ventricular slices. As an example, both B1 and B2 in the current study assumed before teaching that papillary muscle and trabeculation should be included as myocardium, which is consistent with the more pronounced underestimation of EDV and ESV, and the corresponding overestimation of LVM, observed for these observers. This also stresses the importance of reporting measured EDV and ESV and not EF alone: EF showed low bias (5%) despite the large underestimation of EDV (−44 and −37 ml) and ESV (−27 and −23 ml) before teaching, so reporting only EF would be misleading.
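The arithmetic behind this point can be illustrated with a minimal sketch. The reference volumes below (EDV 180 ml, ESV 80 ml) are hypothetical and chosen only for illustration; the biases are of the size reported above for one beginner before teaching. Because EF is a ratio, a roughly proportional underestimation of both volumes largely cancels out.

```python
def ejection_fraction(edv_ml, esv_ml):
    """Return EF in percent from end-diastolic and end-systolic volumes."""
    return 100.0 * (edv_ml - esv_ml) / edv_ml

# Hypothetical reference (expert) volumes in ml; NOT taken from the study.
ref_edv, ref_esv = 180.0, 80.0

# Volume biases of the size reported above for one beginner before teaching.
edv_bias, esv_bias = -44.0, -27.0

ref_ef = ejection_fraction(ref_edv, ref_esv)
obs_ef = ejection_fraction(ref_edv + edv_bias, ref_esv + esv_bias)
ef_bias = obs_ef - ref_ef  # in percentage points

# Relative EDV error, for comparison with the small EF bias.
edv_rel_error = edv_bias / ref_edv

print(f"reference EF {ref_ef:.1f}%, observer EF {obs_ef:.1f}%")
print(f"EF bias {ef_bias:+.1f} points, EDV error {100 * edv_rel_error:+.0f}%")
```

With these assumed reference volumes, an EDV error of about a quarter of the true volume coincides with an EF bias of only a few percentage points, which is why EF alone can hide large volumetric errors.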
Previous studies showing improvement in CMR assessment after teaching included observers with six months to two years of previous CMR experience, or with previous cardiovascular or imaging knowledge. In the current study, the intermediate observer functioned as a bridge between previous studies and the complete beginners. Together with the previous studies, the current study indicates both the value of previous experience, regardless of whether this is in CMR, other imaging or cardiovascular anatomy, and the need for intensive training. Most importantly, physicians may not receive appropriate initial training, as improvement has previously been shown even two years after initial training. An initial short teaching session as given in most basic CMR courses is thus not sufficient, and the individual differences between complete CMR beginners, the intermediate beginner and observers with up to two years of previous CMR experience are large. Therefore, both adequate initial teaching and continuous experience, including systematic training and above all adequate in-house feedback, are needed to develop satisfactory skills in CMR evaluation. Even though the CMR level II and III definitions include a time frame and number of cases for training (3 months/150 cases and 12 months/300 cases, respectively), there are currently no criteria to measure the quality of “sufficient” training for the individual trainee.
In the current study, both accuracy and precision of LV volumes were low for complete beginners without previous theoretical or clinical experience in CMR, irrespective of acceptable intra-observer variability. As shown by the intermediate beginner before teaching, however, even a basic background in cardiac physiology and CMR analysis resulted in high reliability compared with expert observers for both EDV (ICC: 0.177 and 0.357 vs. 0.931 for B1, B2 and IB, respectively) and ESV (ICC: 0.601 and 0.692 vs. 0.972 for B1, B2 and IB, respectively). Whereas complete beginners demonstrated a clear improvement in measurements of both EDV and ESV after teaching, the intermediate beginner showed only minor changes in accuracy, which was high already before teaching. However, the intermediate beginner did not improve in precision for EDV and ESV after teaching, indicating reproducibility errors. These reproducibility errors may be related to a combination of excluded basal slices, mismatch between delineation and trabeculation and, for ESV, difficulties in defining the proper lumen area considering that papillary muscle is included. All of these can be handled by focussed training after high-quality feedback, with specific advice and recommendations on how to use the adjacent slices and different time frames for guidance.
For LVM, measurements rely on delineation of both endo- and epicardial borders. In the current study, accuracy for LVM was low compared with accuracy for EDV and ESV, indicating challenges in delineating the epicardial border in particular. This is also supported by the higher reliability (ICC) values for EDV and ESV between beginners and experts. The DSC results may seem counterintuitive in this respect, as no large differences in epicardial delineations between the beginners and the expert were shown. This can, however, be related to an inherent weakness of the DSC method: because epicardial delineations enclose a relatively large area, even relatively large area differences between observers have only a small impact on DSC values. DSC values similar to those for the (smaller) endocardial delineations therefore imply a larger variability in area for the epicardial delineations, and myocardial volumes and LVM may thus differ even though this is not indicated by DSC as such. Even though teaching improved the beginners’ assessments of LVM, the low accuracy and precision that remained after teaching are unacceptable for clinical reports and research studies. Further teaching with adequate in-house feedback from experienced observers is required to maintain CMR as an accurate and precise method. One reason why the epicardial borders may be challenging could be, for instance, chemical shift artefacts, which if not recognised will lead to inconsistent delineations with an impact on variability, as noted in the delineations performed by the complete beginners. Teaching led the beginners to understand the chemical shift artefacts and how to delineate the epicardial border in these cases. Together with papillary muscle and trabeculation being properly delineated after teaching, this improved LVM accuracy, which nevertheless remained insufficient for clinical reporting or research studies.
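The insensitivity of DSC to area errors in large contours can be shown with a short numerical sketch. It assumes the simplest possible geometry, in which the observer's contour lies entirely inside the reference contour, so that the overlap equals the smaller area; the single-slice areas are hypothetical and not taken from the study.

```python
def dice_nested(area_small, area_large):
    """DSC = 2|A∩B| / (|A| + |B|) when the smaller region lies inside the larger."""
    return 2.0 * area_small / (area_small + area_large)

# Hypothetical single-slice reference areas in mm^2; NOT from the study.
endo_ref, epi_ref = 1500.0, 3000.0

# The same absolute under-delineation applied to either contour.
area_error = 200.0

dsc_endo = dice_nested(endo_ref - area_error, endo_ref)
dsc_epi = dice_nested(epi_ref - area_error, epi_ref)

# Myocardial area is epicardial minus endocardial area, so an epicardial
# error carries over one-to-one into the myocardial area (and hence LVM).
myo_ref = epi_ref - endo_ref
myo_obs = (epi_ref - area_error) - endo_ref
myo_rel_error = (myo_obs - myo_ref) / myo_ref

print(f"DSC endo {dsc_endo:.3f}, DSC epi {dsc_epi:.3f}")
print(f"relative myocardial area error {100 * myo_rel_error:+.1f}%")
```

Under these assumptions the identical area error yields a higher DSC for the epicardial than for the endocardial contour, while the myocardial area is off by more than a tenth, which matches the argument that similar DSC values can conceal a relevant LVM difference.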
The decreased precision for LVM by B1, however, may be because this observer found the endocardial border more difficult to define, particularly in end-systole, where papillary muscle and trabeculation may be difficult to distinguish from the actual myocardial wall. As this observer-specific challenge was identified, focussed continued training may quickly help this observer to improve. This was also indicated by the last measurement, which yielded an accuracy and precision of 21 ± 22 g, i.e. an improvement compared with previous measurements, without interaction with the expert or continued teaching.
Altogether, the beginners in the current study showed trends of both increased and decreased reliability after teaching. This may indicate individual differences between observers, and points to the fact that individualised feedback is crucial when structuring training.
For training purposes, it may also be useful to perform delineations in patients without shunts or valvular disease and to provide the difference in stroke volume (SV) between the left and right ventricles as a feedback mechanism. Similarly, using the diastolic and systolic LVM measurements as internal validation is considered to lower variability and increase accuracy in LVM determination. As this is generally not part of basic courses, the teaching did not include this recommendation. The results also show that the beginners did not consider LVM as an internal control, and this single addition may further improve measurements.
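The two internal checks described above could be sketched as follows. This is an illustration only: the 10% tolerance, the function names and the example volumes are hypothetical, the myocardial density of 1.05 g/ml is a commonly used value assumed here, and LVM is computed simply as epicardial minus endocardial volume times density.

```python
MYOCARDIAL_DENSITY_G_PER_ML = 1.05  # commonly used value; an assumption here

def stroke_volume(edv_ml, esv_ml):
    """Stroke volume in ml."""
    return edv_ml - esv_ml

def relative_difference(a, b):
    """Difference between a and b as a fraction of their mean."""
    return (a - b) / ((a + b) / 2.0)

def lv_mass(epi_volume_ml, endo_volume_ml):
    """LVM in g from the volumes enclosed by the epi- and endocardial contours."""
    return (epi_volume_ml - endo_volume_ml) * MYOCARDIAL_DENSITY_G_PER_ML

def consistency_report(lv_edv, lv_esv, rv_edv, rv_esv,
                       epi_ed, epi_es, tolerance=0.10):
    """Compare LV vs. RV stroke volume and diastolic vs. systolic LVM.

    Only meaningful in patients without shunts or valvular disease.
    The tolerance is a hypothetical threshold, not a validated cut-off.
    """
    sv_diff = relative_difference(stroke_volume(lv_edv, lv_esv),
                                  stroke_volume(rv_edv, rv_esv))
    lvm_diff = relative_difference(lv_mass(epi_ed, lv_edv),
                                   lv_mass(epi_es, lv_esv))
    return {
        "sv_diff": sv_diff,
        "sv_ok": abs(sv_diff) <= tolerance,
        "lvm_diff": lvm_diff,
        "lvm_ok": abs(lvm_diff) <= tolerance,
    }

# Hypothetical trainee measurements in ml; NOT from the study.
report = consistency_report(lv_edv=160.0, lv_esv=65.0,
                            rv_edv=170.0, rv_esv=80.0,
                            epi_ed=270.0, epi_es=185.0)
print(report)
```

A trainee could run such a check after each case; a flagged difference does not say which contour is wrong, but it indicates that the delineations should be reviewed, ideally together with an experienced observer.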
The use of medical students as complete beginners may seem exaggerated, as most physicians may have had previous cardiac imaging experience before turning towards CMR. However, CMR research and cardiac delineations are sometimes also performed by engineers with potentially little background in cardiac anatomy and imaging, and by students without formal CMR training. We therefore considered it valuable to assess delineations by complete beginners who are also representative of these groups. Finally, a higher number of observers may increase the power of the study. A study of the outcome of continued training, or lack of training, of the beginners in the current study would indicate the accuracy over time in centres with and without formalised continued training and, most importantly, with and without high-quality feedback mechanisms.
It has recently been shown that experienced observers from different centres may also vary in measured volumes, particularly for basal and apical slices but also depending on small and consistent disparities throughout the short-axis stack. A strategy to decrease inter-observer variability could be to move towards more automated delineation by employing computer algorithms, as routinely performed in nuclear medicine. This has, however, been challenging in CMR, even though recent improvements in automatic algorithms show promise for the future. The use of automatic algorithms without previous training in delineation, however, leads to lower overall delineation competence. The observer may not be sufficiently knowledgeable to delineate particularly challenging cases where the automatic algorithm may fail. Basic and thorough training is thus necessary before applying automatic delineation software, in order to maintain CMR as a technique known for its high accuracy and reproducibility and low intra- and inter-observer variability for quantitative measurements.
Image contrast between blood pool and myocardium was slightly reduced, as contrast agent had been administered before acquisition of the cine images. This may have an impact on the delineation of endocardial borders for volume measurements. Also, the anatomical changes related to pathology found in some of the included patients may render delineation more complex. Because of these very limitations, however, the study population is representative of consecutive clinical patients, and the results may therefore be considered adequate for clinical imaging and research studies. The number of observers may be considered small but is equivalent to that in previous teaching and observer variability studies, although the current study does not have the highest number of observers among these studies. As the observers in previous studies represented more heterogeneous groups, it may however be argued that the number in the current study is satisfactory.