Reliability of radiographic measurements for acute distal radius fractures

Background The management of distal radial fractures is guided by the interpretation of radiographic findings. The aim of this investigation was to determine the intra- and inter-observer reliability of eight traditionally reported anatomic radiographic parameters in adults with an acute distal radius fracture. Methods Five observers participated. All were routinely involved in making treatment decisions based on distal radius fracture radiographs. Observers performed independent repeated measurements on 30 radiographs for eight anatomical parameters: dorsal shift (mm), intra-articular gap (mm), intra-articular step (mm), palmar tilt (degrees), radial angle (degrees), radial height (mm), radial shift (mm), ulnar variance (mm). Intraclass correlation coefficients (ICCs) and the magnitude of retest errors were calculated. Results Measurement reliability was summarised as high (ICC > 0.80), moderate (0.60–0.80) or low (<0.60). Intra-observer reliability was high for dorsal shift and palmar tilt; moderate for radial angle, radial height, ulnar variance and radial shift; and low for intra-articular gap and step. Inter-observer reliability was high for palmar tilt; moderate for dorsal shift, ulnar variance, radial angle and radial height; and low for radial shift, intra-articular gap and step. Error magnitude (95 % confidence interval) was within 1–2 mm for intra-articular gap and step, 2–4 mm for ulnar variance, 4–6 mm for radial shift, dorsal shift and radial height, and 6–8° for radial angle and palmar tilt. Conclusions Based on previous reports of critical values for palmar tilt, ulnar variance and radial angle, error margins appear small enough for measurements to be useful in guiding treatment decisions. Our findings indicate that clinicians cannot reliably measure values ≤1 mm for intra-articular gap and step when interpreting radiographic parameters using the standardised methods investigated in this study. As a guide for treatment selection, palmar tilt, ulnar variance and radial angle measurements may be useful, but intra-articular gap and step appear unreliable. Electronic supplementary material The online version of this article (doi:10.1186/s12880-016-0147-7) contains supplementary material, which is available to authorized users.


Background
Comprising 2.5 % of all Emergency Department (ED) presentations, fracture of the distal radius is the most common skeletal fracture type [1]. It occurs in approximately 10 % of Caucasian women over 65 years [2]. After cast removal, immediate functional limitations include loss of strength (particularly grip) and range of movement [3,4]. A decade following fracture, ongoing pain and reduced function of the wrist and hand can still occur with heavy tasks [5].
Radiographs of the distal radius are used for diagnosis, to guide treatment choices, assess fracture reduction and monitor healing. There are no standardised, evidencebased methods for interpreting radiographic parameters. Eight anatomic parameters of distal radius fracture have been described. These are dorsal shift (mm), intra-articular gap (mm), intra-articular step (mm), palmar tilt (degrees), radial angle (degrees), radial height (mm), radial shift (mm), and ulnar variance (mm) [6]. Relationships have been described between functional outcome and the anatomical parameters of intra-articular gap and step [7][8][9], dorsal [10][11][12] and palmar tilt [13], radial angle [14,15], radial height [10,16,17], radial shift [18] and ulnar variance [19,20]. No such relationships have been described for dorsal shift suggesting it may have no clinical utility, may not be adequately reliable and/or its close correlation with dorsal tilt [21] renders this measurement less important. This investigation explored the reliability of these eight parameters in preparation for a study of their utility in guiding treatment decisions.
We assessed the error associated with radiographic interpretation of acute distal radius fractures and whether the errors associated with measurements were small enough for measurements to be used confidently in fracture management. We investigated the intra-and inter-observer reliability of eight anatomic parameters in skeletally mature patients with an acute distal radius fracture using digitised radiographs. This investigation extends and updates the work of Kreder et al. [6] by using a larger sample of radiographs (30 acute fractures) and the computerised images and measurement procedures used in current practice. The majority of choices regarding treatment for distal radius fractures occur in the acute period. This investigation utilized radiographs of acute distal radius fractures in contrast to the healed distal fracture radiographs utilized by Kreder et al. [6].

Participants: selection of radiographs
Posteroanterior (PA) and lateral wrist radiographs of all patients with distal radius fractures presenting to a large outer metropolitan ED in Victoria, Australia during the period July 2009 to January 2010 were retrospectively selected for review. Standardised positioning of neutral forearm rotation was adopted for the PA and lateral views. Inclusion criteria for radiographs were skeletal maturity, fracture within 3 cm of the distal end of the radius, and presenting to the ED within seven days of fracture. Exclusion criteria were pathological fracture and evidence of previous distal radius fracture on the affected side. Radiographs meeting the inclusion/exclusion criteria were assembled and stratified by fracture deformity to either Group A (mild deformity) or Group B (severe deformity) based on decision rules defining estimates of severity (Table 1). In the absence of guidelines, decisions regarding cut points separating mild from severe were arbitrary and intended only to enable the spectrum of injury to be appropriately represented in the assembled targets. Fifteen radiographs from each group were randomly selected based on a computer generated sequence. The sample size of 30 repeated measurements for each observer was chosen as difference scores for repeated measures in samples of 30 or more are likely to assume a normal distribution [22].

Participants: selection of assessors
To accommodate the influence of site specific practices, observers were invited from two independent health networks. Health professionals were invited to participate if their role included making treatment decisions based on radiographic images of upper limb fractures. Invitations were sent to ED, orthopaedic and radiology staff, and upper limb specialists. Potentially eligible professional groups included orthopaedic consultants and registrars, radiologists, ED consultants and registrars, and advanced practice musculoskeletal physiotherapists. A total of 10 observers (five from each health network) meeting these criteria were invited to participate. Ensuring at least two observers from each health network, the first five observers to consent were enrolled in the investigation.

Observer training
Prior to commencing this study, each observer completed a self-directed tutorial that provided standardised instructions and examples illustrating measurement techniques for assessing each of the eight parameters. Observers were then given three wrist radiographs of acute distal radius fractures and asked to measure each of the eight anatomical parameters. The standardised method developed by Kreder et al. [6] for measuring these eight anatomic parameters at the distal radius was followed ( Fig. 1a, b, c) [6]. Observers utilised picture archiving and communication system (PACS) computerised images through the Synapse [23] display system that includes measurement calibration. Two views, PA and lateral, were utilised to obtain measurements for each of the eight parameters. Observers were asked to save all working lines used in computations. Each observer was then provided with written and verbal feedback from the principal investigator on departures from standardised measurement techniques. Participants were asked to repeat measurements where incorrect technique was noted and again save measurement decision rules for review and feedback from the principal investigator. Observer training was completed over a two week period. The purpose of this preparation was to identify and minimise sources of systematic and random error in reading images.

Measurement parameters
The five observers were asked to measure the following parameters for each of the 30 radiographs: dorsal tilt (degrees), intra-articular gap (mm), intra-articular step (mm), palmar tilt (degrees), radial angle (degrees), radial height (mm), radial shift (mm), and ulnar variance (mm) using methods described in the observer training tutorials.   Fig. 1 a Posterioanterior measurement guidelines as described in Kreder et al. [6]. b Lateral measurement guidelines as described in Kreder et al. [6]. c Step and gap measurement as described in Kreder et al. [6]. a RA, radial angle; RL, radial length; UV, ulnar variance; RS, radial shift. 1) This line represents the long axis of the radius. The center of the radius shaft is determined at 3 cm and 5 cm below the mid-region of the proximal lunate articular surface. 2) A line perpendicular to the center long axis of the radius is drawn at the level of the most distal aspect of the radial articular surface. Step-off at the distal radius articular cortical margin is measured by drawing lines perpendicular with the central long axis of the radius from the most distal margin of each side of the cortical discontinuity. 2) Gap deformity is measured by dropping lines that are parallel from the central long axis of the radius from the most distal margin of each side of the cortical deformity. The gap distance is measured along a line perpendicular to the central long axis of the radius.

Measurements
On the first measurement occasion, observers were presented, in random sequence, with 30 fully de-identified radiographs with no unique identifying features. No earlier than two weeks and no later than three weeks later, observers were presented with the same set of radiographs, again in random sequence and without access to measurements taken on the first measurement occasion. To reduce the potential impact of measurement fatigue, observers were instructed to disperse their measurements over a two week period.

Statistics
Data were analysed using the recommendations by Rankin and Stokes [24] (correlational indices) and Bland and Altman [25] (metricated error estimates). The ICC quantifies the relationship between two variables with r = 1 indicating perfect agreement and r = 0 indicating no agreement.

Intra-observer reliability
Intra-observer ICCs were calculated for each of the eight anatomical parameters on data produced from a twoway repeated analysis of variance (ANOVA). The test retest values for each observer were compared for each anatomical parameter. Using equations provided by Fleiss [26] for repeated measurements by the same observer, the ICC(1, 1) was calculated using the formula: where k is the number of measurements, and mean squares (MS) of variance estimates were obtained from ANOVA: BMS (between-subjects variance) and WMS (within subjects variance). Bland and Altman [25] analysis was used to quantify agreement between measurements made by the same observer; the difference between two measurements was plotted against the average of the two measurements. The 95 % confidence intervals (CIs) around the mean differences were calculated using the standard errors in estimates of the mean and a t multiplier appropriate for the sample size (2.05) [25].

Inter-observer reliability
The first set of measurements of the eight anatomical parameters for each of the 30 radiographs were used to calculate the inter-observer reliability. The analyses were repeated for fractures dichotomised by severity of deformity to assess whether reliability changed with fracture severity. Variance estimates were derived from a one-way repeated ANOVA. The ICC(3,1) was calculated based on recommendations and equations by Shrout and Fleiss [27] for inter-observer reliability: The ICC(3,1) was chosen as each radiograph in the current investigation was rated by each of the same k observers who were the only observers of interest. As the observers for this investigation were selected from the general population, ICC(1,1) could have been used, however the more conservative equation was chosen. Bland and Altman [25] analysis was again applied to quantify inter-observer agreement.
As this investigation involved exploring agreement between more than two fixed observers, a representative average of the reliability between pairs within the five observers was calculated using an overall concordance correlation coefficient (OCCC) based on recommendations and equations by Barnhart et al. [28]. The OCCC provides an overall correlation that takes into account the correlation between individual pairs of observers.
A number of investigations have recommended conservative management for distal radius fractures when intra-articular gap or step is less than 1 mm [7][8][9]. Intraarticular gap and step measurements were therefore converted to a dichotomy of less than or equal to 1 mm or greater than 1 mm and pairwise inter-observer agreement values (kappa) were calculated. Further pairwise interobserver agreement values (kappa) were calculated when intra-articular gap and step measurements were converted to a dichotomy of presence (any gap or step recorded) or absence (zero gap or zero step recorded) of intra-articular step or gap. Data from the first set of measurements was used for the conversions to dichotomies.

Results
The professional roles of the five observers who reviewed radiographs were orthopaedic surgeon (upper limb), orthopaedic registrar, ED consultant, ED primary care advanced practice musculoskeletal physiotherapist and radiologist.
Measurement reliability was summarised as high (ICC > 0.80), moderate (0.60-0.80) or low (<0.60). Intra-observer reliability was high for dorsal shift and palmar tilt; moderate for radial angle, radial height, ulnar variance and radial shift; and low for intra-articular gap and step. Interobserver reliability was high for palmar tilt; moderate for dorsal shift, ulnar variance, radial angle and radial height; and low for radial shift, intra-articular gap and step ( Table 2). OCCC values ranged from 0.11 for intraarticular gap to 0.94 for palmar tilt (Table 2). ICC values appeared higher in the current investigation compared with Kreder et al. [6] for all parameters except radial shift and ulnar variance ( Table 2). Error magnitude (95 % confidence interval) was within 1-2 mm for intra-articular gap and step, 2-4 mm for ulnar variance, 4-6 mm for radial shift, dorsal shift and radial height, and 6-8°for radial angle and palmar tilt (Table 3).
Dichotomising intra-articular gap and step measurements as above or below 1 mm produced pairwise interobserver agreement values ranging from −0.06 to 0.52 and −0.11 to 0.43 respectively (Tables 4 and 5). Dichotomising intra-articular gap and step measurements by the presence or absence of intra-articular involvement produced pairwise inter-observer agreement values ranging from −0.07 to 0.67 and 0 to 0.63 respectively (Tables 4  and 5).
Dichotomizing data based on fracture severity resulted in a systematic improvement in inter-observer reliability values for more severe fractures with all measurements except dorsal shift (Table 6). No systematic reductions in error were seen for more severe fractures when calculations of standard error of measurement (inter-observer) were performed (Table 6).
Stratifying data based on professional subgroups e.g. isolating the analysis to the orthopaedic surgeon and radiologist, did not result in systematically higher reliability values. The highest correlations were obtained when data for all five observers were included in analysis.

Bland and Altman
Bland and Altman [25]

Discussion
Distal radius fractures are typically managed nonoperatively with cast immobilisation or surgically with either percutaneous pinning (Kirscher wires), external fixation or internal fixation [29]. However, the evidence behind treatment choices based on deformation and radiographic parameters is limited. Best treatment for the various types of distal radius fracture would ideally include a reliable, standardised, evidence-based method of classifying distal radius fractures and unambiguous decision guidelines for treatment.
The rationale for this investigation was to quantify the intra-and inter-observer reliability of eight traditionally reported anatomic parameters in skeletally mature patients with an acute distal radius fracture using PACS computerised images and display systems (Synapse [23]). Bland and Altman [25] graphs (Figs. 2 and 3) indicate no clear relationship between an individual measurement and the magnitude of error in measurement. Consequently, in clinical practice, errors associated with these measurements  are better estimated using degrees or millimetres of error (e.g. +/− 4°) than error expressed as a percentage of the range (e.g. 10 % of range). Despite adopting standardised measurement techniques and observer training, the intra-and inter-observer consistency when applying these measures varied greatly for the eight anatomic parameters. Intra-observer ICC values appeared higher than inter-observer for all anatomic parameters except palmar tilt and may indicate the potential for additional training to remediate inconsistencies between clinicians for measurements.
Kreder et al. [6] published the results of intra-and inter-observer consistency in assessing these eight anatomic parameters with repeated assessments at 0 and 2-4 weeks of six radiographs of healed fractures conducted by 16 observers. Printed films were assessed on flat view boxes and measured using protractors and rulers. Limitations of the Kreder et al. [6] investigation were the small sample of radiographs and that radiographs were of healed fractures.
For intra-observer reliability in the current investigation, ICC values were found to be above 0.80 for palmar tilt and dorsal shift ( Table 2). Only one parameter, palmar tilt, was associated with an inter-observer ICC value above 0.80. Unlike the current investigation, Kreder et al. [6] found ulnar variance was the only parameter to have an intra-and inter-observer ICC value above 0.80.
Comparison of error margins in millimeters or degrees (e.g. using the SEM in Table 3) is preferable to comparison of ICC values. This is because the magnitude of ICCs is affected by the range of raw scores included in the computation. This effect is referred to as attenuation of range and has the consequence that the ICC will increase as the variance in raw scores increases despite the same absolute differences (error) in repeated measurements. We were unable to determine the extent to which the higher ICCs obtained in the current investigation were a consequence of a larger range of raw scores as we did not have the variance estimates for Kreder et al.'s [6] data. However, on examination of the range of raw scores for the data analysed in both studies (Table 3), it is possible that attenuation of range might explain at least some of the observed differences.
It is difficult to be unequivocally confident that the use of computerised images and measurement procedures facilitates additional accuracy. If we were studying a similar spectrum of measurements, some differences in study design may account for observed differences. These include the digital methods we employed, our larger number of radiographs (30 versus 6) and that we studied acute fractures while Kreder et al. [6] studied healed fractures. This may have afforded us better visibility of the cortical disruption. The use of acute fracture images that mirror authentic practice confirms Kreder et al.'s [6] findings and extends the validity of claims regarding measurement utility across a representative spectrum of deformity.
Our classification of radiographs into Group 1 (mild deformity) and 2 (severe deformity) was undertaken to enable the spectrum of mild to severe deformity to be represented in the radiographs. While the accuracy of this step cannot be defended based on our analysis of the reliability of radiographic measurements, the range of obtained measurements in this study were comparable to the range of measurements obtained in the study by Kreder et al. [6], suggesting some success in capturing the spectrum of severity.

Intra-articular gap and step
A number of investigations have recommended conservative management for distal radius fractures when  intra-articular gap or step is less than 1 mm; accelerated development of arthritis, increased severity of degenerative changes and poor functional outcome has been linked with intra-articular gap or step greater than 1 mm [7][8][9]. Intra-articular gap and step measurements in both the current investigation and previous literature (Tables 2, 4 and 5) were associated with low intra-and inter-reliability ICC values. Given the poor reliability for assessing intra-articular gap or step, we question the suitability of using these radiographic interpretations as criteria for guiding treatment choices. It is possible that additional training in measurement technique might improve the accuracy of these measurements and the costbenefits of computerised tomography for improving reliability warrants exploration.

Dorsal and palmar tilt
It has been argued that functional outcomes are significantly affected when dorsal tilt (negative palmar tilt) exceeds 10°or 12° [10][11][12] or palmar tilt exceeds 25° [13]. Allowing for error in estimates, dorsal tilt would need to be less than 2.2°to be confident that in 95 % of cases it is actually less than 10°. Error estimates indicate that palmar tilt measurement would need to be less than 17.2°to be confident that in 95 % of cases it is less than 25°.

Ulnar variance
Positive ulnar variance greater than 3 mm has been reported to negatively impact functional ability [19,20]. Allowing for error in estimates we would need to see no more than 0.8 mm of positive ulnar variance to be confident that true ulnar variance is no more than 3 mm.

Radial angle
Radial angle generally reduces with displaced distal radius fractures and a radial angle of less than 15°has been used to indicate operative management [14,15]. We would need to see more than 21.8°radial angle to be confident that in 95 % of cases we have more than 15°.

Radial height
A reduction in radial height of 3-6 mm has been linked with a decline in functional outcome [16,17]. Error estimates and a 95 % CI upper limit of 5.2 mm raise questions about the utility of this measure using the standardised methods described.

Dorsal shift and radial shift
There is limited information in the literature linking the measurements of dorsal shift and radial shift with functional outcomes raising questions around the importance of these anatomical parameters for guiding treatment decisions. The current investigation indicates that dorsal shift can be reliably measured [inter-observer ICC value (0.75)] and therefore exploration of its relationship with functional outcomes is warranted.

Conclusions
In summary, when interpreting computerised images of acute distal radius fractures, reliability measures and error margins from this investigation support the use of palmar tilt, radial angle and ulnar variance measurements for guiding treatment choices. However, consideration needs to be given to error margins when using these measurements to guide treatment choices. Reliability measures and error margins indicate that intraarticular gap and step cannot reliably be used to guide treatment choices for acute distal radius fractures when using the methods for interpreting radiographic parameters investigated in this study. This study did not investigate the reliability of the scan itself, and this warrants further investigation. The next step from this investigation is to use evidencebased methods to develop decision rules for treatment guidelines following acute distal radius fracture. It is known that clinicians do not routinely measure all eight anatomical parameters in clinical practice. Further investigation is required to quantify whether there is consistency and agreement with the anatomical parameters that clinicians deem important for decision making for acute distal radius fractures.  Author details