 Research article
 Open Access
 Published:
Regression models for analyzing radiological visual grading studies – an empirical comparison
BMC Medical Imaging volume 15, Article number: 49 (2015)
Abstract
Background
For optimizing and evaluating image quality in medical imaging, one can use visual grading experiments, where observers rate some aspect of image quality on an ordinal scale. To analyze the grading data, several regression methods are available, and this study aimed at empirically comparing such techniques, in particular when including random effects in the models, which is appropriate for observers and patients.
Methods
Data were taken from a previous study where 6 observers graded or ranked in 40 patients the image quality of four imaging protocols, differing in radiation dose and image reconstruction method. The models tested included linear regression, the proportional odds model for ordinal logistic regression, the partial proportional odds model, the stereotype logistic regression model and rankorder logistic regression (for ranking data). In the first two models, random effects as well as fixed effects could be included; in the remaining three, only fixed effects.
Results
In general, the goodness of fit (AIC and McFadden’s Pseudo R^{2}) showed small differences between the models with fixed effects only. For the mixedeffects models, higher AIC and lower Pseudo R^{2} was obtained, which may be related to the different number of parameters in these models. The estimated potential for dose reduction by new image reconstruction methods varied only slightly between models.
Conclusions
The authors suggest that the most suitable approach may be to use ordinal logistic regression, which can handle ordinal data and random effects appropriately.
Background
When evaluating medical imaging methods, the most relevant performance measures of a procedure are related to its ability to produce correct answers to a diagnostic problem. This is typically done with concepts such as sensitivity, specificity and receiver operating characteristic (ROC) analysis. When developing a new method, however, it is often necessary to finetune numerous parameters that need to be specified in modern imaging equipment in order to obtain as much diagnostic information as possible at the minimum cost in radiation dose (effective dose) to the patient. In this optimization process, a common approach is to perform visual grading experiments, where a group of observers (e.g. radiologists) assess the fulfillment of certain welldefined image quality criteria using an ordinal scale [1]. As the data are given on an ordinal scale, the data analysis methods should be chosen accordingly, using techniques that are appropriate for such data. Still, a number of studies have been published where ordinal data from visual grading experiments are analyzed with ANOVA and similar linear models, although these build on assumptions of interval scale data, homoscedasticity and so forth.
In earlier publications, our group has proposed to use ordinal regression models in these situations to compare alternative imaging procedures [2]. Using such models, and an assumption of the relationship between the effective dose to the patient and the image quality, it is also possible to estimate the potential for dose reduction that may be expected when a new technique is introduced [3]. Based on an experiment where both the imaging technique and the effective dose are varied, the estimated dose reduction is obtained from the ratio between two regression coefficients in the regression equation. Since two of the experimental factors, the patient and the observer, are not interesting per se, but can be seen as samples from two underlying populations, it may be appropriate to treat them as random effects, which can also be done with ordinal regression models [4].
In addition to the most common form of ordinal regression, the proportional odds model [5], alternative approaches for analyzing ordinal data with regression models include the partial proportional odds model [6] and the stereotype logistic model [7]. These do not seem to have been applied to visual grading data before. In addition, random effects models have not been systematically compared to models with only fixed effects. Finally, it is not known to what extent the results of ordinal regression models differ from those of the simpler linear models.
Thus, the aim of the present study was to review regression models potentially suitable for analyzing visual grading studies and to empirically compare them on already available data, in particular to study the effect of including random effects in the model.
Material and methods
Data
The data used were taken from a previously published study on image quality and radiation dose in brain Computed Tomography (CT) which evaluated two new reconstruction algorithms, i.e. methods for creating images from the acquired raw data [8]. It has been suggested that new reconstruction algorithms (in particular iterative algorithms) may improve image quality to such an extent that the radiation dose to the patient may be reduced without impairing the image quality, which otherwise occurs when the radiation dose is reduced. Six neuroradiologists evaluated image quality in images acquired from 40 patients, each of whom underwent two consecutive brain CT examinations with two different effective dose levels. Images from all 80 examinations were reconstructed using four different image reconstruction methods: the traditional filtered back projection algorithm using the full dose (CTDI_{vol}) of 57 mGy (fd), which served as the reference, the same algorithm using a reduced dose of 40 mGy (rd), and two different levels of iterative reconstruction algorithms (id2 and id4), also using the reduced dose. In the visual evaluation, each observer individually graded three image quality criteria – graywhitematter discrimination (GW), basal ganglia delineation (BG) and general image quality (GQ) – using a fourgrade ordinal scale ranging from 1 (poor) to 4 (excellent). In addition, each observer ranked each set of four reconstructions, i.e. sorted the four image stacks in order from 1 (best) to 4 (worst) for each of the image quality criteria.
Thus the grading data comprises 3 image quality scores (GWscore, BGscore and GQscore) and 3 image quality ranks (GWrank, BGrank and GQrank) for each imaging protocol, observer and patient. As there were 6 observers and 40 patients, and we considered 4 imaging protocols (nd, rd, id2 and id4), the dataset consists of 6 × 40 × 4 = 960 observations. The data were stored in Stata format, and Stata 13.1 (StataCorp, College Station, TX, USA) was used for all analyses.
The ethical approval of the acquisition of data for the original publication [8] was given by the regional research ethics committee in Lund, Sweden (decision nr. 2010/594, date Nov. 11, 2010). Written informed consent was obtained from each patient before examination, and the study was performed in compliance with the Helsinki Declaration.
Analysis of absolute grading scores
In this section, different regression models will be discussed. In all models, the response variable is GWscore, which is treated as an interval scale variable. We assume that the influence of dose is best modeled via the logarithm of the dose rather than the dose itself [2]. Thus, there are five covariates in the regression models: log(CTDI), id2, id4, patient and observer, the two last of which are nominal, whereas id2 and id4 are dummy variables indicating whether an iterative reconstruction method was used.
Regression models with fixed effects
We suppose in this section that all covariates are fixed effects in the regression models. We start the analysis with the most fundamental regression model, i.e. the linear regression model, and will then discuss the logistic regression models, which are the main concern of this paper.
Linear model
In a linear regression model, it is supposed that the relationship between the dependent variable and the vector of regressors is linear; thus the model takes the following form:
where β_{i}’s are the regression coefficients, and ϵ is an error term from the population. This was achieved with the following Stata command:
Ordinal logistic regression
The ordinal logistic regression model (proportional odds model) is used when the dependent variable is ordinal. The cumulative probability of this regression model can be expressed in this form:
or
where x is the vector of covariates, β_{0i} is a parameter that depends on i, and β ' (transposed β) is the coefficient vector which is constant for all i. According to equations (2) and (3), there is only one set of coefficients (β ') in the ordinal logistic regression model, and due to the same relationship between each pair of outcome groups, the ordinal logistic model will make the parallel regression assumption [7, 9]. Since only the β_{0i} differ across values of i = 2, 3, 4, the three regression lines are all parallel. The following Stata command was used for this model:
Partial proportional odds model
In situations where the parallel regression assumption is violated, the ordinal logistic regression model is no longer an appropriate model. In this case, an alternative may be the partial proportional odds model, in which some of the β coefficients can be the same for all values of i, while others can differ (γ_{ i }). Thus, this model is represented in the following form:
or
where x and T are the covariates. This model is more difficult to interpret than the ordinal logistic regression model, since there will be many more parameters to consider and some effects might be statistically insignificant due to the increased number of parameters [6, 10]. We have used the gologit2 command in Stata for this model as follows:
Stereotype logistic model
An alternative model is to consider the response variable as categorical, rather than ordinal, i.e., we are unsure of the relevance of the ordering in the response variable in this case. Also, a multinomial logistic regression model may be suggested when the assumptions of the proportional odds model are not satisfied. Thus, the stereotype ordinal regression model can be considered as imposing ordering constraints on a multinomial model, which is a form of ordinal regression model. Unlike ordered logistic models, stereotype logistic models do not impose the proportionalodds assumption [6, 11]. A full multinomial model can be represented by:
where s = 2, 3, 4, and β_{00} ≡ 0 and β_{0} ≡ 0. In the multinomial logistic model, the number of parameter vectors to estimate is m1, where m is the number of levels in the response variable. Based on the restriction on the multinomial model by the stereotype logistic model, the number of parameter vectors is between one and min (m1, p), where p is the number of covariates [12]. Thus, replacing β_{ s } = ϕ_{ s } β, the stereotype ordinal regression model can be written as follows:
where β_{00} = ϕ_{0}≡0. This was achieved with the following Stata command:
Regression models with random effects
In this section, it is supposed that three covariates including log(CTDI), id2 and id4 are considered as fixed effects and two covariates including patient and observer are specified as crossed random effects. The basic concept of a random effects model is that the variation across entities is assumed to be random and uncorrelated with the covariates, unlike the fixed effects model. The mixed linear model as well as the mixedeffects ordered logistic regression model will be discussed to analyze the data when there are both fixed and random effects in the model.
Mixed linear model
The simplest model to analyze a data set with both fixed effects and random effects is a mixed linear model, which can be written in the following form:
where x is the model matrix for id2, id4 and nd as fixed effects, z is the model matrix for patient and observer as random effects, β is the vector of fixedeffects coefficients, b is the vector of randomeffects coefficients, and ϵ is an error term [13]. We have used the mixed command in Stata for a mixed linear model including crossed random effects as follows:
Mixedeffects ordered logistic regression
A model that can handle random effects where the response variable is ordinal is the mixedeffects ordered logistic regression [14]. In contrast to the ordinal logistic model, the model with random effects has the form:
or
where z_{ ij } refers to a vector of covariates for the random effects (patient and observer) and u_{ i } is the vector of randomeffects coefficients [14]. In Stata, the meologit command can be used for the ordinal logistic regression model with crossed random effects as follows:
Goodness of fit
The metrics used to compare the methods were the pseudo R^{2} and Akaike’s information criterion (AIC). The Pseudo R^{2}, also called McFadden’s R^{2}, [15], defined by
is one of several approximations of the R^{2} for linear regression. None of these are interpreted as the R^{2} for linear regression, and they all give different result [16]. An advantage of the McFadden R^{2}, in addition to its simple definition, is that it can be used for all models estimated by maximum likelihood. Since all models used in this study are based on maximum likelihood, the McFadden R^{2} is calculated in the same way for all models, and they can therefore be compared with respect to R^{2}. The model with the largest R^{2} is the one that best fits the data.
However, for comparing models differing in the number of parameters, AIC [17] is more suitable:
The most common alternative to AIC is the Bayesian information criterion (BIC). However, BIC takes the number of parameters (the degrees of freedom) into account in a way that makes it less appropriate than AIC for selecting between models with different number of parameters. The model with the smallest AIC value is considered to be the best [17].
Estimation of potential for dose reduction
To estimate the dose reduction (in percent) that might come about by the application of id2 and id4, we have used the technique proposed in our earlier publication [3], which relates the effect of replacing the reconstruction method to that of changing the effective dose. This involves forming the ratio between two regression coefficients and computing the confidence limits of the final expression using the delta method [18]. The required Stata commands to be applied after fitting the regression model are as follows:
Analysis of ranking data
Rankorder data differ in certain respects from grading data where each case is graded on the same absolute scale. One way of understanding ranking is to regard it as a sequence of choices. Then, there is gradually less freedom in the choice of grades, since the earlier choices constrain the available ranks for subsequent cases to those not used previously. This motivates the introduction of dedicated regression techniques for situations with rankorder data.
All regression models discussed in the previous section (including the linear model, ordinal logistic regression, partial proportional odds model, stereotype logistic model, mixed linear model and mixedeffects ordered logistic regression) can be applied to the data in which the response variable is GWrank. Besides these regression models, the rankordered logistic regression model can be an appropriate model since there is some extra information about the ranking of outcomes.
We define the response of respondent i by the vector y_{ i } = (y_{i1}, …, y_{ iJ })^{'}, where y_{ ij } denotes the rank that individual i gives to item j. Let GWrank_{ ij } = 1 represent the event that respondent i most prefers alternative j. This leads to the following expression for the probability that item j is most preferred by individual i:
where β = {β_{1}, …, β_{ J }} and β_{ J } is considered as zero for identification [19]. We have used the rologit command in Stata, which is specifically designed for ranking data, as follows:
where groupid is an identifier variable that links the alternatives. Since the default for the rologit command is that higher values represent more attractive alternatives, we have recoded the GWrank variable to have a higher value indicating better quality. In this case, the Stata output is the same as when the reverse option in rologit is used, which specifies that in the preference order, a higher number means a less attractive alternative in the original data [10].
Results
Absolute scores
The results of the different regression models for GWscore, BGscore and GQscore are presented in Tables 1, 2 and 3, respectively. The intercepts are excluded from the reported results in these tables due to different parameterization of the regression models. The analyses have been made using fixed effects models as well as fixed and random effects models, as explained in the previous section. All regression coefficients are statistically significant at the 0.01 level, except when contrasting categories 1 & 2 with category 3 (highest image quality) in the second panel with the partial proportional odds model for id2 with GWscore (Table 1), and for both id2 and id4 with BGscore and GQscore (Tables 2 and 3). The confidence intervals of the coefficients are reported in parentheses in Tables 1, 2 and 3.
In the linear model (regress), the regression equation of GWscore can be obtained using the coefficients reported in Table 1. The relationship between the covariates and the response variable is assumed to be linear, and an increase in the independent variables – i.e. increasing the effective dose as well as replacing the standard reconstruction with id2 or id4 – results in an increase in the GWscore, since the signs of all regression coefficients are positive.
In the ordinal logistic model (ologit), a log(CTDI) coefficient of 8.825 implies that a doubling of the CTDI for one of the image stacks in the comparison would lead to a huge increase by a factor of 2^{8.825} = 453.513 in the odds for a higher score for that stack. The coefficient regression for id2 and id4 are 0.966 and 1.271, respectively, and they can be interpreted to the odds being multiplied by e^{0.966} = 2.627 and e^{1.271} = 3.564, respectively, when the corresponding iterative reconstruction method is used instead of the standard method.
For the partial proportional logistic model (gologit2), the first panel contrasts GWscore = 1 with categories 2 and 3, whereas the second panel contrasts with category 4. Hence, positive coefficients indicate that higher values on the independent variable make it more likely that the respondent will be in a higher category of GWscore than the current one.
Since the stereotype model (slogit) is a type of an ordinal logistic regression model, the interpretation of its coefficients is similar to the ordinal logistic model. For the id2 and id4 variables, the odds of the highest image quality versus lowest image quality increased by a factor of e^{1.887} = 6.6 and e^{2.433} = 11.4, respectively, holding all other variables constant. As discussed in the previous section, there is another parameter in the stereotype model and that is ϕ_{s}. Since the response variable has only three categories in this case, it is supposed that ϕ_{0}≡0, ϕ_{2}≡1, and the estimate of ϕ_{1} is equal to 0.431. Since we have ϕ_{0} < ϕ_{1} < ϕ_{2}, we conclude that the stereotype logistic model confirms that the subjective assessment of the dependent variable is indeed ordered, and the groups (GWscore categories) are distinguishable.
For the mixed linear model (mixed), the regression coefficients are similar to the linear regression model with fixed effects (regress) and the only difference is that the patient and observer variables have been considered as random effects in the mixed linear model.
Also the regression coefficients of the mixedeffects ordered logistic regression (meologit) are very close to the ordinal logistic regression model (ologit). The estimates of the variance of the random intercept at the observer and patient level are 0.689 and 4.478, respectively.
The goodnessoffit statistics (AIC and Pseudo R^{2}) of all regression models are also given in Tables 1, 2 and 3. In Tables 1 and 2 the slogit model, and in Table 3 the ologit model, present the smallest AIC among all fixed effects models, although the differences are small. The gologit2 model represents the largest Pseudo R^{2} among all fixed effects models in Tables 1, 2 and 3.
The estimated potential for reduction of the CTDI settings (dose reduction) for GQscore, BGscore and GQscore are reported in Tables 1, 2 and 3, respectively. The confidence limits of the dose reductions, calculated using the delta method, are also presented. The proposed percentage of dose reduction for id2 (around 10 %, with confidence intervals around (6 %, 14 %), for GWscore) is very similar for all regression models in Table 1, except for the partial proportional odds model. This is also true for the estimated percentages of dose reduction for id4 (around 13 %, with a confidence interval around 9 %, 17 %). The results thus indicate smaller dose reductions for id2 than for id4, although the confidence intervals overlap to a large extent.
To compare the effect of id2 with id4 on the response variable, we restricted the analysis to observations using id2 or id4 and considered only one covariate (id2) in the regression models. The estimates thus obtained and their confidence intervals are reported in Table 4 for GWscore, BGscore and GQscore. It was found that the coefficients are all statistically insignificant at the 0.01 level.
Ranking data
The rankordered logistic regression model was applied with GWrank, which represents the ranked order between the four imaging protocols, as the response variable. The regression coefficients, goodnessoffit statistics and the estimates of dose reduction for linear models (fixed effects and mixed effects), ordinal logistic regression models (fixed effects and mixed effects) as well as the rankordered logistic model are reported in Table 5. All regression coefficients are statistically significant at the 0.01 level. The rankordered logistic model, which is designed specifically for analyzing rankorder data, presents the best performance among all models in terms of the goodnessoffit measures (AIC and Pseudo R^{2}). Unlike the results of GWscore, the estimated dose reduction figures for id2 (around 18 %) were greater than for id4 (around 15 %) while working with GWrank. The corresponding finding was also made for BGrank and GQrank. (Tables 6 and 7) In all cases, though, there was considerable overlap of the confidence intervals.
Discussion
In the present study, we did not find any dramatic differences in the results between the tested regression models. Overall, the goodnessoffit statistics in Tables 1, 2 and 3 were similar in magnitude for all the tested models, with the exception of the Pseudo R^{2} values for the mixed effects models (mixed and meologit), which were considerably lower than for the models with fixed effects only. This is most likely due to the different numbers of parameters in the models. However, also with AIC, which is supposed to compensate for differences in the number of fitted parameters, slightly worse fit was found for the models including random effects.
The original study using the same data [8] applied a linear mixed model, corresponding to the analysis here described by the command mixed. The findings were basically the same in the new analysis, with significant differences between the normal dose reconstructions and all other schemes, as well as significant effects of the iterative algorithms applied to reduceddose data, for all the tested image quality criteria. In this study, we also added the estimation of potential dose reductions, which is important for clinical application of the results.
As for the regression coefficients, their values from the linear models should not be directly compared with those from the logistic models, due to entirely different principles for parametrization. It may be noted, though, that the addition of random effects in the linear models (mixed vs. regress) had no effect on the coefficient estimates and hardly any on the confidence limits. Among the logistic models, the most striking finding was the fact that with gologit2, different estimates were obtained when contrasting the two best categories than when contrasting the two worst categories (second vs. first gologit2 panel in Tables 1, 2 and 3). This suggests that the proportional odds assumption may not have been appropriate for these data. To test this, the commonly recommended procedure is to apply Brant’s test [20]. Unfortunately, the Stata implementation of Brant’s test (which only works with ologit) does not allow nominal or random effects, so we were not able to carry out a formal test of this assumption. Also, when comparing the logistic models ologit and meologit, the addition of random effects had only a minute effect on the estimates. It should be kept in mind that when there are two crossed random effects in the model (in this case patient and observer), the integration method used by Stata is Laplacian integration, in which the parameter estimates are biased. In the variance components, the bias of the estimates tends to be more prominent than in the estimates of the fixed effects due to the Laplacian approximation [14].
For all the tested models (except gologit2 at the highest level), the regression coefficients had larger values for id4 than for id2, which was expected from previous knowledge about the algorithms, with id4 differing more from the standard algorithm than id2. The confidence intervals, though, overlapped to a large extent. The difference between id4 and id2 was also not significant when tested formally (Table 4).
More interesting from an application point of view are probably the estimates of potential dose reductions. Here all the regression models that summarize the different image quality levels gave similar results for the three image quality scores, with somewhat larger estimates for id4 than for id2, as expected, but widely overlapping confidence intervals. For gologit2, contrasting the highest quality levels gave smaller estimates than contrasting the lowest levels for both id4 and id2. A possible interpretation is that it will be more difficult to maintain the probability of producing images of excellent quality by applying the new reconstruction algorithms while reducing the radiation dose than to maintain the probability of producing images of clinically acceptable quality. Thus, the somewhat different results for the two levels seem, to some extent, to answer different research questions. The fact that, in general, nonsignificant results were obtained when contrasting the highest quality levels may be related both to the weaker effect at this level and to a loss of power when more parameters are estimated from the same data.
When analyzing the rankorder data (Tables 5–7), the regression model specifically designed for this type of data, rologit, yielded much better fit (lower AIC and higher Pseudo R^{2}). A surprising finding was that with the ranking data, larger effects, and thus larger dose reduction estimates, were found for id2 than for id4. The difference was even greater with rologit. However, again the two confidence intervals overlap.
Broadly speaking, the results of our comparison did not give any clearcut empirical evidence for selecting the most appropriate regression model for analyzing visual grading data in medical imaging, except for choosing rologit when analyzing rank data. Thus, the selection of model must be based on other considerations.
The use of linear models for analyzing ordinal scale data is generally discouraged in statistical textbooks. Also, on theoretical grounds, it is commonly recommended to handle variables such as patient and observer in our study as random effects, since they both represent samples from larger populations. This would speak in favor of the meologit approach when analyzing absolute scores. The greatest problem of this model appears to be the proportional odds assumption (parallel regression assumption), which may well have been violated by our data. Using instead gologit2 might resolve this problem, but at the expense of more complex results that are less straightforward to interpret. Still, there are situations where the relevant research questions may motivate this more complex model. It is more difficult to weigh the importance of handling violations of the proportional odds assumption (gologit2) against correctly controlling random effects (meologit). Also for slogit, the results are more complex and possibly difficult for an applied researcher to interpret. The main finding from slogit in our study was the confirmation of the ordinal structure that had been defined beforehand.
Conclusions
In conclusion, a number of logistic regression methods are available for handling ordinal data from visual grading experiments in medical imaging. Our study did not provide any empirical support for selecting a different regression model than the one we would recommend on theoretical grounds, i.e. the ordinal logistic regression model with mixed effects, which is appropriate for handling random effects when the response variable is ordinal. For rankorder data, the rankordered logistic regression model appears to be most appropriate, since this model can handle the rankorder data correctly and because of its better performance in terms of the goodnessoffit among the tested regression models.
Abbreviations
 AIC:

Akaike information criterion
 ANOVA:

Analysis of variance
 BG:

Basal ganglia delineation
 CT:

Computed tomography
 CTDI_{vol}:

Volume computed tomography dose index
 fd:

Full dose
 gologit2:

Generalized ordered logit/partial proportional odds
 GQ:

General image quality
 GW:

Graywhitematter discrimination
 id2:

Iterative reconstruction with noise reduction level 2
 id4:

Iterative reconstruction with noise reduction level 4
 meologit:

Mixedeffects ordered logistic regression
 ologit:

Ordinal logistic regression
 rd:

Reduced dose
 ROC:

Receiver operating characteristic
 rologit:

Rankordered logistic regression
 slogit:

Stereotype logistic regression
References
 1.
Månsson LG. Methods for the evaluation of image quality: a review. Radiat Prot Dosim. 2000;90:89–99.
 2.
Smedby Ö, Fredrikson M. Visual grading regression: analysing data from visual grading experiments with regression models. Br J Radiol. 2010;83:767–75. doi:10.1259/bjr/35254923.
 3.
Smedby Ö, Fredrikson M, De Geer J, Borgen L, Sandborg M. Quantifying the potential for dose reduction with visual grading regression. Br J Radiol. 2013;86:20110784. doi:10.1259/bjr/31197714.
 4.
Smedby Ö, Fredrikson M, De Geer J, Sandborg M. Visual grading regression with random effects. Proc SPIE. 2012;8318:831805. doi:10.1117/12.913650.
 5.
McCullagh P. Regression models for ordinal data. J R Stat Soc B. 1980;42:109–42.
 6.
Williams R. Generalized ordered logit/ partial proportional odds models for ordinal dependent variables. Stata J. 2006;6(1):58–82.
 7.
Anderson JA. Regression and ordered categorical variables. J R Stat Soc Ser B. 1984;46:1–40.
 8.
Löve Á, Siemund R, Höglund P, Van Westen D, Stenberg L, Petersen C, et al. Hybrid iterative reconstruction algorithm in brain CT: a radiation dose reduction and image quality assessment study. Acta Radiol. 2014;55(2):208–17. doi:10.1177/0284185113494980.
 9.
Agresti A. Analysis of ordinal categorical data. 2nd ed. Hoboken, NJ: Wiley; 2010. p. 9–24.
 10.
Long JS, Freese J. Regression models for categorical dependent variables using stata. 2nd ed. College Station, TX, USA: Stata Press; 2003.
 11.
Lunt M. Stereotype ordinal regression. Stata Tech Bull. 2001;61:12–8.
 12.
Ahn J, Mukherjee B, Banerjee M, Cooney KA. Bayesian inference for the stereotype regression model: application to a case–control study of prostate cancer. Stat Med. 2009;28:3139–57. doi:10.1002/sim.3693.
 13.
Laird NM, Ware JH. Randomeffects models for longitudinal data. Biometrics. 1982;38:963–74.
 14.
RabeHesketh S, Skrondal A. Multilevel and longitudinal modeling using stata. 3rd ed. College Station, TX, USA: Stata Press; 2012. p. 575–90.
 15.
McFadden D. Conditional logit analysis of qualitative choice behavior. In: Zarembka P, editor. Frontiers in econometrics. New York: Academic; 1974. p. 105–42.
 16.
Hardin J, Hilbe J. Generalized linear models and extensions. College Station, TX, USA: Stata Press; 2001.
 17.
Akaike H. A new look at the statistical model identification. Autom Contr IEEE Trans. 1974;19(6):716–23. doi:10.1109/tac.1974.1100705.
 18.
Oehlert GW. A note on the delta method. Am Stat. 1992;46:27–9.
 19.
Hair JF, Black JWC, Babin BJ, Anderson RE. Multivariate data analysis. 7th ed. Upper Saddle River, NJ: Pearson; 2010.
 20.
Brant R. Assessing proportionality in the proportional odds model for ordinal logistic regression. Biometrics. 1990;46:1171–8.
Acknowledgements
No specific funding was received for this study.
Author information
Affiliations
Corresponding author
Additional information
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
AL designed and carried out the visual grading experiments. ÖS designed the current study and proposed the statistical methodology. SES performed the statistical analysis under the supervision of MF. SES prepared the first draft of the manuscript, and all authors took part in its final formulation.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Saffari, S.E., Löve, Á., Fredrikson, M. et al. Regression models for analyzing radiological visual grading studies – an empirical comparison. BMC Med Imaging 15, 49 (2015). https://doi.org/10.1186/s128800150083y
Received:
Accepted:
Published:
Keywords
 Image quality
 Visual grading
 Ordinal data
 Regression models
 Fixed effects
 Random effects