WHO MONICA Project e-publications, No. 20
February 2000
Kari Kuulasmaa^{1}, Annette Dobson^{2}, Hugh Tunstall-Pedoe^{3}, Stephen Fortmann^{4}, Susana Sans^{5}, Hanna Tolonen^{1}, Alun Evans^{6}, Marco Ferrario^{7}, Jaakko Tuomilehto^{1} for the WHO MONICA Project^{8}
^{1} Department of Epidemiology and Health Promotion, National Public
Health Institute (KTL), Helsinki, Finland
^{2} Department of Statistics, University of Newcastle, New South Wales,
Australia
^{3} Cardiovascular Epidemiology Unit, (MONICA Quality Control Centre
for Event Registration), University of Dundee, Ninewells Hospital and Medical
School, Dundee, Scotland, U.K.
^{4} Stanford Center for Research in Disease Prevention, Stanford
University, USA
^{5} Institute of Health Studies, Department of Health and Social
Security, Barcelona, Spain
^{6} Department of Epidemiology and Public Health, The Queen's
University of Belfast, UK
^{7} Faculty of Medicine, University of Milan - Bicocca at Monza, Italy
^{8} Annex: Sites and key personnel of the WHO
MONICA Project
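This document is the methodological appendix to the paper "Estimating the contribution of changes in classical risk factors to the trends in coronary event rates in 38 WHO MONICA Project populations", published in The Lancet in 2000 [1]. It covers three topics: the regression model relating the trends in event rates to the trends in risk factors, the weights used for age-standardizing the trends in risk factors, and the quality scores used for weighting the populations in the regression analyses.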
The association between the trends in event rates and the trends in risk scores (or in the levels of individual risk factors) was calculated using the linear regression model

y_{i} = α + βx_{i} + ε_{i},

where the y_{i}, i = 1, ..., n, are the trends in event rates in the n populations, the x_{i} are the trends in risk scores, and the ε_{i} are independent random variables with normal distributions N(0, σ^{2}/W_{i}), where the W_{i} are the quality weights of the populations. Instead of y_{i} and x_{i} we observe

Y_{i} = y_{i} + δ_{i}

and

X_{i} = x_{i} + γ_{i},

where δ_{i} and γ_{i} are independent random variables with distributions N(0, σ_{δi}^{2}) and N(0, σ_{γi}^{2}), respectively. We consider the terms σ_{δi}^{2} and σ_{γi}^{2} to be known, having the values of the variances of the trend estimates. The regression model now becomes

Y_{i} − δ_{i} = α + β(X_{i} − γ_{i}) + ε_{i},

which can be written as

Y_{i} = α + βX_{i} + e_{i},

where the e_{i} are independent random variables with normal distributions N(0, σ_{i}^{2}), where σ_{i}^{2} = σ^{2}/W_{i} + σ_{δi}^{2} + β^{2}σ_{γi}^{2}.
The parameters were estimated using an iterative reweighting procedure [2]. The estimation procedure weights the populations properly, but does not correct for the regression dilution caused by the errors in the explanatory variables.
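The iteration can be sketched in Python as follows. This is a minimal illustration and not necessarily the procedure of reference [2]. It assumes that the error variance for population i is the sum of a residual term σ^{2}/W_{i}, the known variance of the event-rate trend estimate, and β^{2} times the known variance of the risk-score trend estimate. It also assumes that all quality weights are positive, and it updates σ^{2} with a simple moment estimator. The function name, the argument names and the variance floor are hypothetical.

```python
def fit_trend_regression(Y, X, var_Y, var_X, W, n_iter=100, floor=1e-12):
    """Iteratively reweighted least squares for Y_i = alpha + beta * X_i + e_i.

    Assumed error variance (an assumption of this sketch):
        Var(e_i) = sigma2 / W_i + var_Y[i] + beta**2 * var_X[i]
    """
    n = len(Y)
    alpha, beta, sigma2 = 0.0, 0.0, 1.0  # arbitrary starting values
    for _ in range(n_iter):
        # Known part of each population's error variance at the current beta.
        known = [var_Y[i] + beta ** 2 * var_X[i] for i in range(n)]
        # Regression weights are the inverse total variances.
        wt = [1.0 / (sigma2 / W[i] + known[i]) for i in range(n)]
        sw = sum(wt)
        x_bar = sum(wt[i] * X[i] for i in range(n)) / sw
        y_bar = sum(wt[i] * Y[i] for i in range(n)) / sw
        sxx = sum(wt[i] * (X[i] - x_bar) ** 2 for i in range(n))
        sxy = sum(wt[i] * (X[i] - x_bar) * (Y[i] - y_bar) for i in range(n))
        beta = sxy / sxx
        alpha = y_bar - beta * x_bar
        # Moment update: E[residual_i^2] is roughly sigma2 / W_i + known_i.
        known = [var_Y[i] + beta ** 2 * var_X[i] for i in range(n)]
        res2 = [(Y[i] - alpha - beta * X[i]) ** 2 for i in range(n)]
        excess = sum(res2[i] - known[i] for i in range(n))
        sigma2 = max(floor, excess / sum(1.0 / w for w in W))
    return alpha, beta, sigma2
```

Because the weights depend on β and σ^{2}, which are themselves being estimated, the fit has to be repeated until the values stabilize; a fixed number of iterations is used here only to keep the sketch short.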
To characterize the contribution of the three components of variation to the observed trends in event rates (i.e., the Y_{i}'s), we define four sums of squares:
For the proportion of the variation in trends in event rates explained by the trends in risk scores, the traditional statistic R^{2} = SM/TV is misleading because it treats the known variances of the trend estimates as unexplained variation. For example, if the "true" trends in event rates were fully explained by the "true" trends in risk scores, SM/TV would be less than one because of the statistical error in the trend estimates. A more justifiable measure for the proportion of variation explained by the model is SM/(SM+SE), which is the proportion of the variance explained by the model after excluding the contribution of the known variances of trend estimates from the denominator [3].
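The difference between the two measures can be illustrated with a short Python sketch that computes both from given sums of squares. The numerical values are hypothetical and chosen only for illustration, and the function name is an assumption of this sketch.

```python
def explained_proportions(sm, se, tv):
    """Return (SM/TV, SM/(SM+SE)) for the given sums of squares."""
    return sm / tv, sm / (sm + se)

# Hypothetical values: a large part of TV comes from the known
# variances of the trend estimates, which SM + SE leaves out.
r2_traditional, r2_adjusted = explained_proportions(sm=40.0, se=10.0, tv=100.0)
print(r2_traditional, r2_adjusted)
```

With these made-up inputs the traditional statistic is well below the adjusted one, because only the latter removes the known estimation variance from the denominator.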
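Correcting the regression coefficient β for regression dilution bias would not affect the total residual variation, but it would increase the contribution of the known variances of the risk-score trend estimates, which enter the error variance multiplied by β^{2}, at the expense of the residual variance σ^{2} (cf. the error term of the regression model in Section 2.1). Therefore, correction of the regression dilution bias would also be expected to increase SM/(SM+SE).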
In this section we derive the weights for age-standardizing trends in risk factors in the regression analysis.
For both sexes in each population, the trend in risk scores was estimated separately for each age group, and then age-standardized using a weighted mean of the age group-specific trends:
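X = (u_{1}X_{1} + ... + u_{k}X_{k})/u,    (1)
where X_{j} is the trend in age group j, u_{j} is the weight given to age group j, k is the number of age groups, and u = u_{1} + ... + u_{k}.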
To make the age-standardized trends in risk factors and event rates comparable, the trends in event rates should ideally be age-standardized in the same way as the trends in risk factor levels [4]. In practice, however, it is preferable to estimate the trends in event rates from the age-standardized annual rates, because the numbers of events in the young age groups are very small, making the estimation of the trends in the younger age groups unreliable. The age-standardized event rate, for any given year, is:
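r = (w_{1}r_{1} + ... + w_{k}r_{k})/w,    (2)
where r_{j} is the event rate in age group j in that year, w_{j} is the weight of age group j in the standard population, and w = w_{1} + ... + w_{k}.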
Let Z_{1}, ..., Z_{k} denote the trends in the age group-specific event rates and let Z denote their age-standardized trend calculated using formula (1), i.e. Z = (u_{1}Z_{1} + ... + u_{k}Z_{k})/u. Also let Y denote the trend in the age-standardized event rate defined in (2). We assume that the trends in event rates are exponential, and that Y and the Z_{j} are the change rates, so that r = exp(a + Yt) and r_{j} = exp(b_{j} + Z_{j}t), where t denotes time, and therefore
Y = r'/r and Z_{j} = r_{j}'/r_{j},    (3)
where r' and r_{j}' denote the derivatives of the event rates with respect to time (as in reference [1]).
Our aim now is to find a relationship between the two sets of weights, u_{1}, ..., u_{k} and w_{1}, ..., w_{k}, such that the age-standardized trends in event rates Y and Z, which are calculated using the two different approaches, are similar. We need to assume that the age group-specific event rates r_{j} are approximately proportional to fixed constants c_{j}; specifically, that there is a function s of time such that
r_{j} ≈ c_{j}s for all j = 1, ..., k.    (4)
Note that equality in (4) would not be appropriate because it would imply that the age group-specific trends Z_{j} were all equal. However, the assumption of the approximate equality is feasible for our purpose because any differences in age-specific trends Z_{j} are very small compared with the differences between the age group-specific event rates r_{j}. Assumption (4) implies
r = (w_{1}r_{1} + ... + w_{k}r_{k})/w ≈ s(w_{1}c_{1} + ... + w_{k}c_{k})/w,
so that
s/w ≈ r/(w_{1}c_{1} + ... + w_{k}c_{k}).    (5)
Using equations (3), (2), (4) and (5), we obtain
Y = r'/r = (w_{1}r_{1}' + ... + w_{k}r_{k}')/(wr) = (w_{1}Z_{1}r_{1} + ... + w_{k}Z_{k}r_{k})/(wr)
≈ (w_{1}Z_{1}c_{1}s + ... + w_{k}Z_{k}c_{k}s)/(wr) = s(w_{1}c_{1}Z_{1} + ... + w_{k}c_{k}Z_{k})/(wr)
≈ (w_{1}c_{1}Z_{1} + ... + w_{k}c_{k}Z_{k})/(w_{1}c_{1} + ... + w_{k}c_{k}).
If we define u_{j} = w_{j}c_{j}, the last expression is equal to Z. Therefore, the two approaches for calculating age-standardized trends are approximately equal if the weights u_{j} and w_{j} are related by u_{j} = w_{j}c_{j}.
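The quality of this approximation can be checked numerically. The Python sketch below uses hypothetical values for c_{j}, w_{j} and Z_{j}, with rates that follow exactly exponential age-specific trends, and compares Z with the average change rate of the age-standardized rate over ten years. All names and numbers are assumptions made for illustration.

```python
import math

c = [1.0, 4.0, 11.0]           # hypothetical relative rates by age group
w = [6.0, 5.0, 4.0]            # hypothetical standard population weights
Z = [-0.030, -0.025, -0.020]   # hypothetical age-specific change rates per year

def standardized_rate(t):
    """Age-standardized rate r(t) with r_j(t) = c_j * exp(Z_j * t)."""
    total = sum(wj * cj * math.exp(zj * t) for wj, cj, zj in zip(w, c, Z))
    return total / sum(w)

# Approach 1: weighted mean of the age-specific trends with u_j = w_j * c_j.
u = [wj * cj for wj, cj in zip(w, c)]
Z_std = sum(uj * zj for uj, zj in zip(u, Z)) / sum(u)

# Approach 2: average change rate of the age-standardized rate over 10 years.
Y = (math.log(standardized_rate(10.0)) - math.log(standardized_rate(0.0))) / 10.0

print(round(Z_std, 5), round(Y, 5))
```

The two results differ only slightly as long as the age-specific trends are close to each other, which is the situation that assumption (4) describes.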
In the populations of the WHO MONICA Project, the age group-specific rates for coronary events are, on average, proportional to the coefficients c_{j} specified in Table 1.
Age group | 35-39 | 40-44 | 45-49 | 50-54 | 55-59 | 60-64 |
---|---|---|---|---|---|---|
Men | 1 | 2 | 4 | 7 | 11 | 16 |
Women | 1 | 2 | 4 | 8 | 16 | 29 |
The annual event rates were standardized to the world population shown in Table 2 [5].
Age group | 35-39 | 40-44 | 45-49 | 50-54 | 55-59 | 60-64 |
---|---|---|---|---|---|---|
w_{j} | 6 | 6 | 6 | 5 | 4 | 4 |
Multiplying c_{j} and w_{j} of Tables 1 and 2 gives the weights u_{j} for the second approach, as shown in Table 3.
Age group | 35-39 | 40-44 | 45-49 | 50-54 | 55-59 | 60-64 |
---|---|---|---|---|---|---|
Men | 6 | 12 | 24 | 35 | 44 | 64 |
Women | 6 | 12 | 24 | 40 | 64 | 116 |
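The multiplication can be reproduced with a few lines of Python, using the values of Tables 1 and 2. The variable names are chosen here for illustration only.

```python
c_men = [1, 2, 4, 7, 11, 16]     # Table 1, men
c_women = [1, 2, 4, 8, 16, 29]   # Table 1, women
w = [6, 6, 6, 5, 4, 4]           # Table 2, world standard population

# u_j = w_j * c_j for each five-year age group.
u_men = [wj * cj for wj, cj in zip(w, c_men)]
u_women = [wj * cj for wj, cj in zip(w, c_women)]
print(u_men)    # [6, 12, 24, 35, 44, 64]
print(u_women)  # [6, 12, 24, 40, 64, 116]
```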
When the weights u_{j} are summed into ten-year age groups, they are similar to the weights 1, 3 and 7, which have been used for age-standardizing case fatality rates in the WHO MONICA Project [6], as shown in Table 4.
Weights | Sex | Age 35-44 | Age 45-54 | Age 55-64 | Total |
---|---|---|---|---|---|
u_{j} | Men | 18 (10%) | 59 (32%) | 108 (58%) | 185 (100%) |
u_{j} | Women | 18 (7%) | 64 (24%) | 180 (69%) | 262 (100%) |
MONICA event weights | | 1 (9%) | 3 (27%) | 7 (64%) | 11 (100%) |
Therefore, weights of 1, 3 and 7 (in Table 4) were used for age-standardizing the trends in risk scores and the individual risk factors for the regression analyses against trends in age-standardized event rates (calculated using weights w_{j} in Table 2).
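The aggregation to ten-year age groups and the percentages of Table 4 can be reproduced with the following Python sketch. The input values are those of Table 3; the function names are hypothetical.

```python
u_men = [6, 12, 24, 35, 44, 64]      # Table 3, men
u_women = [6, 12, 24, 40, 64, 116]   # Table 3, women
monica = [1, 3, 7]                   # MONICA event weights

def ten_year_groups(u):
    """Sum adjacent five-year weights into ten-year age groups."""
    return [u[i] + u[i + 1] for i in range(0, len(u), 2)]

def percentages(weights):
    """Express each weight as a rounded percentage of the total."""
    total = sum(weights)
    return [round(100 * x / total) for x in weights]

rows = [("Men", ten_year_groups(u_men)),
        ("Women", ten_year_groups(u_women)),
        ("MONICA", monica)]
for label, weights in rows:
    print(label, weights, percentages(weights))
```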
The overall quality score, which was used for weighting the data in the regression analyses in reference [1], has values between zero and two. If the score is two, no problems were identified in the quality of the data for a population, whereas a score of zero indicates major concern about the data quality. The overall quality score was derived from the quality scores of the individual data components. The values of the overall quality score and its components are shown in Table A1. The definition of the overall quality score is included in the following description of the columns of Table A1:
The MONICA Centres are funded predominantly by regional and national governments, research councils, and research charities. Coordination is the responsibility of the World Health Organization (WHO), assisted by local fund raising for congresses and workshops. WHO also supports the MONICA Data Centre (MDC) in Helsinki. Not covered by this general description is the ongoing generous support of the MDC by the National Public Health Institute of Finland, and a contribution to WHO from the National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland, USA for support of the MDC. The completion of the MONICA Project is generously assisted through a Concerted Action Grant from the European Community. Likewise appreciated are grants from ASTRA Hässle AB, Sweden, Hoechst AG, Germany, Hoffmann-La Roche AG, Switzerland, the Institut de Recherches Internationales Servier (IRIS), France, and Merck & Co. Inc., New Jersey, USA, to support data analysis and preparation of publications.