Data with Time-Dependent Covariates

Since the maximum likelihood method is particularly effective at handling time-dependent covariates, let’s look at another example with three covariates that change over time. The sample consists of 301 male biochemists who received their doctorates in 1956 or 1963. At some point in their careers, all of these biochemists had jobs as assistant professors at graduate departments in the U.S. The event of interest is a promotion to associate professor. The biochemists were followed for a maximum of 10 years after beginning their assistant professorships. For a complete description of the data and its sources, see Long, Allison, and McGinnis (1993). That article focuses on comparisons of men and women, but here I look only at men in order to simplify the analysis. The data set includes the following variables:

DUR          years from beginning of job to promotion or censoring.
EVENT        has a value of 1 if the person was promoted; otherwise, EVENT has a value of 0.
UNDGRAD      selectivity of undergraduate institution (ranges from 1 to 7).
PHDMED       has a value of 1 if the person received his Ph.D. from a medical school; otherwise, PHDMED has a value of 0.
PHDPREST     a measure of prestige of the person’s Ph.D. institution (ranges from 0.92 to 4.62).
ART1-ART10   the cumulative number of articles the person published in each of the 10 years.
CIT1-CIT10   the number of citations in each of the 10 years to all previous articles.
PREST1       a measure of prestige of the person’s first employing institution (ranges from 0.65 to 4.6).
PREST2       prestige of the person’s second employing institution (coded as missing for those who did not change employers). No one had more than two employers during the period of observation.
JOBTIME      year of employer change, measured from start of assistant professorship (coded as missing for those who did not change employers).

The covariates describing the biochemists’ graduate and undergraduate institutions are fixed over time, but article counts, citation counts, and employer prestige all vary with time. The citation counts, taken from Science Citation Index (Institute for Scientific Information), are sometimes interpreted as a measure of the quality of a scientist’s published work, but it may be more appropriate to regard them as a measure of impact on the work of other scientists.

The first and most complicated step is to convert the file of 301 persons into a file of person-years. The following DATA step accomplishes that task:

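data rankyrs;
   infile 'c:rank.dat';
   input dur event undgrad phdmed phdprest art1-art10
         cit1-cit10 prest1 prest2 jobtime;
   array arts(*) art1-art10;
   array cits(*) cit1-cit10;
   /* no employer change: move JOBTIME past the last possible year */
   if jobtime=. then jobtime=11;
   /* write one record for each year the person was observed */
   do year=1 to dur;
      /* PROMO is 1 only in the final year of a promoted person */
      if year=dur and event=1 then promo=1;
         else promo=0;
      /* use the second employer's prestige from the year of the move */
      if year ge jobtime then prestige=prest2;
         else prestige=prest1;
      /* pick up this year's article and citation counts */
      articles=arts(year);
      citation=cits(year);
      year2=year*year;
      output;
   end;
run;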

This DATA step has the same basic structure as the one for the job duration data. The DO loop creates and outputs a record for each person-year, for a total of 1,741 person-years. Within the DO loop, the dependent variable (PROMO) has a value of 1 if a promotion occurred in that person-year; otherwise, PROMO has a value of 0.
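A quick way to confirm that the expansion worked is to tabulate the new file. The following is only a suggested check, not part of the original analysis; it assumes the RANKYRS data set created above. The total frequency should equal the 1,741 person-years, the number of records with PROMO equal to 1 should equal the number of promoted biochemists, and no YEAR value should exceed 10.

```sas
/* Sanity check on the person-year file (suggested, not from the source) */
proc freq data=rankyrs;
   tables promo year;   /* totals should sum to 1,741 person-years */
run;
```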

What’s new is that the multiple values of each time-dependent covariate must be read into a single variable for each of the person-years. For the article and citation variables, which change every year, we create arrays that enable us to refer to, say, articles in year 3 as ARTS(3). For the two job prestige variables, we must test to see whether the current year (in the DO loop) is greater than or equal to the year in which a change occurred. If it is, we assign the later value to the variable PRESTIGE; otherwise, PRESTIGE is assigned the earlier value. Note that for this to work, we recode the JOBTIME variable so that missing values (for people who didn’t have a second employer) are recoded as 11. That way the DO loop index, which has a maximum value of 10, never equals or exceeds this value. Finally, we define YEAR2 equal to YEAR squared so that we can fit a quadratic function of time.
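To make the logic concrete, here is a minimal sketch that applies the same statements to a single made-up record. The data set name DEMO and all of the values are hypothetical and are not taken from the biochemist data; only three years of article counts are used to keep the example short. This person changes employers in year 2 and is promoted in year 3.

```sas
data demo;
   /* hypothetical record: dur event prest1 prest2 jobtime art1-art3 */
   input dur event prest1 prest2 jobtime art1-art3;
   array arts(*) art1-art3;
   if jobtime=. then jobtime=11;
   do year=1 to dur;
      if year=dur and event=1 then promo=1;
         else promo=0;
      if year ge jobtime then prestige=prest2;
         else prestige=prest1;
      articles=arts(year);
      output;
   end;
   keep year promo prestige articles;
   datalines;
3 1 2.0 3.5 2 1 4 6
;
run;

proc print data=demo noobs;
run;
```

The single input record becomes three person-year records. In year 1, PROMO is 0, PRESTIGE is 2.0, and ARTICLES is 1. In year 2, PROMO is 0, PRESTIGE switches to 3.5, and ARTICLES is 4. In year 3, PROMO is 1, PRESTIGE stays at 3.5, and ARTICLES is 6.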

Now we can proceed to estimate regression models. For academic promotions, which usually take effect at the beginning of an academic year, it makes sense to think of time as being truly discrete. A logit model, then, seems entirely appropriate. Using PROC LOGISTIC, we specify the model as follows:

proc logistic descending data=rankyrs;
   model promo=undgrad phdmed phdprest articles citation
         prestige year year2;
run;

The DESCENDING option in the PROC LOGISTIC statement forces PROC LOGISTIC to model the probability of a 1 rather than the probability of a 0.
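In more recent releases of SAS, the same result can be obtained by naming the event category directly in the MODEL statement. The sketch below assumes a release of PROC LOGISTIC that supports the EVENT= response option; it is an alternative to DESCENDING, not the specification used for Output 7.4.

```sas
/* Alternative to DESCENDING: state explicitly that PROMO=1 is the event */
proc logistic data=rankyrs;
   model promo(event='1')=undgrad phdmed phdprest articles citation
         prestige year year2;
run;
```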

Output 7.4. Estimates of Logit Model for Academic Promotions
            Parameter Standard    Wald       Pr >   Standardized    Odds
Variable DF  Estimate   Error  Chi-Square Chi-Square  Estimate     Ratio

INTERCPT 1    -8.4767   0.7758   119.3807     0.0001           .   0.000
UNDGRAD  1     0.1947   0.0636     9.3785     0.0022    0.146561   1.215
PHDMED   1    -0.2355   0.1718     1.8804     0.1703   -0.062241   0.790
PHDPREST 1     0.0270   0.0931     0.0841     0.7719    0.014524   1.027
ARTICLES 1     0.0734   0.0181    16.3590     0.0001    0.242278   1.076
CITATION 1    0.00013  0.00131     0.0098     0.9212    0.005878   1.000
PRESTIGE 1    -0.2569   0.1139     5.0888     0.0241   -0.111086   0.773
YEAR     1     2.0811   0.2337    79.2863     0.0001    2.583886   8.013
YEAR2    1    -0.1585   0.0203    60.9980     0.0001   -1.879202   0.853

Output 7.4 shows the results. Not surprisingly, there is a strong effect of number of years as an assistant professor. The odds of a promotion increase rapidly with time, but at a decreasing rate; there is actually some evidence of a reversal after seven years. I also estimated a model with a set of 10 dummy variables for years as an assistant professor, but a likelihood-ratio chi-square test showed no significant difference between that model and the more restricted version shown here. Higher-order polynomials also failed to produce a significant improvement. On the other hand, the likelihood-ratio chi-square for comparing the model in Output 7.4 with a model that excluded both YEAR and YEAR2 was 178.98 with 2 d.f., a highly significant difference.
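The reversal can be located from the two time coefficients. As a rough worked calculation, write t for YEAR and let c stand for the contribution of all the other terms in the model (both symbols are introduced here only for illustration). Holding the other covariates fixed, the estimated log-odds is a quadratic in t, and its peak is found by setting the derivative to zero:

```latex
\log\!\left(\frac{\hat{P}_t}{1-\hat{P}_t}\right) = c + 2.0811\,t - 0.1585\,t^{2},
\qquad
t^{*} = \frac{2.0811}{2 \times 0.1585} \approx 6.6
```

Evaluating the time terms at whole years gives about 6.78 at t = 6, 6.80 at t = 7, and 6.50 at t = 8, so the estimated odds of promotion are highest in year 7 and decline afterward.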

Article counts had the next largest effect (as measured by the Wald chi-square test): each additional published article is associated with an increase of 7.6 percent in the odds of a promotion. But there is no evidence of any effect of citations, suggesting that it’s the quantity of publications that matters in promotions, not the importance or impact of the published work.

Somewhat surprisingly, while there is no effect of the prestige of the institution where a biochemist got his doctorate, there is a substantial effect of the selectivity of his undergraduate institution. Each 1-point increase on the 7-point selectivity scale is associated with a 21-percent increase in the odds of a promotion, controlling for other covariates. There is also a slightly negative effect of the prestige of the current employer, suggesting that it may be harder to get promoted at a more prestigious department.
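The percentage changes quoted for articles and undergraduate selectivity come directly from the odds ratios in Output 7.4. For a coefficient estimate written here as \hat{\beta} (notation added for illustration), the percent change in the odds for a 1-unit increase in the covariate is:

```latex
100\left(e^{\hat{\beta}} - 1\right):
\qquad
\text{ARTICLES: } 100\left(e^{0.0734} - 1\right) \approx 7.6,
\qquad
\text{UNDGRAD: } 100\left(e^{0.1947} - 1\right) \approx 21.5
```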

Notice that once the person-year data set is created, the time-dependent covariates are treated just like fixed covariates. Thus, many models can be estimated with the saved data set without additional data manipulations, making it especially convenient to estimate models with large numbers of time-dependent covariates. By contrast, in PROC PHREG the time-dependent covariates must be recalculated whenever a new model is estimated.
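For example, an alternative specification can be fit to the same RANKYRS data set by changing only the MODEL statement. The model below, which drops CITATION, is a hypothetical illustration and is not one of the models reported in the text.

```sas
/* Refit with a different covariate list; no new DATA step is needed */
proc logistic descending data=rankyrs;
   model promo=undgrad phdmed phdprest articles
         prestige year year2;
run;
```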
