File this under “this should rile things up” and “I am willing to present both views”.
The NYTimes offers a summary of recent data on the approach advocated by controversial administrator Michelle Rhee while attempting to address problems of the D.C. school system. The following passage summarizes the findings:
“High-powered incentives linked to multiple indicators of teacher performance can substantially improve the measured performance of the teaching work force,” conclude the researchers, Thomas Dee of Stanford University’s Graduate School of Education and James Wyckoff of the Curry School of Education at the University of Virginia. Evaluation programs, they add, can bring “substantive and long-term educational and economic benefits” both by “avoiding the career-long retention of the lowest-performing teachers and through broad increases in teacher performance.”
The study has not yet undergone peer review. It is being published as a working paper by the National Bureau of Economic Research, a Cambridge, Mass., group run by some of the country’s top academic economists.
I happen to be reading “Reign of Error” at this time. The policies of Michelle Rhee – specifically the use of testing results to evaluate teachers – are strongly criticized in this book. It will be interesting to see how the newest findings are critiqued.
Note that the present study by economists Dee & Wyckoff operationalized teacher performance using structured observational measures in addition to student performance.
My guess after reading the description is that the study will be criticized for generating data by comparing teachers near the extremes rather than determining the magnitude of the continuous relationship between the predictor and dependent variables. Comparing those near, but not at, the cutoffs is an unusual approach. I guess the logic is that these individuals would be most sensitive to reinforcement and punishment. There is an interpretive challenge here: all teachers are influenced by conditions of employment, and a system analyzed through what might be described as its most sensitive members may have very different consequences for the rest of the group. What does it mean if studies evaluating the overall consequences (the most common existing research approach) show far weaker effects? The evaluation model is applied to everyone, so how do you reconcile the outcomes of studies based on the entire sample with a study based on a selected subset?
I am also guessing that the interpretation that “under performing and lower rated teachers leave the field” is open to challenge, perhaps because teachers of underperforming students are working in the most challenging and frustrating settings. I wonder, for example, whether the observed relationship at the extremes holds once known correlates of student performance, such as SES, are statistically removed. I could not tell from the abstract of the unpublished and unreviewed study.
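To make the contrast between the two kinds of analysis concrete, here is a minimal sketch in Python. It is not the method or the data from the Dee and Wyckoff study. The scores and outcomes are made up, and the function names `near_cutoff_gap` and `overall_slope` are hypothetical, chosen only for illustration. The toy data assume that only teachers scoring just below a cutoff show a later improvement. A comparison of teachers just below and just above the cutoff then shows a large gap, while a straight line fit to the whole sample shows almost no relationship.

```python
from statistics import mean

def near_cutoff_gap(scores, outcomes, cutoff, bandwidth):
    """Mean outcome just below the cutoff minus mean outcome just above it."""
    below = [y for x, y in zip(scores, outcomes)
             if cutoff - bandwidth <= x < cutoff]
    above = [y for x, y in zip(scores, outcomes)
             if cutoff <= x < cutoff + bandwidth]
    return mean(below) - mean(above)

def overall_slope(scores, outcomes):
    """Ordinary least-squares slope of outcome on score for the full sample."""
    mx, my = mean(scores), mean(outcomes)
    sxy = sum((x - mx) * (y - my) for x, y in zip(scores, outcomes))
    sxx = sum((x - mx) ** 2 for x in scores)
    return sxy / sxx

# Made-up data (an assumption, not real results): evaluation scores 0..99,
# with a later improvement of 1.0 only for those within 5 points below a
# cutoff of 50.
scores = list(range(100))
outcomes = [1.0 if 45 <= s < 50 else 0.0 for s in scores]

gap = near_cutoff_gap(scores, outcomes, cutoff=50, bandwidth=5)
slope = overall_slope(scores, outcomes)
print("near-cutoff gap:", gap)
print("whole-sample slope:", slope)
```

With data built this way, the two summaries tell very different stories about the same system, which is the reconciliation problem raised above. Whether anything like this pattern holds in the actual D.C. data is exactly what I could not tell from the abstract.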