It has been a while since I have posted here. I have been working on some other writing projects, posting on my microblog, and reading. I have decided I want to use this site for longer and more serious posts and the microblog for identifying resources I come across and for just playing around.
I wanted to finish the book I have been reading before writing something for this site. The book, Fertilizers, Pills, and Magnetic Strips: The Fate of Public Education in America (G. Glass), has a strange title but a very interesting concept. There are many components to this book, but a core idea is that the reform agenda focused on supposedly underperforming public education is far less about offering quality education to all students and more about the political struggle to reduce the cost of education for most while securing opportunities for the wealthy. The title is weird (I may make the effort to explain it later), but there are a couple of useful components to this book: a discussion of whether the data used to evaluate general achievement in public schools really demonstrate the fading of US schools in comparison to international competition, and an explanation of how the present constellation of reform strategies (home schooling, vouchers, charter schools, high-stakes testing, and tuition tax credits) is really mostly about discrediting public education so that public investment (taxes) can be reduced and those with greater wealth can focus their resources on themselves and their children rather than “other people’s children”.
What about the test data? I have read a book before on standardized tests and the perspective that is perpetuated based on shoddy analyses of these data (Berliner and Biddle, The Manufactured Crisis). Glass mentions their work and comments on how it has largely been ignored. You have to understand that Berliner, Biddle, and Glass are some of the real heavyweights in the evaluation and applied statistics field. Glass is usually credited with “meta-analysis”, the statistical procedure used to combine the outcomes from multiple research studies in a way that allows some general statement about the effect of the treatments involved. If you come from the position that one should reach conclusions based on sound methodology and data, these folks are legit.
Anyway, Glass concludes that the data from various tests (SAT, NAEP, TIMSS) used to argue that US schools are failing miserably really do not warrant such conclusions. Among the issues I highlighted:
The SAT is really not a test designed to evaluate K-12 achievement. It is a test designed to assess aptitude for college performance (hence the original name, Scholastic Aptitude Test). It was specifically designed to predict performance in some specific areas and is not appropriate for testing a wide range of skills and knowledge.
The TIMSS is often used as the basis for international comparisons. Making such comparisons is actually challenging and rests on important assumptions: that the groups are equivalent, that content receives equal emphasis, and that content knowledge and skills are understood in the same way across nations. As examples of some important issues, Glass points out that the science and math components rely on the metric system (still not the common measurement system employed in this country), that the US was among 4 countries that did NOT allow the use of calculators, and that education systems are not equivalent in terms of basic characteristics such as the age of high school students (e.g., the Icelandic test takers who happened to score above the US in math averaged 21.2 years of age, roughly equivalent to a US college junior).
Glass suggests that the most appropriate use of exam scores for states or nations, given some of these measurement challenges, is to determine whether scores are going up or down. Contrary to what one might assume from reading the local papers, scores in the U.S. are increasing.
I do not pretend to be a measurement expert; the issues involved in exploring the complex datasets used for such analyses are not the type of thing I play with when I sit down to run some regressions or a MANOVA in SPSS. I do think it is fair to suggest that the arguments Glass makes are not the message you have probably encountered from the usual sources. If there are counterarguments, I would like to hear the debate.