Tutoring and the Big Picture

The period since the end of the COVID lockdowns has been difficult for K-12 schools. Students have struggled to attain traditional rates of academic progress. Student absenteeism has reached alarming levels. Where we live, some districts have struggled to find teachers to fill vacant positions while other districts are now cutting staff because of massive budget overruns. Educators feel a lack of support from politicians and from groups of parents who complain about various issues and sometimes pull their children out of one school to enroll them in another.

I certainly have no answer to this collection of challenges, as it seems any proposed solution to one problem only generates complaints about its consequences for a different issue. I look to educational research for potential solutions, and there are certainly strategies with proven positive effects in specific areas, but this is as far as it seems practical to go.

Tutoring has been one of the areas I have been investigating recently. My specific interest has been in the potential of AI as a substitute for human tutors, but the ample research on the effectiveness of human tutors has also been fruitful ground for considering which proven practices and roles AI might approximate. I have continued to review what we know about the effectiveness of human tutors and what tutors provide that students do not receive from their traditional classroom experiences.

For those wanting to review this research themselves, I recommend a fairly recent working paper by Nickow and colleagues (2020). This paper is both a general descriptive review of how tutoring is practiced and a meta-analysis of a large number of quality studies (96) of the effectiveness of this addition to K-12 instruction. The meta-analysis considers only the impact of adult tutors, but the descriptive review does include some comments on both peer tutoring and technology-based tutoring. The technology comments were written before recent advances in AI. My summary of this paper follows.

Key Findings

I will begin with the overall conclusion. While there are more and less effective implementations and situations, adult tutoring consistently results in positive outcomes. The authors concluded that tutoring was responsible for substantial and consistent positive effects on learning outcomes, with an overall effect size of 0.37 standard deviations. The results are strongest for teacher-led and paraprofessional tutoring programs, with diminished but still positive effects for nonprofessional and parent-led tutoring. Notably, tutoring is most effective in the earlier grades and shows comparable impacts across reading and mathematics interventions, although reading tends to benefit more in earlier grades and math in later ones. A large proportion of the studies and tutoring activity is focused on the first couple of years of school and on the subject area of reading; there are far more studies of tutoring in the early grades than in secondary school. Only 7% of the studies included students above 6th grade.
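To put the 0.37 standard deviation figure in concrete terms, the normal curve translates an effect size into a percentile shift. The short sketch below is my own illustration (not from the paper), using only Python's standard library; it shows that an average student receiving this benefit would move from the 50th to roughly the 64th percentile.

```python
from math import erf, sqrt

def normal_percentile(z: float) -> float:
    """Cumulative probability (as a percentile) for a standard normal score z."""
    return 100 * 0.5 * (1 + erf(z / sqrt(2)))

# A student at the 50th percentile who gains 0.37 SD from tutoring
# lands at roughly the 64th percentile.
print(f"0.37 SD moves an average student to the {normal_percentile(0.37):.0f}th percentile")
```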

Categories of Tutors

The study by Nickow, Oreopoulos, and Quan (2020) identifies four categories of tutors:

  1. Teachers: These are certified professionals who have undergone formal training in education. They are typically employed by schools and likely have a deeper understanding of the curriculum, teaching strategies, and student assessment.
  2. Paraprofessionals: These individuals are not certified teachers, but they often work in educational settings. Paraprofessionals may include teaching assistants, educational aides, or other support staff in schools. They often work under the supervision of certified teachers.
  3. Nonprofessionals: This category typically includes volunteers who may not have formal training in education. They could be community members, older students, or others who offer tutoring services. Their level of expertise can vary widely.
  4. Parents: Parents often play a tutoring role, especially in the early years of a child’s education. Parental involvement in a child’s education can have a significant impact, but the level of expertise and the ability to provide effective tutoring can vary greatly among parents.

The study found that paraprofessionals are the largest provider of tutoring services, followed by nonprofessionals. Tutoring was found to be more effective when provided by teachers and paraprofessionals than by nonprofessionals and parents. Tutors who hold teaching certificates and have classroom experience were rare in the research studies, but most effective. As mentioned in the introduction to this post, the question of how schools can allocate their budgets appears highly influential in whether these proven sources of impact can be applied.

Mechanisms of Tutoring Effectiveness

This paper reviewed mechanisms potentially responsible for the effectiveness of tutoring. While these hypothesized mechanisms were not evaluated in the meta-analysis, I find such lists useful in considering other approaches such as AI. Which of these mechanisms could AI tutoring address and which would not be possible?

The paper discusses several mechanisms through which tutoring enhances student learning:

  1. Increased Instruction Time: Tutoring provides additional instructional time, crucial for students who are behind.
  2. Customization of Learning: The one-on-one or small group settings of tutoring allow for tailored instruction that addresses the individual needs of students, a significant advantage over standard classroom settings where diverse learning needs can be challenging to meet. The authors propose that tutoring might be thought of as an extreme manipulation of class size. I know that studies of class size have produced inconsistent results, but extremely small class sizes are not investigated in such research.
  3. Enhanced Engagement and Feedback: Tutoring facilitates a more interactive learning environment where students receive immediate feedback, enhancing learning efficiency and correction of misunderstandings in real time.
  4. Relationship Building: The mentorship that develops between tutors and students can motivate students, fostering a positive attitude toward learning and educational engagement.

Program Characteristics

The document evaluates the variation in the effectiveness of tutoring programs based on several characteristics:

  1. Tutor Type: Teacher and paraprofessional tutors are found to be more effective than nonprofessionals and parents, suggesting that the training and expertise of the tutor play a critical role in the success of tutoring interventions.
  2. Intervention Context: Tutoring programs integrated within the school day are generally more effective than those conducted after school, possibly due to higher levels of student engagement and less fatigue.
  3. Grade Level: There is a trend of diminishing returns as students age, with the most substantial effects seen in the earliest grades. This finding emphasizes the importance of early intervention.

Summary

This research and analysis paper might be summarized by proposing that schools cannot go wrong with tutoring. The paper suggests several policy implications based on its findings. Given the strong evidence supporting the effectiveness of tutoring, especially when conducted by trained professionals, the authors advocate for increased investment in such programs. They suggest that education policymakers and practitioners should consider tutoring as a key strategy for improving student outcomes, particularly in early grades and for subjects like reading and mathematics.

Reference

Nickow, A., Oreopoulos, P., & Quan, V. (2020). The impressive effects of tutoring on PreK-12 learning: A systematic review and meta-analysis of the experimental evidence. https://www.nber.org/papers/w27476


AI Tutoring Update

This is an erratum in case my previous posts have misled anyone. I looked up the word erratum just to make certain I was using it correctly. I have written several posts about AI tutoring, and in these posts I made reference to the effectiveness of human tutoring. I tend to provide citations when research articles are the basis for what I say, and I know I have cited several sources for comments I made about the potential of AI tutors. I have not claimed that AI tutoring is the equal of human tutoring, but I have suggested that it is better than no tutoring at all, and in so doing I have claimed that human tutoring was of great value but just too expensive for wide application. My concern is that I have presented the effectiveness of human tutoring as greater than it has actually been shown to be.

The reason I am bothering to write this post is that I have recently read several posts proposing that the public (i.e., pretty much anyone who does not follow the ongoing research on tutoring) has an inflated understanding of the impact of human tutoring (Education Next, Hippel). These authors propose that too many remember Bloom’s premise of a two-sigma challenge and fail to adjust Bloom’s claim that tutoring has this high level of impact on student learning to what the empirical studies actually demonstrate. Of greater concern, according to these writers, is that nonresearchers, including educational practitioners but also those donating heavily to new efforts in education, continue to proclaim that tutoring has this potential. Included in this collection of wealthy investors and influencers would be folks like Sal Khan and Bill Gates. I assume they might also include me in this group, although I obviously have little impact compared to those with big names. To be clear, the interest of Khan, Gates, and me is really in AI rather than human tutoring, but we have made reference to Bloom’s optimistic comments. We have not claimed that AI tutoring is as good as human tutors, but by referencing Bloom’s claims we may have created false expectations.

When I encountered these concerns, I turned to my own notes from the research studies I had read to determine if I was aware that Bloom’s claims were likely overly optimistic. It turns out that I had read clear indications identifying what the recent posters were concerned about. For example, I highlighted the following in a review by Kulik and Fletcher (2016). 

“Bloom’s two sigma claim is that adding undergraduate tutors to a mastery program can raise test scores an additional 0.8 standard deviations, yielding a total improvement of 2.0 standard deviations.”

My exposure to Bloom’s comments on tutoring originally had nothing to do with technology or AI tutoring. I was interested in mastery learning as a way to adjust for differences in the rate of student learning. The connection with tutoring at the time Bloom offered his two-sigma challenge was that mastery methods offered a way to approach the benefits of the one-to-one attention and personalization provided by a human tutor. Some of my comments on mastery instruction and the potential of technology for making such tactics practical are among my earlier posts to this site. Part of the misapplication of Bloom’s claim stems from his combination of personalized instruction via mastery tactics with tutoring. He was also focused on college-aged students in the data he cited. My perspective reading the original paper many years ago was not “see how great tutoring is”. It was more that tutoring on top of classroom instruction is about as good as it is going to get and that mastery learning offers a practical, reasonable alternative.

As a rejoinder updating what I may have claimed, here are some additional findings from the Kulik and Fletcher (2016) meta-analysis of intelligent tutoring software.

The studies reviewed by these authors show lower benefits for tutoring when outcomes are measured on standardized rather than local tests, sample size is large, participants are at lower grade levels, the subject taught is math, a multiple-choice test is used to measure outcomes, and Cognitive Tutor is the ITS used in the evaluation.

However, on a more optimistic note, the meta-analysis conducted by these scholars found that in 50 evaluations intelligent tutoring systems led to an improvement in test scores of 0.66 standard deviations over conventional levels. 

The two sources urging a less optimistic perspective point to a National Bureau of Economic Research working paper (Nickow and colleagues, 2020) indicating that the effect of human tutoring for K-12 learners was approximately 0.35 sigma. This is valuable, but not close to the 2.0 level.

Summary

I have offered this update to clarify what might be interpreted from my previous posts, but also to provide some other citations for those who now feel the need to read more of the original literature. I have no idea whether Khan, Gates, etc. have read the research that would likely indicate their interest in AI tutoring and mastery learning was overly ambitious. Just to be clear, I had originally interpreted what the tech-types were promoting as mastery learning (personalization), which later morphed into a combination with AI tutoring. This combination was what Bloom was actually evaluating. The impact of a two-sigma claim, when translated into what such an improvement would actually mean in terms of rate of learning or a change in a metric such as assigned grade, seems improbable. Two standard deviations would move an average student (50th percentile) to the 98th percentile. This only happens in Garrison Keillor’s Lake Wobegon.
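The percentile arithmetic here is easy to verify with the standard normal CDF. The small sketch below is my own (not from any of the cited papers); it contrasts Bloom’s two-sigma claim with the roughly 0.35 sigma empirical estimate for K-12 tutoring.

```python
from math import erf, sqrt

def percentile_after_gain(effect_size_sd: float) -> float:
    """Percentile rank of an initially average (50th percentile) student
    after an improvement of effect_size_sd standard deviations."""
    return 100 * 0.5 * (1 + erf(effect_size_sd / sqrt(2)))

# Bloom's two-sigma claim: 50th -> ~98th percentile
print(f"2.0 SD  -> {percentile_after_gain(2.0):.0f}th percentile")
# The empirical K-12 estimate of ~0.35 SD: 50th -> ~64th percentile
print(f"0.35 SD -> {percentile_after_gain(0.35):.0f}th percentile")
```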

References

Bloom, B. S. (1984). The 2 sigma problem: The search for methods of group instruction as effective as one-to-one tutoring. Educational researcher, 13(6), 4-16.

Kulik, J. A., & Fletcher, J. D. (2016). Effectiveness of intelligent tutoring systems: a meta-analytic review. Review of educational research, 86(1), 42-78.

Nickow, A., Oreopoulos, P., & Quan, V. (2020). The impressive effects of tutoring on PreK-12 learning: A systematic review and meta-analysis of the experimental evidence. https://www.nber.org/papers/w27476
