The U.S. and China AI Competition

The recent summit involving Presidents Trump and Xi Jinping dealt with many political controversies of the day, including AI and related issues such as intellectual property. The mention of AI brought to mind a book by Kai-Fu Lee (2018), which I think I read in 2019. I remembered some of the comments Lee made about China, computer science, and AI at that time. Lee, who has held both U.S. and Taiwanese citizenship, wrote that China would have important advantages in the development and application of technology, which surprised me at the time but made some sense given what I knew about China. Lee was educated in the U.S. (Carnegie Mellon Ph.D.), worked for Apple, then returned to Taiwan and later worked for Google in China. I explored my notes and highlights from that book and also from The Big Nine (Webb, 2019). My interest in the role of AI in education and its application across different countries led me to another article in my personal archive (Hao, 2019). The following comments are mostly based on Lee’s ideas, with some expansion using the other two references I have mentioned. All sources are a bit dated, given the rapid pace of AI developments, but I still find the core ideas worth considering.

According to Lee, China’s advantages in AI come from scale, data, industrial capacity, talent, and state coordination.

Scale equals more data

Lee argues that China’s 1.4 billion people give it control of “the largest, and possibly most important, natural resource in the era of AI: human data” and that its huge number of internet users gives it both the quantity and quality of data needed for training models. This resource is roughly the equivalent of the combined resources of the United States and Europe. Lee offered this perspective some years ago, when finding content seemed more of a priority for U.S. companies, which encountered pushback when scraping the web and books without permission.

Industry integration

Chinese companies share data across their services. For example, Tencent’s ecosystem is described as perhaps the single richest data ecosystem of all the giants because it combines multiple services, in contrast to, say, X and Amazon. Concentration of data and services in a few massive platforms offers a related quantity and quality advantage.

Quantity of Talent

There is a Thomas Friedman quote I have always remembered. “Remember in China if you are a one in a million talent, there are 1400 others just like you.” Lee offers a different assessment of the talent situation specific to AI. He claims that the U.S. has more superstars, but China has the advantage in the number of engineers and computer scientists working on AI and related fields. Aside from the great difference in population, engineering, programming, and science are simply fields of advanced study that are seen as more of an opportunity in China. My own way of thinking about this difference is that in the U.S., business and finance attract many, and in China these fields are less of a draw.

State Coordination and Standards

A “big advantage for China: it doesn’t have the privacy and security restrictions that might hinder progress in the United States”. The commitment to the massive surveillance of its own population is a known focus of the Chinese government and a means of control and manipulation. We rightfully consider the use of technology to probe personal lives and values a violation of basic human rights, and we bristle internally at the collection of information about us by companies and the government. Despite tolerated abuses, the commitment to collecting and analyzing this type of information is a source of funding and a focus of experimentation in China.

“Move fast and break things” was an early Facebook creed, but it reflects a value system that has come under increasing criticism in China. Without the pressure to curb potential negative aspects of AI, China moves faster. Related to this is the greater top-down decision making of the Chinese system. In the U.S., multiple businesses try to raise huge sums of money, are often isolated from each other, and often duplicate similar approaches. We historically value competition and assume the motivation it creates has advantages. While true, I wonder about the “business model” sucking up a large share of the available investment money in this sector in the U.S. The amount of money required has to a great degree squeezed out university researchers, who either leave universities or work around the edges of AI innovation. While AI research is a high priority in China, the U.S. has cut NSF funding for AI and cybersecurity.

AI in China and Education

My own interest in AI has been driven by potential opportunities in education. This has been a messy issue in this country, with pushback due to legitimate concerns about cheating, failure to address skill development, and lack of interest in instruction presented by a computer. China has committed to exploring AI-facilitated education.

Academic competition in China is tense. Millions of students a year take the college entrance exam, the gaokao. Your score determines whether and where you can study for a degree, and it’s seen as the biggest determinant of success for the rest of your life. Parents willingly pay for tutoring or anything else that helps their children get ahead. The options tech can provide outside of classrooms offer opportunities to sell experiences to well-meaning parents. (Hao)

Two companies that are likely unfamiliar to most U.S. educators, Squirrel AI and Alo7, make good examples. Since the Hao article was published, both services have become available in the U.S.

Squirrel AI uses an “adaptive learning” model that breaks subjects into thousands of “knowledge points”, far more granular than traditional textbooks. The system diagnoses a student’s specific gaps and provides targeted video lectures and practice problems. The teachers are intended to act like “pilots,” stepping in only for emotional support or complex issues while the algorithm handles the core instruction. Educators will likely recognize similarities to Khan Academy.

In contrast, Alo7 emphasizes a “quality-oriented education” focusing on creativity and the liberal arts. Its “intelligent classroom” uses AI to analyze student engagement, pronunciation, and even “joy” through facial and vocal recognition.

The interest in AI in education seems to come from a combination of the emphasis on standardized test performance for advancement and opportunity, the larger population, and a greater tolerance for risk when exploring possible improvements.

Summary

This post is not a value judgment comparing U.S. and Chinese AI policies, but rather an attempt to summarize what some experts have said about the differences. My personal issue concerns the economic pressure in the U.S. that follows from our trust in competition among corporations to drive innovation. While this is an approach that has worked in many areas, the huge investments required have to this point sucked a great deal of capital from the economy and seem largely and unnecessarily redundant. I also find the focus of AI in education on personalized and adaptive instruction interesting, as this emphasis appeals to me based on my interest in mastery learning.

Sources

Hao, K. (2019). China has started a grand experiment in AI education. It could reshape how the world learns. MIT Technology Review, 123(1), 1-9.

Lee, K.-F. (2018). AI Superpowers: China, Silicon Valley, and the New World Order. Houghton Mifflin.

Webb, A. (2019). The big nine: How the tech titans and their thinking machines could warp humanity. PublicAffairs.

Ignoring The Instruction Option Of EdTech

When I first began writing professionally about K-12 use of technology in the mid-1990s, a popular approach was to organize content around the tutor, tool, tutee model. This model proposed that technology in the hands of students could deliver instruction (tutor), facilitate the activities of being a student (tool), and be taught by students through programming/coding (tutee). While AI now blurs the lines between these roles, this simple organizational scheme still seems useful.

This post was prompted by what I sense to be dissatisfaction with the instructional component of this model and by a recent paper entitled the “5% problem”. This paper argued that claims about the benefits of commercial instructional offerings (e.g., Khan Academy, CK-12) misrepresent what the achievement data collected actually demonstrate. Ignore my description of such programs as commercial; I know many of the features of such offerings can be used at no cost, and how these efforts are funded is a different issue. The relevance of “5%” lies in the hidden condition that only those who use the learning system as intended are included in the analyzed data. Some studies reporting high effectiveness are based on 5% of those provided access, and this important factor is not highlighted in the reporting of results.
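To make the arithmetic behind this concern concrete, here is a minimal sketch using entirely hypothetical numbers. The enrollment count, the 5% share of intended users, and both score gains are invented for illustration and are not taken from the paper. The sketch contrasts the gain among students who used a system as intended with the gain averaged over everyone given access.

```python
def average_gain_all_students(n_access, share_compliant, gain_compliant, gain_others):
    """Average gain across every student given access, not only intended users."""
    n_compliant = n_access * share_compliant
    n_others = n_access - n_compliant
    total_gain = n_compliant * gain_compliant + n_others * gain_others
    return total_gain / n_access


# Hypothetical values, chosen only to illustrate the calculation
N_ACCESS = 1000          # students given access
SHARE_COMPLIANT = 0.05   # share who used the system as intended
GAIN_COMPLIANT = 10.0    # score gain among intended users
GAIN_OTHERS = 1.0        # score gain among everyone else

overall = average_gain_all_students(
    N_ACCESS, SHARE_COMPLIANT, GAIN_COMPLIANT, GAIN_OTHERS
)
print(f"Gain among intended users: {GAIN_COMPLIANT:.2f}")
print(f"Gain averaged over all students with access: {overall:.2f}")
```

With these made-up inputs, the gain for the full group is far smaller than the gain for the 5% alone, which is why a report needs to say which group was analyzed.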

Such assertions make me uncomfortable. I already sense a backlash against screen time, cautions that AI allows learners to offload the experiences intended by learning tasks, and concerns that classroom circumstances associated with technology have caused educators to limit meaningful social contact with students and of students with each other. Now I feel I must also question the studies I have explored on the benefits of AI tutoring and on the personalization of the rate of progress through instructional materials allowed by computer-supported instruction (e.g., Kulik & Fletcher).

Teacher Commitment

As I have considered this recent challenge, it has occurred to me that I have encountered a variant of it throughout my career. In 2019, I wrote a blog post titled “There is a reason teachers don’t use the software provided by their districts.” At the time, this issue caught my attention because my wife and I were serving on an advisory group for our local school district, and the tech director reported on monitoring software that tracked the use of software the district had purchased. The data were used to decide which license packages could be dropped so funds could be reallocated to other requests. I noticed some researchers were using what seemed like a similar system to examine the use of instructional technology and to consider why it was underutilized. These scholars reached a conclusion nearly identical to that of the more recent, in-depth examination of online instructional tools. “One of the other primary findings of this report is that usage of apps is generally lower than might be expected. Most apps are used only for a limited time, and most purchased by districts go unused. This has an impact on efficacy – an app cannot be effective if it is not used” (p.25).

At that time, it seemed the issue was explaining teacher commitment. Thomas Arnett has weighed in on the issue of school-funded software being seriously underutilized, speculating, based on his Jobs to be Done Theory, that educators simply don’t perceive the software they have access to as helping them satisfy the jobs expected of them, relative to more traditional approaches. These jobs are described as 1) Help me lead the way in improving my school, 2) Help me find practical ways to engage and challenge more students, and 3) Help me replace a broken instructional model so I can help each student. From my perspective, many technology-based instruction systems seem purposefully designed to address individual learning speeds and existing knowledge, but perhaps this is not how these resources are perceived by educators. In a more detailed version of this online description, the authors propose that educators might respond if a greater effort were made to engage them with data and anecdotal accounts of the success of peer educators.

What about the learners? 

As I explored this history, I saw what seems a frustrating pattern for those of us who have been influenced by the promise of personalized progress systems and intelligent tutoring systems: approaches that look promising in a carefully controlled context often disappoint when turned loose in the complexity of schools and classrooms. The challenge of matching, in applied settings, the key elements of the controlled setting in which concepts were developed is termed fidelity and is an issue in many fields (e.g., Trutschel and colleagues). I have struggled with this challenge in my own research, which has often focused on creating technology-facilitated study environments for college students enrolled in large introductory classes.

Cognitive research has accumulated a massive amount of evidence demonstrating the effectiveness of retrieval practice. It has also shown that less capable learners are often much less aware of their specific knowledge gaps and hold a false sense of understanding (i.e., poor metacomprehension). In other words, less capable learners often don’t know what they don’t know and thus are very inefficient at remediating their problem areas. One way to provide retrieval practice and address poor metacomprehension is to provide practice tests. More sophisticated applications that make use of technology can also track weak areas so that these areas can be emphasized, link the student to remedial content when individual elements of information are not known or are misunderstood, and even ask students to predict the accuracy of their performance in an effort to increase awareness of strengths and weaknesses.

If you are interested in the details of this study, I have provided a citation below. The relevance of this study for the present post concerns the willingness of learners, college students in this case, to take advantage of a resource designed to improve their performance. The following graph is an easy way for me to make my point. Learners were divided into three groups based on course performance. For each of the three exams, the graph identifies the percentage of learners in each performance group who satisfied the stated goal of the study task, used the task but did not meet this standard, or did not use the study task. There is a clear pattern: those performing the worst do not meet the study goal. Most persuasive, and in keeping with the other data reported in this post, are the data on those who made no effort to use the system. It is possible that trying but failing to reach the stated standard is related to understanding or aptitude, but failing to try at all is not, and even an unsuccessful attempt should still be beneficial.

As was the case in the 5% paper, those less in need of assistance participated more in a likely beneficial activity. In fairness, the “perceived suitability” of a learning opportunity, a notion that has been proposed but remains vague, offers a second possible explanation.

Summary

In this post, I consider the persistent “underutilization gap” in educational technology, where instructional tools, from commercial platforms to AI tutors, frequently fail to achieve their promised impact because they are either ignored by teachers or avoided by the students who need them most. The “5% problem” highlights how efficacy data are often skewed by including only the small fraction of users who follow the system as intended, while struggling learners consistently participate the least in these personalized systems. Ultimately, I suggest that EdTech’s potential for personalized progress remains stalled by a lack of “fidelity” in real-world settings and a failure to align software with the practical “jobs” educators and students actually prioritize.

Citations:

Grabe, M., & Flannery, K. (2010). A Preliminary Exploration of Online Study Question Performance and Response Certitude as Predictors of Future Examination Performance. Journal of Educational Technology Systems, 38(4), 457-472.

Kulik, J. A., & Fletcher, J. D. (2016). Effectiveness of intelligent tutoring systems: A meta-analytic review. Review of Educational Research, 86(1), 42-78.

Trutschel, D., Blatter, C., Simon, M. et al. (2023). The unrecognized role of fidelity in effectiveness-implementation hybrid trials: simulation study and guidance for implementation researchers. BMC Medical Research Methodology, 23, 116. https://doi.org/10.1186/s12874-023-01943-3

AI, tutoring, and mastery learning

AI, tutoring, and mastery learning are topics that have dominated my professional interests for years. Obviously, AI has been added recently, but the other topics have been active topics of my scholarship since the late 1960s. I have mostly treated these topics in isolation, but they can be interrelated and recent efforts have drawn attention to potential interconnections. I will end this post by providing my own take on how these topics now can be considered in combination.

Aptitude and mastery learning

I think the history of the interrelationship of these two concepts is important and not appreciated by current researchers and educational innovators. At least I do not see an effort to connect with what I think are important insights.

Aptitude and how educational experiences accommodate differences in aptitude don’t seem to receive a lot of attention. I see a lot of references to individual interests and perhaps existing knowledge under the heading of personalization, but fewer to aptitude. A common way of defining aptitude is as the natural ability to do something. When applied to learning, this definition becomes controversial. It may be the word “natural”. The idea of “natural” as biologically based is probably what causes the problems. You can kind of see what I mean by a messy idea if the word intelligence is equated with “natural”. Immediately those who disagree with the basic idea begin complaining about the limitations of intelligence tests and the dangers of attaching labels to individuals. I can understand the concerns and potential abuses, but I have never thought the solution was to ignore the variability any educator faces in the students they work with. What way of thinking about this variability would be helpful?

As I was taught about intelligence and intelligence testing and learned about the correlates with academic achievement, I encountered a way of thinking I found helpful and useful. Perhaps aptitude could be thought of as speed of learning. This was the proposal of John Carroll. Instead of aptitude predicting differences in how much would be learned, Carroll proposed that differences in aptitude predicted how long most learners would take to grasp the same learning objectives. The implications of this perspective seem extremely important. Carroll argued that traditional educational settings, with their fixed timeframes for learning, often disadvantaged students with lower aptitude. In these settings, students with higher aptitude tend to learn more within the limited time available, while those who require more time to process information might fall behind. This disparity has an important secondary implication. Learning is cumulative, with existing knowledge influencing future understanding and learning efficiency. Put another way that teachers will likely recognize, missing prerequisites make related information difficult to understand. So, it is not just differences in aptitude that matter, but differences in aptitude within a fixed learning environment (time and methods), which compounds the effects of learning speed and existing knowledge.

The connection with a traditional way of defining aptitude such as intelligence may not jump out at you, but consider the classic IQ=MA/CA so many learned in Intro to Psych courses. Think of it this way. CA is chronological age or, within my way of explaining aptitude, the time available for learning. MA is mental age, or how much an individual of a given age knows. The quotient ends up as a measure of learning efficiency, or amount learned per unit of time.
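As a small worked illustration of the quotient, here is a minimal sketch. The function name and the example ages are my own, and the multiplication by 100 is the conventional scaling of the ratio IQ rather than part of the formula as written above.

```python
def ratio_iq(mental_age, chronological_age):
    """Classic ratio IQ: mental age over chronological age, scaled by 100."""
    if chronological_age <= 0:
        raise ValueError("chronological_age must be positive")
    return 100 * mental_age / chronological_age


# Three hypothetical ten-year-olds who have learned different amounts
# in the same amount of time
for mental_age in (8, 10, 12):
    print(f"MA={mental_age}, CA=10 -> ratio IQ {ratio_iq(mental_age, 10):.0f}")
```

Read as learning per unit of time, the learner with a mental age of 12 at a chronological age of 10 has covered more ground in the same ten years than the learner with a mental age of 8.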

Anyway, assuming this theoretical notion offers a reasonable way of understanding reality, what does this mean for educators and what actions seem reasonable responses?

Dealing with differences in rate of learning

When I present this to future teachers, I propose that educational settings do make some accommodations to this perspective. At the extreme, a few students might be required to repeat a grade. Schools provide extra help and time in the form of pull-out programs and other types of individual help. Schools used to and to some extent still group students based on performance/ability to match the rate of progress to instruction (e.g., tracking, ability grouping). While helpful, these programs do not stem the increasing variability in performance across elementary school grades. Perhaps once students get to high school variability is accommodated by the selection of different courses and the pursuit of different learning goals, but even if this is the case there are long-term consequences from the early learning experiences. How is motivation impacted by the increasing frequency of failure and related frustration? Are there practical ways to claw your way back from early failures once you fall behind?

Mastery learning

In the early 1970s, I became interested in two instructional strategies labeled mastery learning. These approaches proposed ways to respond to variability in the rate of learning. I will summarize these as Bloom’s Group-Based Method and Keller’s Personalized System of Instruction. Bloom was and continues to be a big name in education and gets a lot of attention. I see Keller as developing a system more attuned to the application of technology and AI. Both offer concrete proposals and encouraged a lot of research. The volume of research and related meta-analyses offer much to present efforts that lack the same detailed analyses (see references to the work of Kulik and colleagues).

Bloom’s Group-Based Mastery: In the late 1960s, Benjamin Bloom, an educational psychologist, considered the optimal approach to individualized education. Bloom concluded that individual tutoring yielded the best results for learners. Bloom’s research indicated that tutoring could produce significant improvements in student achievement, with 80% of tutored students achieving a level of mastery only attained by 20% of students in traditional classroom settings. Bloom recognized the impracticality of providing one-on-one tutoring for every student. Instead, he challenged researchers to explore alternative instructional strategies capable of replicating the effectiveness of individual tutoring. This has been described as the 2-sigma challenge based on the statistical advantage Bloom claimed for tutoring.

Bloom’s (1968) approach to mastery learning was group-based. A group of learners would focus on content (e.g., a chapter) to be learned for approximately a week and would then be administered a formative evaluation. Those who passed this evaluation (often at what was considered a B level) would continue to supplemental learning activities and those who did not pass would receive remediation appropriate to their needs. At the end of this second period of instruction (at about the two-week mark), students would receive the summative examination to determine their grades. Those who were struggling were provided more time on the core goals. Yes, this is a practical more than a perfect approach, as there is no guarantee that all students will have mastered the core objectives necessary for future learning by the end of the second week. A similar and more recent approach called the Modern Classroom Project categorizes goals as “need to know”, “good to know” and “aim to know”. The idea is that not all possible goals can practically be achieved.
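The branching step in this cycle can be sketched in a few lines. This is only an illustration under stated assumptions: the 0.80 cutoff is a stand-in I chose for "B level", and the Student class, the names, and the scores are hypothetical.

```python
from dataclasses import dataclass

MASTERY_CUTOFF = 0.80  # assumed stand-in for a "B level" pass mark


@dataclass
class Student:
    name: str
    formative_score: float  # proportion correct on the formative evaluation


def assign_second_week(students, cutoff=MASTERY_CUTOFF):
    """Split the group after the formative evaluation.

    Students at or above the cutoff move to supplemental activities;
    the rest receive remediation before the summative examination.
    """
    plan = {"enrichment": [], "remediation": []}
    for student in students:
        key = "enrichment" if student.formative_score >= cutoff else "remediation"
        plan[key].append(student.name)
    return plan


# Hypothetical group of three learners
group = [Student("A", 0.92), Student("B", 0.71), Student("C", 0.80)]
plan = assign_second_week(group)
print(plan)
```

The whole group still reaches the summative examination at the same time, which is what makes the method group-based rather than self-paced.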

Keller’s Personalized System of Instruction: Fred Keller, drawing inspiration from Carroll’s work, developed the Personalized System of Instruction (PSI) in 1968. Keller proposed that presenting educational content in written format, rather than through traditional lectures, could provide students with the flexibility to learn at their own pace. PSI utilizes written materials, tutors, and unit mastery to facilitate learning. Students progress through units of instruction at their own pace, revisiting concepts and seeking clarification as needed when initial evaluations show difficulties. This self-paced approach enables students to dedicate additional time to challenging concepts while progressing more quickly through familiar material. The focus on written materials that could be used by individuals allowed Keller’s approach to focus more on individual progress and it was not necessary that a group be kept to a common pace of progress.

PSI utilizes frequent assessments to gauge student understanding and identify areas requiring further instruction. These assessments are non-punitive, meaning they do not negatively impact a student’s grade. Instead, assessments provide feedback that guides students toward mastery of the material. If a student does not demonstrate mastery on an assessment, they receive additional support and instruction tailored to their specific needs, before retaking the assessment.
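The unit-mastery loop described here might be sketched as follows. This is a hedged illustration, not Keller's specification: the 0.9 cutoff, the cap on attempts, the function names, and the scores are all assumptions of mine.

```python
def attempts_to_mastery(attempt_scores, cutoff=0.9, max_attempts=10):
    """Return the attempt number on which the cutoff was first met, else None.

    Earlier attempts below the cutoff carry no penalty; they only signal
    that more study and support are needed before a retake.
    """
    for attempt_number, score in enumerate(attempt_scores[:max_attempts], start=1):
        if score >= cutoff:
            return attempt_number
    return None


def progress_through_units(units, cutoff=0.9):
    """Walk units in order, stopping at the first unit not yet mastered."""
    record = {}
    for unit_name, scores in units:
        attempts = attempts_to_mastery(scores, cutoff)
        record[unit_name] = attempts
        if attempts is None:
            break  # the student keeps working here; later units wait
    return record


# Hypothetical record for one student: scores on successive unit quizzes
units = [
    ("Unit 1", [0.95]),
    ("Unit 2", [0.60, 0.85, 0.92]),
    ("Unit 3", [0.70]),
    ("Unit 4", []),
]
print(progress_through_units(units))
```

In this made-up record the student passes Unit 1 on the first try, needs three tries on Unit 2, and is still working on Unit 3, so Unit 4 has not been started. Each student would have a different record, which is the sense in which pace is individual.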

In Keller’s model, tutors play a crucial role in evaluating student progress, offering personalized feedback, and providing clarification or additional instruction when needed. The role of the tutor could be fulfilled by various individuals, including the teacher, teaching assistants, or even fellow students who have already achieved mastery of the subject matter. The teacher’s role in PSI shifts from delivering lectures to designing the curriculum, selecting and organizing study materials, and providing individualized support to students.

Adapting old models to modern technology

While mastery learning predates the widespread adoption of technology in education, technology has significantly enhanced its implementation. Meta-analyses generally found that mastery approaches offered achievement benefits when compared with traditional instruction. My interpretation is that interest in the original approaches waned not because of a lack of effectiveness, but because of a lack of practicality. Mastery approaches were simply difficult to implement. Online platforms and educational technologies can facilitate personalized learning experiences by delivering content, tracking student progress, and providing individualized feedback and support. Technology can also automate many of the administrative tasks associated with mastery learning, such as grading assessments and tracking student progress, freeing up educators to focus on providing individualized support. Both the Bloom and Keller approaches could be implemented making use of technology, but the greatest benefit would seem to be to the Keller approach.

AI, tutoring, and mastery learning

Recent mentions of mastery learning (Khan, 2024; Archambault and colleagues, 2022) do so in combination with tutoring. Bloom originally proposed his mastery approach as one response to his two-sigma challenge. However, group-based mastery was compared with, not integrated with, tutoring. The number of professionals working in schools is not increasing and, if anything, class sizes are increasing. Greater individualization only increases the importance of individual monitoring and attention, and AI as tutor can reduce some of the demands on the limited time of professional educators.

Archambault and colleagues summarize the complication posed by the seemingly conflicting education goals of individualized learning and the needs for interaction and socioemotional learning. I have included the following quote from their work.

For example, cultivating classroom community through building relationships online and having students work together to develop social interaction at a distance may have competing interests with personalizing instruction such that each student can work at their own pace and through their own path to master course content.

Summary

Mastery learning is gaining increasing attention among educators seeing the value of applications of technology to individualize learning. This post summarizes the history of mastery instructional methods and offers other insights into how old ideas may be practically implemented with technology.

I have written multiple posts about mastery learning and current efforts to apply mastery principles. Reviewing some of these posts may be valuable if this summary sparks your interest.

References:

Archambault, L., Leary, H., & Rice, K. (2022). Pillars of online pedagogy: A framework for teaching in online learning environments. Educational Psychologist, 57(3), 178–191. https://doi.org/10.1080/00461520.2022.2051513

Bloom, B. S. (1968). Learning for mastery. Evaluation Comment, 1(2), 1–12.

Khan, S. (2024). Brave new words: How AI will revolutionize education (and why that’s a good thing). Penguin Random House.

Keller, F. S. (1968). “Good-bye teacher”. Journal of Applied Behavior Analysis, 1, 79–89

Kulik, C., Kulik, J. & Bangert-Drowns, R.L. (1990). Effectiveness of mastery learning programs: A meta-analysis. Review of Educational Research, 60, 265–299.

Kulik, C., Kulik, J. & Bangert-Drowns, R.L. (1990). Is there better evidence on mastery learning? A response to Slavin. Review of Educational Research, 60, 303–307.

Kulik, J. A., Kulik, C. L. C., & Cohen, P. A. (1979). A meta-analysis of outcome studies of Keller’s personalized system of instruction. American Psychologist, 34(4), 307–318.

AI Tutoring Update

This is an erratum in case my previous posts have misled anyone. I looked up the word erratum just to make certain I was using the word in the correct way. I have written several posts about AI tutoring and in these posts, I made reference to the effectiveness of human tutoring. I tend to provide citations when research articles are the basis for what I say and I know I have cited several sources for comments I made about the potential of AI tutors. I have not claimed that AI tutoring is the equal of human tutoring, but suggested that it was better than no tutoring at all, and in so doing I have claimed that human tutoring was of great value, but just too expensive for wide application. My concern is that I have proposed that the effectiveness of human tutoring was greater than it has been actually shown to be.

The reason I am bothering to write this post is that I have recently read several posts proposing that the public (i.e., pretty much anyone who does not follow the ongoing research on tutoring) has an inflated understanding of the impact of human tutoring (Education Next, Hippel). These authors propose that too many remember Bloom’s premise of a two-sigma challenge and fail to adjust Bloom’s proposal that tutoring has this high level of impact on student learning to what the empirical studies actually demonstrate. Of greater concern according to these writers is that nonresearchers, including educational practitioners but also those donating heavily to new efforts in education, continue to proclaim tutoring has this potential. Included in this collection of wealthy investors and influencers would be folks like Sal Khan and Bill Gates. I assume they might also include me in this group, although I obviously have little impact compared to those with big names. To be clear, the interest of Khan, Gates, and me is really in AI rather than human tutoring, but we have made reference to Bloom’s optimistic comments. We have not claimed that AI tutoring was as good as human tutors, but by referencing Bloom’s claims we may have led to false expectations.

When I encountered these concerns, I turned to my own notes from the research studies I had read to determine if I was aware that Bloom’s claims were likely overly optimistic. It turns out that I had read clear indications identifying what the recent posters were concerned about. For example, I highlighted the following in a review by Kulik and Fletcher (2016). 

“Bloom’s two sigma claim is that adding undergraduate tutors to a mastery program can raise test scores an additional 0.8 standard deviations, yielding a total improvement of 2.0 standard deviations.”

My exposure to Bloom’s comments on tutoring originally had nothing to do with technology or AI tutoring. I was interested in mastery learning as a way to adjust for differences in the rate of student learning. The connection with tutoring at the time Bloom offered his two-sigma challenge was that mastery methods offered a way to approach the benefits of the one-to-one attention and personalization provided by a human tutor. Some of my comments on mastery instruction and the potential of technology for making such tactics practical are among my earlier posts to this site. Part of the reason Bloom’s claim is misapplied is that he combined personalized instruction via mastery tactics with tutoring. He was also focused on college-aged students in the data he cited. My perspective reading the original paper many years ago was not “see how great tutoring is”. It was more that tutoring on top of classroom instruction is about as good as it is going to get, and mastery learning offers a practical tactic that is a reasonable alternative.

As a rejoinder to update what I may have claimed, here are some additional findings from the Kulik and Fletcher meta-analysis (intelligent software tutoring).

The studies reviewed by these authors show lower benefits for tutoring when outcomes are measured on standardized rather than local tests, sample size is large, participants are at lower grade levels, the subject taught is math, a multiple-choice test is used to measure outcomes, and Cognitive Tutor is the ITS used in the evaluation.

However, on a more optimistic note, the meta-analysis conducted by these scholars found that in 50 evaluations intelligent tutoring systems led to an improvement in test scores of 0.66 standard deviations over conventional levels. 

The two sources urging a less optimistic perspective point to a National Bureau of Economic Research working paper (Nickow and colleagues, 2020) indicating that the effect of human tutoring for K-12 learners was approximately 0.35 sigma. This is valuable, but not close to the 2.0 level.

Summary

I have offered this update to clarify what might be interpreted based on my previous posts, but also to provide some other citations for those who now feel the need to read more original literature. I have no idea whether Khan, Gates, etc. have read the research that would likely indicate their interest in AI tutoring and mastery learning was overly ambitious. Just to be clear, I had originally interpreted what the tech-types were promoting as mastery learning (personalization), which later morphed into a combination with AI tutoring. A combination of mastery learning and tutoring was what Bloom was actually evaluating. A two-sigma improvement, when translated into what it would actually mean in terms of rate of learning or change in a metric such as assigned grade, seems improbable. Two standard deviations would move an average student (50th percentile) to the 98th percentile. This only happens in Garrison Keillor’s Lake Wobegon.
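For readers who want to check the percentile arithmetic, here is a minimal sketch in Python. It assumes test scores are normally distributed, which is my simplifying assumption and not something the cited studies guarantee. The function name is made up for illustration. The three effect sizes are the ones mentioned in this post.

```python
from statistics import NormalDist


def percentile_after_gain(effect_size_sd: float, start_percentile: float = 50.0) -> float:
    """Percentile reached by a student who starts at start_percentile and
    improves by effect_size_sd standard deviations (normal scores assumed)."""
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    start_z = standard_normal.inv_cdf(start_percentile / 100)
    return 100 * standard_normal.cdf(start_z + effect_size_sd)


# Effect sizes quoted in this post, in standard deviation units.
effects = [
    ("Bloom's two-sigma claim", 2.0),
    ("Kulik and Fletcher, intelligent tutoring systems", 0.66),
    ("Nickow and colleagues, human tutoring", 0.35),
]

for label, effect in effects:
    print(f"{label}: about percentile {percentile_after_gain(effect):.1f}")
```

Under the normality assumption, an average student would land at roughly the 98th percentile with a 2.0 effect, the 75th with 0.66, and the 64th with 0.35.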

References:

Bloom, B. S. (1984). The 2 sigma problem: The search for methods of group instruction as effective as one-to-one tutoring. Educational researcher, 13(6), 4-16.

Kulik, J. A., & Fletcher, J. D. (2016). Effectiveness of intelligent tutoring systems: a meta-analytic review. Review of educational research, 86(1), 42-78.

Nickow, A., Oreopoulos, P., & Quan, V. (2020). The impressive effects of tutoring on prek-12 learning: A systematic review and meta-analysis of the experimental evidence. https://www.nber.org/papers/w27476


Is there such a thing as aptitude?

The concept of aptitude, and how the influence of aptitude differences on learning could be reduced through mastery strategies, has interested me throughout my academic career. I understood aptitude as something I thought of as intelligence. Intelligence is an abstraction that researchers attempt to measure with intelligence tests and investigate in practice through correlations with academic progress. Intelligence tests are not a direct measure of aptitude, but really an estimate based on differences in what individuals have learned and can do. Even the simple representation of intelligence as IQ (intelligence quotient) imagines intelligence as how much has been learned (mental age or MA) divided by age (chronological age or CA).

Intelligence tests have come under a great deal of criticism based on potential racial/SES biases. These criticisms are certainly fair, but the tests do predict academic achievement and I was never convinced to support the abandonment of the development and use of such tests. The correlations measure something, and whatever this is does not disappear when tests are not given. If both tests and educational practice are biased, why not recognize that this is the case?

The theoretical basis for mastery learning (see Arlin and Bloom references) proposes that educators consider the rate of learning and accept that the rate of learning differs greatly among individuals. To me, this sounded very much like intelligence, and the concept of IQ is obviously related to learning rate (how much was learned per unit of time). However, what these researchers and educational theorists proposed was that other factors were involved in traditional educational practice and these other factors had a significant impact on achievement. While time required for learning was determined by aptitude, it was also influenced by whether the method of instruction met individual needs and by differences in existing knowledge. Think of it this way. If aptitude-based differences in learning create a range of learning speeds and a class of students moves through learning experiences faster than some students can master important skills and concepts, in the future those students will be burdened not only by learning at a slower rate, but also by missing knowledge prerequisite to the new skills and concepts they are trying to learn. Over time, these missing elements (Sal Khan calls this Swiss cheese learning) will accumulate, increasing failure and frustration in some learners. Mastery learning strategies focus on limiting the accumulation of missing prerequisites by individualizing the rate of learning to the rate of mastery. Some students in completely individualized approaches do move more slowly (and some faster), but the theory proposes that the rate of actual mastery would be faster than without mastery for all learners because deficits would not accumulate in learners needing more time and more capable students could move more quickly. The work of Arlin attempted to demonstrate what these differences in the rate of learning might be. When ratios such as 5:1 or 7:1 are proposed, it is easy to see why some students would fall hopelessly behind.

Individualization is challenging. Tutoring has always been a personal interest, but not economically feasible. With access to personal computers in the 1990s I saw the first method that might be available to provide individualization and this continues as an interest. Many attack present attempts to make use of technology in direct instruction as boring and depersonalizing. I think these folks have the wrong idea, but this is a topic I address elsewhere. Here, I want to recognize recent research that claims individualized instruction with technology (Koedinger, et al) may not only deal with individual differences in background knowledge, but also challenge the notion there are meaningful differences in the rate of learning.

How variable is the impact of aptitude?

Koedinger and colleagues studied the work of thousands of students from all grade levels working on different types of content using the type of technology-enabled methods I described above. Their focus was different in being based on the mastery of very specific capabilities rather than courses or even weeks of work. The learning experiences consisted of initial exposure to information (video or written) followed by a sequence of worked activities. I suppose a worksheet would be an example of a worked activity, but the activities were of many different types. The goal was to reach 80% mastery on a worked activity. The authors found that in the first attempt following the acquisition phase, the top half of students scored 75% and the bottom half scored 50%. The top half then required 3.7 practice trials to reach mastery (80%) and the bottom half 13 trials. What startled the researchers was that the gain per practice trial was very similar, leading them to conclude that learning rate was very similar once existing knowledge was addressed. Aptitude (if I can be allowed to switch terms here) accounts for little difference in speed.
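A rough way to see what "gain per practice trial" means is to divide the distance to mastery by the number of trials, using only the summary figures in the paragraph above. The sketch below does this on two scales: raw percentage points and log-odds. The choice of scales and the function names are my assumptions for illustration. This is back-of-the-envelope arithmetic and not a reconstruction of the statistical model the authors actually fit.

```python
import math

MASTERY = 0.80  # mastery criterion reported in the study summary

# Starting accuracy and trials to mastery for each half, as quoted above.
GROUPS = {
    "top half": {"initial": 0.75, "trials": 3.7},
    "bottom half": {"initial": 0.50, "trials": 13},
}


def log_odds(proportion: float) -> float:
    """Convert a proportion correct into log-odds (assumed scale)."""
    return math.log(proportion / (1 - proportion))


def gain_per_trial(initial: float, trials: float, mastery: float = MASTERY):
    """Return the average gain per trial as (percentage points, log-odds)."""
    points = 100 * (mastery - initial) / trials
    logit_gain = (log_odds(mastery) - log_odds(initial)) / trials
    return points, logit_gain


for name, group in GROUPS.items():
    points, logit_gain = gain_per_trial(group["initial"], group["trials"])
    print(f"{name}: {points:.2f} points per trial, {logit_gain:.3f} log-odds per trial")
```

On these summary figures the bottom half needs about three and a half times as many trials, yet its average gain per trial is of the same order as that of the top half (somewhat larger on both scales). The large difference sits in the starting point, which is the pattern the authors emphasize.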

I am not convinced I would interpret these results in the same way given the method, but I do like the demonstration that allowing additional learning trials allows students the same level of achievement. I encourage interested parties to review the study themselves and see if they agree with my assessment. The statistical method is quite sophisticated and I wonder what interpretations the method allows. I would be more convinced had the researchers carried their research over an extended period of time and actually determined what happens when individual differences in existing knowledge are eliminated. The difference in understanding after the initial phase of exposure to new content was substantial, and while it is likely a partial product of existing differences in background, it seems to me that the difference would also be partially due to aptitude differences. Since learners with existing background knowledge are not involved, it seems to me there is no demonstration that aptitude does not play a role in determining the number of practice trials that are required.

I am pleased to see that this type of research continues and assume this study will generate replications and hopefully extensions.

Additional comments on mastery learning and learning speed

Arlin, M. (1984). Time variability in mastery learning. American Educational Research Journal, 21(1), 103–120.

Arlin, M. (1984b). Time, equality, and mastery learning. Review of Educational Research, 54(1), 65–86.

Bloom, B. S. (1974). Time and learning. American Psychologist, 29(9), 682–688.

Koedinger, K. R., Carvalho, P. F., Liu, R., & McLaughlin, E. A. (2023). An astonishing regularity in student learning rate. Proceedings of the National Academy of Sciences, 120(13), e2221311120.

Khan, S. (2012). The One World Schoolhouse: Education Reimagined. Hodder and Stoughton, London, and Twelve, Boston & New York.


Mastery of what?

I want to make what follows as practical as I can. I understand that standards, curriculum, and instructional approaches (e.g., mastery) can drift into the realm of what some regard in a negative way as theoretical, but I think I can frame what I have to say here in terms of some questions teachers must answer all of the time.

Let me begin with the question of what you regard as the essential knowledge and skills your students should acquire. I would suggest that you in one way or another address this question by answering a series of questions.

What content and experiences will I use with the students in my class?

What will I do to evaluate student mastery of these experiences and content?

What will I do with the students who perform poorly on the methods of evaluation I have applied? 

My proposal is that this sequence of questions provides a way to look at the label of essential. Some experiences and content were essential enough to provide. Some experiences and content were essential enough to evaluate. It is the third question I am most interested in because all educators face this challenge. It really gets at the core of the designation of “essential”. What happens when skills or knowledge are not developed? Is the answer “I move ahead to new content and new skills”? “I refer some students for outside help”? “I take students or small groups aside and work with them in an effort at remediation”?

Old school mastery proponents (e.g., Bloom, Keller) addressed what must be mastered in a fuzzy way. Rather than identify specific things that must be known, they hedged. Bloom proposed a group-based mastery system. Imagine a textbook chapter and related classroom contributions to be mastered over a two-week period of time. Bloom proposed that teachers first focus on essential skills and knowledge (it is not really clear to me how this material was identified). At the end of maybe a week, students completed an evaluation related to these materials that Bloom labeled a formative evaluation. Those learners who “passed” this evaluation went on to supplemental goals and those who failed to achieve mastery received further help with the essential goals. At the end of the time set aside for the unit, students completed a summative evaluation over the essential goals and everyone moved on, essential goals met or not.

Keller’s PSI (personalized system of instruction) focused heavily on written content as a way to allow personalized progress – think textbook again. Reading is an individual way to confront new information. When students felt they were ready to be evaluated on their mastery of a unit, they asked a tutor to provide an assessment. Pass/not pass was based on an overall score, so what was mastered was not really determined at the level of specific elements of understanding. Those who passed went on to the next unit and those who did not pass continued to study the chapter yet to be mastered with some assistance from a tutor.

Modern mastery advocates (Khan Academy, Modern Classroom Project) confront the question of essential more directly. Before I try to address how, I will try to answer my original question – “Mastery of what?” I would suggest essential means a) knowledge or skill that is necessary for learning some other essential knowledge or skill or b) knowledge or skill the system has a responsibility to develop, where this development is expected of the course or grade level I teach.

For example, double digit subtraction is essential to being able to master long division. The “North Dakota Studies” course is likely the one time you would learn why the Red River Valley has some of the richest farmland in the world. Okay, maybe this is not essential, but it matters to those who live in this area and depend on agriculture. Essential is a squishy thing and one could argue that a cellphone would allow anyone to perform long division and explain the soil quality of the Red River Valley without knowing how to subtract or basic geological facts. However, I assume there are essential things we teach that are a subset of all things we teach.

The Khan Academy uses a complex model of the content with multiple strands identifying which skills/knowledge are prerequisites to what other skills. Students make progress across strands and must show mastery of prerequisites when identified within a given strand. Khan complained about “Swiss cheese knowledge” that can be generated when students advance without prerequisite knowledge, leaving gaps in skills and understanding that make future learning more difficult.
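To make the idea of prerequisite strands concrete, here is a small sketch of how such a model might be represented. The skill names, the prerequisite map, and the function names are all hypothetical and chosen to match the subtraction and long division example above. This is not a description of Khan Academy's actual content model.

```python
from graphlib import TopologicalSorter

# Hypothetical prerequisite map: each skill lists the skills to master first.
PREREQUISITES = {
    "single_digit_subtraction": set(),
    "double_digit_subtraction": {"single_digit_subtraction"},
    "multiplication_facts": set(),
    "long_division": {"double_digit_subtraction", "multiplication_facts"},
}


def ready_to_learn(mastered: set) -> list:
    """Skills not yet mastered whose direct prerequisites are all mastered."""
    return sorted(
        skill
        for skill, needs in PREREQUISITES.items()
        if skill not in mastered and needs <= mastered
    )


def missing_prerequisites(skill: str, mastered: set) -> list:
    """Every unmastered skill upstream of `skill` (the holes in the cheese)."""
    gaps, visited, stack = set(), set(), list(PREREQUISITES[skill])
    while stack:
        current = stack.pop()
        if current in visited:
            continue
        visited.add(current)
        if current not in mastered:
            gaps.add(current)
        stack.extend(PREREQUISITES[current])
    return sorted(gaps)


# One valid teaching order in which prerequisites always come first.
study_order = list(TopologicalSorter(PREREQUISITES).static_order())

print(ready_to_learn({"single_digit_subtraction"}))
print(missing_prerequisites("long_division", {"multiplication_facts"}))
print(study_order)
```

A student who has mastered only the multiplication facts would be flagged as missing both subtraction skills before long division, which is the kind of gap a calendar-driven class can leave unaddressed.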

The Modern Classroom Project suggests educators identify differences in the importance of specific knowledge or skills using a triage of sorts – must do, should do, and aspire to do or need to know, good to know, and aim to know. This approach allows classroom educators to differentiate objectives in a way that allows more uniform progress within a group and still requires an extended focus on some prerequisites.

My long-time interest in mastery learning, more recently combined with my interest in the classroom benefits of technology, suggests what I consider improvements in both the value and the practicality of mastery approaches. The value concerns a way to address the difficulty of new learning when past learning has not provided important existing knowledge. The efficiency associated with technology comes from the tracking of what has been learned and what should be learned next at a far more specific and individual student level. As I hope my analysis has made clear, the specificity of what should be learned next sometimes matters and sometimes does not. “Just-in-time learning” is always possible, but this concept still requires a method of identification and application that group-based approaches to teaching/learning do not make practical. Using teacher skills in a different way (tutor, coach) in combination with the value of technology in tracking individuals and delivering learning experiences seems a productive alternative to group-based approaches.

As a final comment, I wonder if big data will provide a way to address the issue of necessary prerequisites in a more specific way. Would there be a computational way of creating the strands of knowledge/skill units Khan has identified based on intuition?

References

Bloom, B. S. (1968). Learning for Mastery. Instruction and Curriculum. Regional Education Laboratory for the Carolinas and Virginia, Topical Papers and Reprints, Number 1. Evaluation Comment, 1(2), n2.

Khan, S. (2012). The one world schoolhouse: Education reimagined. Twelve.

Keller, F. S. (1968). “Good-bye teacher”. Journal of Applied Behavior Analysis, 1, 79–89

Modern Classroom Project – https://intercom.help/modern-classrooms/en/articles/5261634-must-do-should-do-and-aspire-to-do


Are we ignoring differences in rate of learning?

I can identify a half dozen or fewer themes that have captivated my professional imagination over the 40+ years of my academic career. These themes were often at the core of specific research interests and my applied work. Sometimes a theme was something I found interesting at the time it was first encountered, but I saw no practical way the idea could be implemented. Sometimes this situation has changed. The best example of this “opportunity discovered” comes from my original interest in individual differences in the rate of learning and my later interest in technology and how the affordances of technology could make responding to differences practical.

The concept of aptitude is a topic educational psychologists teach. We may talk about issues associated with aptitude tests and perhaps biases in these tests as measures of aptitude or perhaps problems in the way test results were applied. Intelligence tests make perhaps the best example of an attempt to estimate general aptitude. Aptitude tests are about prediction and intelligence scores are predictive of achievement. Past achievement may be a better predictor of future achievement, but sometimes there is value in breaking down the components that contribute to achievement differences. Aptitude as an estimate of potential does not guarantee that potential will be realized and this difference, if real, is worth investigating.

As I said originally, I am interested in individual differences in the rate of learning and the practical consequences of these differences in rate under different classroom circumstances. I can trace my personal interest back to the theoretical work of Carroll (1963, 1989), which proposed what I interpreted as an optimistic model of learning. The model proposed that most individuals could learn most things if provided enough time. Carroll then differentiated the time required from the time provided and broke time required down according to the variables that influenced it. Carroll proposed that aptitude was a way of understanding the time required under ideal conditions of optimal instruction and the presence of relevant existing knowledge.

I saw a connection to the notion of IQ which few seemed to make. The classic representation, IQ=MA/CA, is really about time and rate of learning. CA (chronological age) is the time available for learning and MA (mental age) is really how much has been learned estimated as the average knowledge of others of a given age. Hence MA/CA is rate of learning. The amount of general knowledge that has been acquired relative to what is typical is one way to estimate this rate. It is problematic in practice because it assumes equal opportunity which is of course idealistic.
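The rate interpretation can be written out as simple arithmetic. In the sketch below the function names and the ages are made up for illustration. The one thing I have added is the conventional multiplication by 100 used to report a ratio IQ.

```python
def learning_rate(mental_age: float, chronological_age: float) -> float:
    """MA/CA: how much has been learned relative to the time available."""
    return mental_age / chronological_age


def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Classic ratio IQ: the MA/CA rate scaled so that typical equals 100."""
    return 100 * learning_rate(mental_age, chronological_age)


# A hypothetical 10-year-old who knows what a typical 12-year-old knows
# has learned at 1.2 times the typical rate.
print(learning_rate(12, 10))
print(ratio_iq(12, 10))
```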

A different way to estimate rate of learning might be to measure it directly and this is possible with various forms of individualized instruction. I remember the time when individualization was called programmed instruction and was accomplished using sequenced paper materials (see Yeager). For example, I remember a reading comprehension implementation based on a box of cards with short reading passages and related questions that reflected different levels of text complexity. I remember this as an SRA reading product. The box of cards was based on a color scheme representing each level (e.g., brown cards, green cards, orange cards) and there were multiple cards at each level. Students would start at a common level, read a card, and attempt the related questions. If they obtained an established score, they were advanced to the next level. If not, they would take a different card of the same color and try again. Students would progress at different rates and the difference in time required to advance from one level to the next could be used as one way to estimate reading aptitude.
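The card box procedure is simple enough to sketch in a few lines. The level order, the passing score, and the two score sequences below are hypothetical values I made up to show the logic. They are not taken from the SRA materials or from any study.

```python
LEVELS = ["brown", "green", "orange"]  # colors from my memory, order assumed
PASSING_SCORE = 0.8                    # hypothetical advancement criterion


def cards_needed_per_level(scores, levels=LEVELS, passing=PASSING_SCORE):
    """Count the cards attempted at each level before the student advanced.

    `scores` lists the proportion correct on each card in the order attempted.
    A passing score moves the student up a level. Otherwise the student
    takes another card of the same color.
    """
    counts = {}
    level_index = 0
    for score in scores:
        if level_index >= len(levels):
            break  # the student has finished the box
        level = levels[level_index]
        counts[level] = counts.get(level, 0) + 1
        if score >= passing:
            level_index += 1
    return counts


faster_student = [0.9, 0.8, 0.85]
slower_student = [0.5, 0.6, 0.8, 0.7, 0.9, 0.6, 0.6, 0.8]

print(cards_needed_per_level(faster_student))
print(cards_needed_per_level(slower_student))
```

With these made-up scores the second student needs eight cards to finish the three levels and the first needs three. The ratio of cards (or of time) required is one direct estimate of the difference in learning rate.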

There are now multiple technology-supported systems (e.g., Khan Academy) based on a similar model (I refer to such approaches as mastery learning after the use of this term by Bloom and Keller in the late 1960s).

Rate of learning could also be impacted by the presence or absence of relevant background knowledge. More recently, Khan (Khan Academy) has described this as the problem of Swiss cheese knowledge. Do students have the relevant prerequisites for acquiring a given new skill or concept?

How little variability in the rate of learning would exist given ideal instruction and the mastery of prerequisites has become an interesting question. To me, this seems similar to asking the question if there are really differences in the theoretical notion of intelligence or are the individual differences we observed due to differences in motivation, background knowledge, and instructional quality. 

Why does it matter? I think it matters because educators and, on a different level, our models of education must deal with individual differences. However conceptualized, every teacher must make decisions about a rate of presentation that slows down the rate at which some students could learn and moves too fast for other students. The reality of aptitude as differences in rate of learning is there whether we choose to ignore it or not. Estimates of this variability range from 3:1 to 10:1 (Arlin). I liked to pick 5:1 and proposed to future teachers that some of their students would “get it” during their class on Monday and that they would have to work on the same concepts for the rest of the week to get most of the students to the same place. What should they do between Monday and Friday?

I would suggest that techniques have been available to provide a solution since the late 1960s. Mastery learning proposes to create settings that address differences in background knowledge by assuring students progress when they are ready rather than when the calendar says it is time to begin the next unit. My way of describing the goal would be to say it is to reduce the variability in time required to the bare minimum imposed by differences in aptitude. This is done by addressing differences in background knowledge and moving ahead at a rate individual students can handle, which reduces their frustration at not being able to meet learning goals.

I see two practical ways to accomplish an approach of this type – tutoring and technology. Tutoring is very effective in meeting individual student needs, but expensive. Technology provides a more cost effective approach and offers advantages in content presentation, evaluation of understanding, and record keeping over early implementations of mastery learning. Technology can free teachers from having to take total responsibility for these functions and to provide more time to function as an individual or small group tutor. More on some of these ideas in future posts. 

Related references:

Arlin, M. (1984). Time variability in mastery learning. American Educational Research Journal, 21(1), 103-120.

Arlin, M. (1984b). Time, equality, and mastery learning. Review of Educational Research, 54(1), 65-86.

Bloom, B. S. (1968). Learning for Mastery. Instruction and Curriculum. Regional Education Laboratory for the Carolinas and Virginia, Topical Papers and Reprints, Number 1. Evaluation Comment, 1(2), n2.

Bloom, B. S. (1974). Time and learning. American Psychologist, 29(9), 682-688.

Carroll, J. B. (1963). A model of school learning. Teachers College Record, 64(8), 1-9.

Carroll, J. B. (1989). The Carroll model: A 25-year retrospective and prospective view. Educational Researcher, 18(1), 26-31.

Keller, F. S. (1968). Goodbye teacher… Journal of Applied Behavior Analysis, 1, 79-89.

Khan, S. (2012). The one world schoolhouse: Education reimagined. Twelve.

Yeager, J. L., & Lindvall, C. M. (1967). An exploratory investigation of selected measures of rate of learning. The Journal of Experimental Education, 36(2), 78-81.
