I'm Skeptical: Tutoring

Whatever the flavor, I'm not sure tutoring is a silver bullet

James Shanahan
July 09, 2025

Welcome to Scholastic Alchemy! I’m James and I write mostly about education. I find it fascinating and at the same time maddening. Scholastic Alchemy is my attempt to make sense of and explain the perpetual oddities around education, as well as to share my thoughts on related topics. On Wednesdays I post a long-ish dive into a topic of my choosing. On Fridays I post some links I’ve encountered that week and some commentary about what I’m sharing. Scholastic Alchemy will remain free for the foreseeable future but if you like my work and want to support me, please consider a paid subscription. If you have objections to Substack as a platform, I maintain a parallel version using BeeHiiv and you can subscribe there.

I’m back from a restorative and fun vacation/staycation combo with the family and ready to dive back into the I’m Skeptical series. As a reminder, when I say that I’m skeptical of some big trend or idea in education, I don’t always mean the idea is flat-out wrong. I may be skeptical because there just isn’t enough information available to justify the claims being made. Maybe I’m skeptical because the idea itself seems fine but implementation is going wrong in a predictable way. Or, perhaps my skepticism is rooted in confusion around the theory of learning or theory of action behind the idea. Whatever the source of my skepticism, my overall goal is to stick close to the titular challenge of this blog/newsletter/thing. Scholastic Alchemy is my attempt to understand and communicate about things in the world of education that don’t turn out as intended. If you’re convinced that one neat trick is all that’s needed to dramatically improve student achievement, transform schools, or otherwise change the education landscape, I see it as my job to create some doubt. Education history is littered with failed initiatives that perfectly smart and reasonable people believed would help.

Tutoring is history

I was watching Meet the Press on Sunday and saw Sal Khan of Khan Academy come on to talk about AI revolutionizing education.

Today’s post is related to AI but it’s not about AI. In fact, after watching Sal Khan, I think he’s not really talking about AI either. He’s talking about tutoring. I recommend watching the whole interview but here’s the bit I want to home in on around the 28-minute mark:

KW: Why is it so important that it’s personalized and, you say, really a one-on-one experience in many ways?

SK: You go back about 2,300 years, you would see Aristotle tutoring Alexander the Great, or who would be later called Alexander the Great. And for most of human history, that was the gold standard of education. You had a personal tutor or sometimes a team of tutors. But, most people didn’t get that. You had to be a prince. You had to be a member of nobility. And about 230 years ago we had this very utopian idea, mass public education, but we had to compromise. We couldn’t afford to give everyone a personal tutor, so we batch students together in groups of 25, 30, 35. We start moving them together and that’s the system we have today.

It’s done hugely positive things! Literacy rates have gone through the roof. You know, like, algebra used to be considered esoteric. Now we expect everyone to learn it. But, we also know that a lot of people fall behind.

A few minutes later, he says that AI tools, like LLMs, will get us back to that classical tutoring relationship.

SK: [I] started to say, ‘hey, why wouldn’t we use this [ChatGPT] not by itself but in conjunction with teacher tools, in conjunction with videos, in conjunction with the software exercises to get that much closer to what Alexander the Great had with Aristotle?’

I have already touched on my problems with personalized learning in a previous I’m Skeptical post. The quick version is that the term isn’t helpful because almost anything can be considered personalized learning. Even here, Sal Khan is saying that one-on-one tutoring with Aristotle and AI-driven software tutoring are both personalized learning. We have different modes of delivery, different curricula, different social contexts, different millennia, yet both are the same if you just think about it? I’m thinking about it and I think what’s happening is that anything Sal Khan likes is called personalized learning. Which, fine, but it’s not a well-studied category of educational intervention because people like Sal Khan call everything they’re promoting personalized learning. The evidence is lacking, is all I’m saying.

But there is a larger problem here and that’s the problem of history. Khan’s story is that historically the gold standard of education is one-on-one tutoring. This simply is not true! School has been the gold standard. For more than a century before Aristotle was born, well-to-do Athenians attended elementary school. No, really! It was something that pretty closely resembles modern elementary school. It’s the reason we call gymnasiums gymnasiums. These were not today’s gyms where adults go and work out. These were places for children to learn via physical activity as one third of their elementary education curriculum, the other two thirds being grammar and music. Gymnastikē, mousikē, and grammata. The male children of wealthy Athenians attended these schools until early adolescence. They were neither free nor compulsory, of course, but the elite education of 500 BCE Greece was not tutoring, it was school.

When, more than a century later, Philip of Macedon hired Aristotle to tutor his young son Alexander, it was not because King Philip knew that tutoring was the best form of education. No, Philip was a king. Kings do not share their children’s time and efforts with the merely wealthy or even other nobility. Schools are not places for royalty because there is a hierarchical separation of royal lineages from everyone else. Beyond that, Macedonia was, at that point, still an up-and-coming power whereas classical Athens, Thebes, and Corinth had been established city-states for some time. These Greeks (even the Spartans had school-like martial training for the boys) had schools for the elites; the Macedonians did not. So, King Philip poached the Athenians’ most successful teacher, Aristotle. And, by the way, when Aristotle had completed his tutelage of young Alexander, he returned to Athens and founded a school, the Lyceum, with money from Alexander. If tutoring was the best, you’d think Aristotle would have done more tutoring instead. Surely, he could have landed another gig with a king somewhere else?

School, not tutoring, remained the gold standard for elites throughout the ancient world with only monarchs and emperors hiring tutors, in part because they were simply following the pattern established by the greatest conqueror in western history, and in part because emperors and kings don’t mix their children with others, even other elites. My sourcing here comes from two books that I highly recommend to anyone looking to learn more about ancient education. The first is Beck’s 1964 classic Greek Education, 450-350 B.C. The other is the more recent Greek and Roman Education: A Sourcebook from Joyal, Yardley, & McDougall. The source book is especially awesome because they collect several hundred ancient texts and provide primary source documentation of how education worked in ancient Greece and Rome.

I also have to question what Khan calls the “very utopian idea” of mass education. It turns out, the utopian ideals came later. What the parliaments, legislators, and monarchs were thinking about when they established mass education was control over the population. Khan is simply wrong about the history here and it makes me wonder what else he may be wrong about.

Is Tutoring Better?

Historical or not, there is a broad underlying assumption in society that tutoring is simply superior to classroom or other kinds of learning. The reality is, as always, much more complicated. Some versions of tutoring do appear to work and show consistent, if small, gains. For example, SAT and ACT tutoring works (see here, here, and here). Students who receive tutoring see a score increase of about 20 points on the SAT and 0.4 points on the ACT. This is often lower than what tutors and test prep companies will claim, but the companies are often using proprietary data and will exclude kids who did the test prep but did not report a score that year, or will otherwise massage the data to make the product look better. The NELS data from the studies I’ve cited is more thorough and captures all kinds of tutoring and test preparation for these two college entrance exams.

Beyond the SAT and ACT, there is decent positive evidence for the efficacy of tutoring on academic outcomes. In one meta-analysis of tutoring studies researchers found that the overall effect size of tutoring was an impressive 0.37 standard deviations. Seems strong! Another meta-analysis found that intensive tutoring worked well when compared with many other interventions for low SES youth.

They also report a similar effect size around 0.35-0.4 SD. I do want to point out that this is far lower than the (in)famous estimate by Bloom that tutoring yields a 2 standard deviation improvement. Notably, there were some flaws with Bloom’s initial work, so these later meta-analyses are probably closer to accurate. However, we have some reason to believe that these large effect sizes are not easily scalable to real-world situations. As tutoring sessions move beyond the 1:1 ratio, effect sizes decline. Perhaps, though, using LLMs, software, and video delivered through technology is a way to achieve scale?

I think it’s also important that coaching/mentoring interventions performed worse in that analysis, as did computer-assisted instruction (CAI). The difference between coaching/mentoring and tutoring is that tutoring is about the content of a course or topic whereas coaching and mentoring are more about study skills or mindset or building better academic habits. CAI, which is perennially in vogue, also underperformed 6 other interventions being studied. In fact, the three best intervention categories all included coverage of the content. I want to suggest that this is, actually, quite crucial to understanding when tutoring is successful and when it is not. If all of your tutoring is on test-taking strategies or academic skill building, you are probably not going to see the gains that you would from covering the academic content of the course. Likewise, just because you use technology doesn’t mean you’re going to get better results. Both of these are important given the accelerating combination of tutoring and AI tech.

One way to think about this is to say that adding the scale of AI tutoring is also adding the diminished effect sizes of computer assisted instruction. Yes, you’ll scale but you’ll also have less impact. You’re combining an intervention with an effect just under 0.4 SD with an intervention with an effect of about 0.05 SD. They aren’t additive. Now, I don’t have all the studies from these analyses on hand to try a proper pooled effect size calculation to estimate the effect size of CAI + Tutoring but my best off-the-cuff calculation is about 0.25 SD. It’s not nothing but it’s also not going to revolutionize schooling.
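
For anyone who wants to play with the numbers, here is a minimal sketch of one way to get a rough blended figure: a simple weighted average of the two effect sizes. To be clear, this is not a proper pooled effect size calculation, and the weights are hypothetical values chosen for illustration, not taken from either meta-analysis. The function name blended_effect is also just an illustrative label.

```python
def blended_effect(tutoring_sd: float, cai_sd: float, tutoring_weight: float) -> float:
    """Weighted average of two effect sizes, both in standard deviation units.

    tutoring_weight is the share of the blend attributed to tutoring;
    the remainder is attributed to computer-assisted instruction (CAI).
    """
    if not 0.0 <= tutoring_weight <= 1.0:
        raise ValueError("tutoring_weight must be between 0 and 1")
    return tutoring_weight * tutoring_sd + (1.0 - tutoring_weight) * cai_sd


TUTORING_SD = 0.4  # "just under 0.4 SD" from the post, rounded up
CAI_SD = 0.05      # "about 0.05 SD" from the post

# Hypothetical weights: the post does not say how the 0.25 figure was reached.
for weight in (0.5, 0.6, 0.7):
    effect = blended_effect(TUTORING_SD, CAI_SD, weight)
    print(f"weight on tutoring {weight:.1f}: {effect:.3f} SD")
```

Whatever weight you pick, the blend lands somewhere between the two inputs, which is the point: under this kind of averaging, adding a weak intervention to a strong one pulls the result down rather than stacking on top of it.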

One neat trick: actually coming to school and using the tutoring

What’s more, once we leave the realm of studies specifically designed to test tutoring (and other interventions) and move into broad implementations in actual schools, tutoring doesn’t perform as well. One reason we may be seeing lackluster academic results from post-pandemic intensive tutoring initiatives is that the students most in need of tutoring support are also most likely to be chronically absent. So, even though one intensive tutoring program in Washington DC showed that kids were 7% more likely to attend school on days they had a tutoring session, you have to weigh that against two things. First, there were minimal academic gains.

While tutored students still scored below their nontutored classmates, the gap between the two groups diminished over the course of the year. Students who started our further behind their peers but had over 20 sessions of tutoring, were able to shrink their gap with nontutored peers from 0.23 standard deviations to 0.14 standard deviations.

So, while less far behind, these students are still behind after a full year of tutoring interventions. Moreover, they are behind peers who are themselves already behind. Perhaps this is because humans learn at a stable and constant rate, so kids who start out ahead will remain ahead. Maybe the very idea that kids can catch up is wrong? Either way, there is a second more pressing problem here. In an article about the attendance study, we learn something important about the attendance data.

“That feels minimal, just a day or so,” [study co-author] Lee admitted. But she said it was “encouraging to move the needle at all,” with this group of economically disadvantaged students. More than 80 percent of the tutored students were Black. The remainder were largely Hispanic.

What struck me was the high average absenteeism rate among the thousands of students selected for tutoring: 17 percent. In other words, these students had missed more than 30 days, not including weekends. A large subset of them – one out of six – were considered to be “extremely absent,” missing more than 30 percent of the school year. That’s about 60 school days. “They’re missing school at an alarming rate,” said Lee.

I don’t know how much success you can expect out of a tutoring program for kids who are absent for 30+ days of school and it’s an important reminder that attendance is a necessary precondition for any intervention. The overall portion of students who attended the full number of tutoring sessions was just 27%. Indeed, attendance has been noted as a major problem for intensive tutoring interventions in other studies, too. An intensive virtual tutoring program in Nashville failed to show much in the way of academic gains and the researchers identified attendance as a major challenge. On average only ~20% of the students identified were actually attending their tutoring sessions!

Another attempt, this one using the reading tutoring software BookNook in California elementary schools, ran into similar problems.

Of the 959 students in treatment cohorts, 196 (20.4%) met the lower-bound threshold of recommended BookNook engagement, calculated as two completed sessions per week, for a total of 20 sessions.

But, for the 20% who actually used BookNook enough for them to be considered properly engaged in the learning software, there was a huge improvement, right? Not really. More along the lines of an average improvement.

those who completed 20 or more sessions—the recommended dosage—experienced a 0.26 SD developmental advantage

Hey, my off-the-cuff pooled effect size estimate is looking pretty good! When you combine CAI with tutoring, you don’t get a better outcome than tutoring, you get a worse one. But, again, the challenge is actually getting the kids to use the software and follow through with the tutoring. I think it’s telling that in all three studies (the DC one, the Tennessee one, and the California one), the percentage of students who attended the recommended number of tutoring sessions was roughly similar, falling between 17% and 27%. Or, to smooth out the decimals and put it into ratios a bit, only around 1 in 4 or 1 in 5 students who needed tutoring actually got the recommended amount of tutoring. This is all very reminiscent of EdTech’s 5% problem.

Given that these are real-world implementations of both in-person and online tutoring, I think it’s really important to understand just how many kids are actually being served by these programs. More kids needed tutoring, were given opportunities to receive tutoring, and just didn’t show up enough for the tutoring to matter. In fact, most kids didn’t. The vast majority of kids didn’t. Regardless of whether and how much tutoring “works”, if kids aren’t doing it, the intervention doesn’t matter. If only 20% of your neediest kids receive a 0.26 SD boost, what does that mean for the overall student body? How much of an improvement is that for the whole school or district or state? Probably not much. The worry, though, is that schools will continue to shell out big bucks for and expend precious time and effort on newfangled AI tutoring products only to find there’s not much improvement.
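
To put a rough number on that "probably not much," here is a back-of-the-envelope sketch using the BookNook figures quoted above (196 of 959 students reaching the recommended dosage, with a 0.26 SD advantage for those who did). The big assumption, which is mine and not the study's, is that students who fell short of the recommended dosage gained nothing at all. The function name diluted_effect and the effect_for_rest parameter are illustrative only.

```python
def diluted_effect(effect_sd: float, share_served: float, effect_for_rest: float = 0.0) -> float:
    """Average effect across a whole cohort when only a share gets the full dose.

    effect_for_rest is the assumed gain (in SD) for everyone else. The default
    of 0.0 is a simplifying assumption, not a result from any study.
    """
    if not 0.0 <= share_served <= 1.0:
        raise ValueError("share_served must be between 0 and 1")
    return share_served * effect_sd + (1.0 - share_served) * effect_for_rest


share_at_dosage = 196 / 959  # about 20.4% of the treatment cohort
average_gain = diluted_effect(0.26, share_at_dosage)
print(f"Average gain across the cohort: {average_gain:.3f} SD")  # prints 0.053 SD
```

Under that assumption, a 0.26 SD boost for one student in five averages out to roughly 0.05 SD across everyone who was offered the tutoring. If the under-dosed students gained a little, the average would be somewhat higher, and you can test that by passing a nonzero effect_for_rest.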

Wrapping up

Out of all the things I’m skeptical of, tutoring probably has the strongest body of evidence going for it. It really does seem like tutoring can deliver modest learning gains. It’s technically effective for everything from elementary literacy and math to college entrance exams like the SAT and ACT. But we also have to be cautious in assuming studies designed to evaluate the efficacy of tutoring translate well into real-world environments. Not only might we see reduced effects as the number of students per tutor grows, but we encounter a key challenge schools are already facing: attendance. The kids who need tutoring the most are least likely to get it. As school funds are cut and state budgets become strained, expensive and staff-intensive tutoring programs may be on the chopping block anyway. As they fail to post the promised gains, schools may feel their money and efforts are wasted.

At the same time, though, we have a consistent drum beat about AI tutors, like those promoted by Mr. Khan. We’re told that tutoring is the gold standard and that everyone should learn from tutors just like Alexander the Great did. A review of the actual history shows us that tutoring is a second-best alternative to what the classical Greeks actually preferred, schools. Interestingly, reviewing the modern evidence about tutoring tells us something similar. On their own, well-implemented tutoring programs under controlled conditions, with students well suited to tutoring, really do seem to produce some of the best outcomes of any intervention. When you scale the program, you lose some efficacy. When you take the program outside the conditions of an academic study and into real schools, you lose some efficacy. When you put the tutoring online instead of in-person, you lose some efficacy. All of a sudden, your 0.4 SD becomes 0.25 SD and your best option becomes the second or third best. The Greeks figured this out 2,500 years ago and built schools to educate the children of elites who ran their society. They could have tutored their kids. Clearly some did somewhere! But the best option was the school where elite kids would learn to read and write, to sing and recite poetry, and would compete physically. Schools today have their problems and tutoring may be part of some improvements, but I am skeptical that tutoring, whether human or AI, is going to dramatically improve our schools or students’ outcomes.

Thanks for reading!