Free audio version available here.
In the last few weeks, two articles appeared that garnered an unusual amount of attention. In both cases the AI chatter on the internet seemed to blow up, and I got flooded with texts and emails. The first was “The Illusion of Thinking,” which many people cited to claim that there’s some inherent limitation in the reasoning abilities of modern LLMs. The second was “Your Brain on ChatGPT,” which was taken as evidence that LLM use leads to cognitive decline. In both cases AI critics cherry-picked quotes from the articles to promote an anti-AI bias, without any evidence of understanding the papers’ results. AI raises serious issues that are worth debating: existential risks, energy use, job displacement, challenges to education, the potential to spread misinformation, and so on. In the face of these serious concerns, promoting inaccurate interpretations of research with statements like “ChatGPT is rotting your brain” isn’t just irresponsible, it’s extremely dangerous.
I’ll get into the details of both papers momentarily, but first I’d like to address the general phenomenon of anti-AI bias, and the damage it can do. I’ve only been writing this substack for about six weeks, and even in this short time I’ve been struck by some of the pushback I’ve received. For example, my college is having a lot of discussions about how AI is affecting both education and society. As the only faculty member at my college who teaches about the inner workings of AI, part of my goal with this substack is to share my own insights with my colleagues, to help move those conversations along. And yet I know some of my most anti-AI colleagues have simply refused to read anything I write, despite personal invitations, because they perceive me as an “AI evangelist.” Similarly, when I’ve told some personal friends I’m writing a substack about both the pros and cons of AI, I’ve gotten the response, “Why do you feel the need to promote AI?”
How can you talk about balancing pros and cons without considering both? How can you take steps to mitigate AI risks without a basic understanding of the technology? How can you even begin to talk about how we should be preparing students for an AI powered world without engaging with that world?
I know I’m far from alone. In his substack, Lance Eaton recently related an incident where he was accused of being a “shill for AI” after a public talk on navigating the opportunities and challenges of incorporating AI in education. These sorts of stories are, unfortunately, becoming increasingly common.
Let’s get into the papers that I mentioned in the opening paragraph. In the Illusion of Thinking paper, researchers describe experiments in which LLM “reasoning models” were given a series of tasks of increasing complexity, and in each case performance cratered after small increases in difficulty. For example, one task was the “Towers of Hanoi”, in which the goal is to move a stack of disks of different sizes from one of three pegs to another. Disks can only be moved one at a time, and no disk can be placed on a smaller disk.
Some of the issue with the reported results was just about what a “solution” to this problem actually looks like. Solving the Towers of Hanoi is a common assignment in an undergraduate “Introduction to Proof” class. An answer to the problem is an algorithm: a procedure one could follow to transfer any number of disks from one peg to another according to the rules of the game. However, that’s not the kind of solution the authors of Illusion of Thinking were looking for. They didn’t want a procedure; they wanted an explicit sequence of moves for a given number of starting disks, something of the form “Move disk 1 to peg 2, then disk 2 to peg 3, then disk 1 to peg 3, …”. The problem is that the minimal number of such moves grows exponentially with the number of starting disks (2^n - 1 moves for n disks): 3 disks takes 7 moves, 4 disks takes 15 moves, … 13 disks takes 8191 moves! The data required to represent that many moves (~10K tokens), together with the reasoning required to come up with such a sequence, is beyond the output limits of most LLMs, so of course they won’t be able to produce a correct sequence! Nobody even needed to run the experiment to see that it wouldn’t work. What was more interesting (at least to me) is that when the researchers set the number of starting disks high enough, the length of the LLM response went significantly down. There is evidence now that this was because the LLM recognized in advance that it wouldn’t have the memory to complete the task, and was smart enough to give up early and just state the general solution algorithm. The researchers presented this as a failure … I would call it evidence of very sophisticated reasoning!
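To make the distinction concrete, here is a minimal sketch in Python (my own illustration, not code from the paper): the “procedure” answer is just a few lines of recursion, while the explicit move sequence it produces grows exponentially with the number of disks.

```python
# Minimal sketch of the standard recursive Towers of Hanoi procedure.
# The procedure itself is tiny; the explicit move list it generates is not.

def hanoi(n, source, target, spare, moves):
    """Append the explicit move sequence for n disks to `moves`."""
    if n == 0:
        return
    hanoi(n - 1, source, spare, target, moves)   # park the top n-1 disks
    moves.append((n, source, target))            # move the largest disk
    hanoi(n - 1, spare, target, source, moves)   # restack the n-1 disks on top

for n in (3, 4, 13):
    moves = []
    hanoi(n, "peg 1", "peg 3", "peg 2", moves)
    print(f"{n} disks: {len(moves)} moves (2**{n} - 1 = {2**n - 1})")
# 3 disks: 7 moves, 4 disks: 15 moves, 13 disks: 8191 moves.
```

The compact procedure solves the puzzle for any number of disks, but writing out every individual move quickly exceeds any reasonable response length, which is exactly the issue with how the paper scored the task.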
So why is this dangerous? Some mistakenly took Illusion of Thinking as evidence that our current LLMs are very far from being sophisticated reasoners. Workers who believe that may conclude they don’t have to worry about LLMs taking their jobs, when in fact they do. Policymakers may not take the existential risks posed by AI as seriously as they should, and may fall short on AI safety regulation. Teachers may assume they don’t have to worry as much about students cheating on assignments that involve sophisticated reasoning. The list goes on.
The Your Brain on ChatGPT paper is more insidious. This paper is unrefereed and unpublished, but the authors felt that their results were so important that they should release them. The sheer length of it (206 pages!) makes it daunting to even skim. Given its length and complexity, I expect very few people actually looked at it in any detail, if at all. In it, the authors describe an MIT study in which participants were tasked with writing essays with and without LLM assistance, while their brain activity was being monitored.
Everyone from the most casual blogger to reputable news outlets jumped on the story. Time.com ran the headline, “ChatGPT May Be Eroding Critical Thinking Skills”. For a brief period of time YouTube videos like “ChatGPT is DESTROYING our brains” and blog posts such as “ChatGPT might be draining your brain” were popping up like wildflowers.
Let’s look at the details of the study. The researchers recruited 54 participants and divided them into three groups of 18. One group was allowed to use an LLM, one a search engine, and the remaining group no technological assistance at all. Each participant was given 20 minutes to write an essay, on three separate occasions spaced months apart. The researchers found that the LLM-assisted group showed the least brain activity during essay writing.
This was not a study on the long-term (or even short-term) effects of LLM use on the brain. Twenty minutes is simply not enough time to affect brain chemistry. The study only showed that if you offload some of the cognitive task of writing an essay to an LLM, then you use less of your brain, and the essay you produce will be more generic. The only mystery to me is: why was that a surprise to anyone?!
Again, these kinds of misrepresentations are dangerous. Despite the obvious abuse of LLMs by students to cheat on assignments, LLMs do have clear positive use cases in educational settings. If we’re worrying about what LLMs do to students’ brains based on an inappropriate reading (or no reading at all) of some questionable research, then we won’t be asking the important questions about how to take advantage of the possible benefits and mitigate the true risks.
I’ll conclude by acknowledging that both Illusion of Thinking and Your Brain on ChatGPT contain insights that genuinely merit discussion. For instance, the latter paper intriguingly found that brain activity was highest among participants who first wrote essays without assistance, then refined them using an LLM. Such subtle findings can advance our understanding of how to effectively integrate AI, but are completely overshadowed by sensationalist, anti-AI chatter. It’s a powerful reminder that alarmist headlines carry real costs when they crowd out thoughtful engagement and meaningful debate.
I recently listened to this episode of Teaching in Higher Ed on The AI Con, the book by Emily Bender and Alex Hanna: https://teachinginhighered.com/podcast/the-ai-con/. I really appreciated the unpacking of the concept of stochastic parroting. I participated in the norming process for FYS writing assignments last semester. One of our colleagues uploaded the rubric parameters into an AI tool and then uploaded one of the essays they thought was exceptional. The essay performed poorly according to the AI; then they uploaded one of the essays that was less innovative, and it performed really well. The concept of stochastic parroting explains why. The most elusive aspect of the rubric was in reading for intellectual inquiry. I guess what I'm saying is that I think this is helpful to me in explaining to students why generative AI and LLMs are not good at creative writing.
When I was a CS student they gave us Towers of Hanoi to solve as an introduction to recursion. After a week of cogitating, the algorithm came to me in a dream - literally. It was one of the great learning experiences of my life. I liked your explanation of what the Apple paper actually said about the AI's ability to solve it. Clearest explanation I've read yet. I do wonder: could an AI have *my* experience - of coming to a solution as a kind of revelation??