I for one would love LLMs to solve the physics of the singularity prior to the First Three Minutes of the universe, create a molecule that effectively treats schizophrenia, or specifically diagram how their own deep neural nets output a correct answer.
Creativity in literature or art is a horse of a different color/magisterial domain. Was Moby-Dick regarded as a work of genius in 1851? Nope ... and its sales were dismal until the 1920s. William Faulkner was a fairly obscure regional author until Malcolm Cowley resurrected him in 1946.
How can we expect LLMs to find "the correct answer" for the moving target of culture as "the best that has been thought and said in the world" (Matthew Arnold, 1869)?
In my opinion, the fundamental premise of this article, namely that RL fostered creativity in LLMs, is false. By any non-goal-post-moving definition of creativity, neural networks are inherently creative, in that they can interpolate and extrapolate in novel ways. Take a look at the GPT-2 announcement from OpenAI in 2019 and read the generated text about the discovery of unicorns (https://openai.com/index/better-language-models/). This is an entire fantastical creation that created new and unique details for a scenario that did not exist in the training data. When GPT-3 came out (before ChatGPT had even been thought of), I played with what it could do and saw plenty of creative text generation. Similarly, when image-generation models came out, people delighted in creating hybrid animals, mixing together different and purportedly incompatible art styles, and so forth.
RL, if anything, adds _constraints_ to the creativity of LLMs. It says “you must be logical” or “always check your sources” or “don't claim to have any kind of sense of self, that upsets the humans”.
We can, of course, adopt a definition of creativity that presupposes some ex nihilo belief about human creativity: that we're not merely remixing and reinterpreting what has gone before, but magically pulling new ideas out of the ether, unmoved by all that came before. But if that's the position, there's little point in having any kind of conversation, as we've decided the conclusion at the outset: only humans need apply for the creativity merit badge.
That said, I'm still delighted to read your post and see you thinking about these issues, and even if we disagree about what makes AI potentially creative, we both agree that it sometimes is, by our own definitions.
And with creativity in mind, inspired by your post, I asked Claude to come up with a creative parody of your post, where we ask whether planes can really fly, and make the claim that only jets come close to the flight freedom of birds. It's just a bit of fun, but I hope you enjoy it (https://claude.ai/public/artifacts/725072aa-b13e-41b9-a942-23322987bf55).
Thanks, Melissa! I always appreciate your comments. Perhaps this time, though, you missed that the post is specifically focused on creative problem solving? The "constraints" you mention as a critique of RL are absolutely necessary. A creative solution to a math problem can't involve creative rules of logic. Creative code must still compile, etc. GPT-3 was terrible at problem solving. It took RL to really crack that, and of course there's still plenty of room for improvement.
Thanks for the reply. Much appreciated!
Perhaps you can see how I misunderstood your key point: the title of your piece is “Can LLMs be creative?” with the subtitle “How far out of distribution can modern AI systems go?”, so it's not a huge surprise that that's what I took the thrust of the article to be.
Arguably, solving problems in restricted domains is something classical AI and other search techniques have a long history with. We don't think of most chess programs as creative, even as they thrash humans. AlphaGo has at least as much in common with chess programs as it does with LLMs (in fact, its successor, AlphaZero, can play either game). AlphaGo's main mode of “creative exploration” is Monte Carlo Tree Search. In the match with Lee Sedol (who, incidentally, was no longer the top-ranked player in the world at that time), move 37 did indeed show that AlphaGo's board evaluation (via its policy network and value network) had captured a nuance previously undiscovered by humans. But today's chess programs also make “inspired” moves that grandmasters can learn from, and again, people don't see chess programs as the epitome of creativity.
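For concreteness, that “creative exploration” is easy to sketch. Below is a toy, one-ply Monte Carlo search with UCB1 action selection on a count-to-21 game — my own illustrative example, not AlphaGo's actual algorithm; real MCTS grows a tree of positions, and AlphaGo additionally guides the search with its policy and value networks.

```python
import math, random

# Toy "count to 21" game: players alternately add 1, 2, or 3 to a running
# total; whoever is forced to reach 21 or more loses. From a total of 17
# the winning move is +3: it leaves the opponent at 20, forced to bust.
LIMIT = 21
ACTIONS = (1, 2, 3)

def rollout(total):
    # Random playout from `total`, current player to move.
    # Returns +1 if that player wins, -1 if they lose.
    player = 1
    while True:
        total += random.choice(ACTIONS)
        if total >= LIMIT:
            return -player  # the player who just moved busted and loses
        player = -player

def mcts_best_action(total, iters=4000):
    # One-ply Monte Carlo search with UCB1 selection: keep re-trying
    # promising actions (exploitation) while still sampling
    # under-explored ones (exploration).
    wins = {a: 0.0 for a in ACTIONS}
    visits = {a: 0 for a in ACTIONS}
    for t in range(1, iters + 1):
        def ucb(a):
            if visits[a] == 0:
                return float("inf")  # try every action at least once
            return wins[a] / visits[a] + math.sqrt(2 * math.log(t) / visits[a])
        a = max(ACTIONS, key=ucb)
        nxt = total + a
        # Result from our perspective: busting loses outright; otherwise
        # the rollout is from the opponent's perspective, so negate it.
        result = -1 if nxt >= LIMIT else -rollout(nxt)
        visits[a] += 1
        wins[a] += (result + 1) / 2  # map {-1, +1} results to {0, 1} wins
    # Recommend the most-visited action (the standard MCTS final choice).
    return max(ACTIONS, key=lambda a: visits[a])

random.seed(0)
best = mcts_best_action(17)
print(best)  # from 17 the winning move is 3
```

Nothing in there looks like inspiration from the inside; the “brilliant” move falls out of sampling plus a bias toward what has paid off so far.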
Likewise, SMT solvers like Z3, symbolic math systems like Mathematica, and interactive theorem provers like Rocq follow the rules of math and can churn out solutions to challenging problems and produce proofs, yet few would call them creative. Google DeepMind's first foray into the International Mathematical Olympiad used LLMs heavily augmented by these kinds of systems to achieve its silver-medal standard. So again, in my eyes, this prowess in solving math problems doesn't, of itself, show evidence of the kind of creativity that matters to people.
With GPT-3, one of the first things that helped with problem solving wasn't RL per se, it was providing instructions to be systematic, to “think step by step”. In part, this was compensating for a problem with the training data. We tend to publish final answers, not the scaffolding that got us there, and so in their writing, LLMs attempted to recreate what they'd seen and skip all the careful working. Whether by RL or by just changing the training data, a necessary step was to say “no, don't just leap to an answer, think it through, and pay attention to catch your own mistakes”.
For agentic work, RL does help massively. If you want LLMs to navigate web pages to book a vacation for you, or be sure to run the test suite before committing changes to the codebase, it's absolutely going to help. But again, coloring inside the lines is rarely seen as a sign of creativity. Although, sure, in some contexts, rules do help channel creative forces productively.
In any case, I wasn't critiquing RL any more than the gentle parody I commissioned that pondered whether planes could “really fly” was critiquing jets—much as you don't need a jet to perform aerobatics and you don't need a Harrier to land in a field, we can witness plenty of creativity from LLMs with no RL in sight. Of course, no matter what, we can always say it isn't “the right kind” of creativity much as we can say that our airplanes don't do “the right kind” of flying. RL won't stop people saying that “true creativity” belongs only to humans, as your final paragraph made clear.
As I said in the original post, I was really focusing more on the idea of what it means to be "out of distribution". It does NOT mean that an LLM produced something new and novel. It means the LLM produced something not in the statistical distribution defined by the training data. In your initial response to my post you said GPT-2 produced an "entire fantastical creation that created new and unique details for a scenario that did not exist in the training data." That does not mean what it produced was out of distribution. Almost by definition, you need some other technique besides supervised learning to do that, and RL is pretty much the only thing that anyone developing LLMs has come up with. You can argue that ChatGPT wasn't just supervised learning, even from its beginning, because of the RLHF phase of training, but that's also RL.
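To make the statistical sense of “out of distribution” concrete, here's a toy sketch of my own (nothing like real LLM training): a character-bigram model assigns far higher surprise (average negative log-likelihood) to text unlike its training data, and its sampling objective never rewards producing such text.

```python
from collections import Counter
import math

def bigram_surprise(corpus):
    # Fit a character-bigram model on `corpus` and return a scorer that
    # gives the average negative log-probability (surprise) of a string.
    pairs = Counter(zip(corpus, corpus[1:]))
    firsts = Counter(corpus[:-1])
    vocab_size = len(set(corpus))
    def surprise(text):
        total = 0.0
        bigrams = list(zip(text, text[1:]))
        for a, b in bigrams:
            # Add-one smoothing: unseen bigrams get a small nonzero probability.
            p = (pairs[(a, b)] + 1) / (firsts[a] + vocab_size + 1)
            total += -math.log(p)
        return total / max(len(bigrams), 1)
    return surprise

surprise = bigram_surprise("the cat sat on the mat. the dog sat on the log. " * 20)
in_dist = surprise("the dog sat on the mat.")   # recombines the training text
out_dist = surprise("zqxv jkwp qzzt vxkq")      # nothing like the training text
print(in_dist < out_dist)  # True: the OOD string is far more surprising
```

Note that the in-distribution sentence never appears verbatim in the corpus — it's a novel recombination, yet it still scores as probable. That's the distinction: novelty alone doesn't put an output outside the training distribution.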
With all that said, I basically agree with you ... supervised learning can certainly produce new things that APPEAR creative. When a human combines existing ideas in a new way, they are certainly labelled creative, and that's what supervised learning can do.
Nicely put. LLMs by these lights are assuredly creative. As stated before, I hope we get an LLM-devised molecule that cures glioblastomas or solves three-body problems in a twinkling, but in the cultural creativity arena we already have an overabundance of middlebrow (or worse) books/TV/movies/podcasts (hmm... that may be a tautology). Do we really deserve the LLM equivalent of the Victorian novelist William Ainsworth, who wrote 45 books, mostly novels? Name one! (Interest piqued? Read Zadie Smith's The Fraud.) Can LLMs generate sublime literature? It may be too soon to tell.
For what it's worth, human creativity IS different from LLM creativity, in which syntax excludes semantics. If I can persuade you that structure has a great deal to do with function, the human CNS is a horse of a different color compared to LLMs' ersatz neural architecture, which lacks the >1,000 synapses per neuron, astrocytes, microglia, axoplasm, glymphatics, neurotransmitters, intraneuronal hormones, and all the other neural wetware that evolved over the course of 4.5 billion years ... and which (unless you're a Dualist) appear to underlie human cognition ... and creativity. Human creativity is structurally and functionally sui generis and in the haute cultural arena will perforce be different from LLM cultural creativity. Is it better? Will ChatGPT #8 be the next Tolstoy/Joyce/Proust? The audience awaits.
From what I can tell, William Ainsworth was popular in his day. Likely no one will care about Dan Brown a few decades from now either.
Is it your position that only our best art deserves the label of “creative”? Most people on the planet have no published work, no exhibitions. Is everyone outside the top 0.1% a worthless dullard?
A few years ago, we would have been amazed at an AI system that could write a coherent story, make a drawing that matched a prompt, or compose a coherent tune. Now, of course, that's a given, and the question is whether the quality of the output is truly outstanding, outshining the best humans can do. Watching the goal posts move is my new spectator sport.
I'll leave the essentialist stuff you closed with alone, except to say that 4.5 billion years of evolution was optimizing for gene transmission, not creation of great art or literature.
As an undergraduate English major, I cordially invite you to explore the Victorian literary purgatory of W. H. Ainsworth. His Delphi digital oeuvre can be readily had for $2.99.
Of course middlebrow (or worse) art is creative, but the Techbros' aspirations appear to be aimed a little higher. Having seen AI defeat Kasparov at chess and Lee Sedol at Go, I'm pretty sure they would like to take on Tolstoy in a best-of-five series.
If I'm guilty of promoting greatness in culture, I fear AI is liable to the same accusation. I am concerned that the AI true believers may be punching above their weight in the realm of belles lettres. I don't believe stochastic parrots can write the next Swann's Way (imitations not accepted).
If you insist on invoking the Selfish Gene meme, help me grasp the selective/survival advantage of the genes that produced our frontal lobes, which created calculus and String Theory.
I assume that "essentialist stuff" is the AI way of saying, "I don't do neurons." More's the pity. If we discard the lessons from the 4.5-billion-year evolution of the ur-biological thinking machine in favor of those of Babbage & Turing, with their ~200-year evolution, we may be missing something.