021 - The 12 Billiard Balls Puzzle: From Childhood Challenge to AI Benchmark

Reflecting on the evolution of problem-solving from human minds to artificial intelligence

Sep 13, 2024

A puzzle that captivated me as a child now serves as a benchmark for testing AI capabilities. As I revisit this challenge, I'm fascinated by how it reveals the strengths and limitations of both human and artificial intelligence. What does our approach to such problems tell us about the nature of thinking and creativity? By comparing my childhood struggle with the puzzle to current AI attempts, I'm exploring larger questions about the future of problem-solving and the unique value of human cognition in an increasingly AI-driven world.

At the age of 12, I encountered a puzzle that would captivate my imagination for months. Little did I know that decades later, this same puzzle would become a benchmark for testing the limits of artificial intelligence.

Ethan Mollick, a professor at Wharton, introduced the concept of an 'impossibility list' for AI—tasks that seem just beyond the reach of current AI capabilities. He likens it to the advent of digital cameras. For years, their resolution hovered just below that of Polaroid photos, leading many to dismiss their potential. Then suddenly, that threshold was crossed, and digital cameras took off.

Today, we might be witnessing a similar moment in AI's ability to solve complex reasoning problems. The puzzle that once challenged me now serves as an 'impossibility test' for AI models, pushing the boundaries of their logical reasoning capabilities.

The Puzzle and Its Challenge

The puzzle goes like this:

You have 12 billiard balls and a balance scale. All balls are outwardly identical, but one is slightly heavier or lighter. You are allowed to use the scale only three times. Can you always find the odd ball and determine whether it is lighter or heavier?
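A quick counting check hints at why three weighings are only just enough. This is a sketch I added for illustration, not part of the original challenge, and the variable names are my own. It counts the possible "truths" (which ball is odd, and in which direction) and compares that with the number of distinct readings three weighings can produce.

```python
# Counting sketch: can three weighings carry enough information?
BALLS = 12
WEIGHINGS = 3

# Every possible truth: which ball is odd, and whether it is heavy or light.
hypotheses = [(ball, kind) for ball in range(BALLS) for kind in ("heavy", "light")]

# Each weighing has three outcomes: left pan down, balanced, right pan down.
outcome_sequences = 3 ** WEIGHINGS

print(len(hypotheses))                       # 24
print(outcome_sequences)                     # 27
print(outcome_sequences >= len(hypotheses))  # True
```

Passing this check only means the puzzle is not ruled out by counting. It does not show that a working strategy exists, which is where the real difficulty lies.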

Caption: A DALL-E representation of me as a 12-year-old boy pondering this puzzle.

A relative from the USA challenged me: "If you finish this riddle within a year and send me the correct answer, I will give you 100 Dutch guilders." The promise of reward and recognition from this distant relative was irresistible, but I had no idea of the journey ahead.

A Journey of Discovery

My first response to my relative was to look for flaws in the problem statement. "Can I just feel the balls and tell you which one is heavier or lighter?" That was met with a smile and a shake of the head. That wasn't the way; I actually had to start thinking.

Eager to solve it quickly, I began with simple approaches, like weighing six balls on each side. But frustration set in as I realized these methods wouldn't work within the three-weighing limit. For months, I alternated between intense focus on the problem and periods of not thinking about it at all. This cycle of engagement and disengagement mirrors the feedback loops we often see in complex systems—periods of intense activity followed by seeming inactivity, but with underlying processes still at work. Whether in our own minds or in AI systems, these apparent lulls often mask important background processing.
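The dead end with six balls on each side can be made concrete. The sketch below is my own illustration, with a hypothetical helper name. It counts, for each possible first weighing, how many of the 24 cases can still be true in the worst case. Two remaining weighings can tell apart at most 3 × 3 = 9 cases, so any first weighing that leaves more than 9 cannot lead to a solution.

```python
def worst_case_after_first_weighing(per_side, balls=12):
    """Largest number of hypotheses still possible after weighing
    `per_side` balls against `per_side` other balls."""
    left = set(range(per_side))
    right = set(range(per_side, 2 * per_side))
    counts = {"left_down": 0, "balanced": 0, "right_down": 0}
    for ball in range(balls):
        for delta in (+1, -1):  # +1 = odd ball is heavier, -1 = lighter
            if ball in left:
                tilt = delta
            elif ball in right:
                tilt = -delta
            else:
                tilt = 0  # odd ball is not on the scale
            if tilt > 0:
                counts["left_down"] += 1
            elif tilt < 0:
                counts["right_down"] += 1
            else:
                counts["balanced"] += 1
    return max(counts.values())

# Print the worst case for every possible size of the first weighing.
for per_side in range(1, 7):
    print(per_side, worst_case_after_first_weighing(per_side))
```

With six balls per side, 12 cases remain whichever way the scale tips, which is too many. Only four balls per side keeps every branch at 8 cases, within reach of the two weighings left.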

One night in bed, on our family summer holiday, after what felt like hours of contemplation, twelve-year-old Roderik had an insight! Eureka! And behold, this 'what if' moment led to a solution that actually worked! Excited and afraid I'd forget by morning, I ran to my parents' bedroom in the middle of the night to explain my solution.

That moment of revelation, with its mixture of excitement and urgency, has stayed with me. It's a vivid reminder of how breakthroughs often come not in a steady stream, but in sudden bursts after periods of apparent stagnation—a pattern we see repeated in many complex systems, including the development of AI.

AI Takes on the Challenge

Fast forward to the present day. The advent of generative language models like ChatGPT reminded me of this childhood puzzle. Inspired by Ethan Mollick's challenge to think of problems that AI couldn't solve yet, I decided to test these models with my old nemesis—the 12 billiard balls puzzle.

I tried the puzzle with two of the most advanced AI models available: Claude 3.5 Sonnet and ChatGPT 4o. To my surprise, both models struggled with the puzzle in ways reminiscent of my own early attempts.

Caption: A snippet of Claude 3.5 Sonnet's attempt at solving the puzzle, showing logical steps but ultimately requiring four weighings.

Caption: Part of ChatGPT 4o's solution. Here, for instance, it assumes that Group 1 is heavier than Group 2 and fails to deduce that a tipping scale could mean the odd ball is either lighter or heavier.

Caption: Part of the output of ChatGPT 4o's solution, remaining very confident indeed that the solution is actually correct.

Both Claude 3.5 Sonnet and ChatGPT 4o began their solutions logically but soon fell into the same traps I did as a child. Their attempts quickly required four weighings instead of three, highlighting the difficulty of maintaining consistent reasoning across multiple steps.

This experience highlighted several key limitations in the AI models' reasoning capabilities, from difficulty maintaining consistent logic to challenges in handling scenarios where the odd ball could be either heavier or lighter. It served as a stark reminder that even our most advanced AI systems can struggle with tasks requiring multi-step logical reasoning and the ability to hold multiple scenarios in mind simultaneously.

This is, as I described in an earlier newsletter, exactly how these models work: they race forward one word at a time, without being able to strategize.
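Checking a proposed answer, at least, does not depend on how confident the solver sounds. The sketch below is my own illustration: it takes one fixed schedule of three weighings (a schedule I constructed for this example, not the solution I found as a child and not the output of any model) and simulates all 24 cases. If every case produces a different sequence of scale readings, the schedule identifies the odd ball and its direction.

```python
# Hypothetical fixed schedule: (left pan, right pan) for each weighing.
WEIGHINGS = [
    ({1, 2, 3, 4}, {5, 6, 7, 8}),
    ({2, 7, 8, 11}, {3, 6, 9, 10}),
    ({4, 5, 6, 8}, {7, 10, 11, 12}),
]

def outcome(left, right, odd_ball, delta):
    """Return +1 if the left pan drops, -1 if the right pan drops, 0 if balanced.
    delta is +1 for a heavier odd ball and -1 for a lighter one."""
    if odd_ball in left:
        return delta
    if odd_ball in right:
        return -delta
    return 0

def signature(odd_ball, delta):
    """The three scale readings this schedule would show for one hypothesis."""
    return tuple(outcome(left, right, odd_ball, delta) for left, right in WEIGHINGS)

# Map each sequence of readings back to the hypothesis that produces it.
signatures = {
    signature(ball, delta): (ball, "heavier" if delta > 0 else "lighter")
    for ball in range(1, 13)
    for delta in (+1, -1)
}

print(len(signatures))        # 24: every hypothesis gives a distinct reading
print(signatures[(1, 0, 0)])  # (1, 'heavier')
```

The same check applies to any claimed solution: if two different cases give identical readings, or a branch needs a fourth weighing, the attempt fails regardless of how it is presented.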

The o1 Model: A Step Forward

Recognizing these limitations, OpenAI developed the o1 model, designed specifically for complex reasoning tasks. This model aims to spend more time "thinking" before responding, much like a human would when tackling a difficult problem.

When presented with the billiard balls puzzle, the o1 model demonstrated a more structured approach:

Caption: o1 model's approach to the puzzle, showing a more structured and strategic thinking process.

The o1 model planned ahead carefully and considered multiple scenarios. The solution it came up with was almost correct: everything worked until it reached the trickier edge cases. Like its predecessors, though, it struggled to reassess its approach once it ran into trouble. Even the most advanced AI models have room to grow.

The o1 model shows clear progress in AI reasoning but, for now, also reveals persistent challenges, such as its difficulty in revisiting strategies when faced with unexpected complexities. This experience reminds me of my own breakthrough with the puzzle as a child: sometimes, it's the smallest insight that can shift an entire system's trajectory.

Reflections on Rapid Progress

Earlier this year, while giving a talk on the history of technological change, I shared an anecdote about my 95-year-old grandmother. Her life spanned incredible technological advancements, from the spread of the telephone to vaccines and intercontinental travel. She often remarked how these changes were both exciting and terrifying, reshaping the world in ways she never imagined.

At the time of that talk, I was convinced that AI would remain stuck for a while—great at recalling facts or summarizing text, but inherently incapable of true reasoning. How easy it is to be proven wrong! The speed at which AI models like o1 have developed reasoning abilities echoes my grandmother’s sentiment. It’s both thrilling and unnerving to witness AI’s rapid progress. The o1 model, while imperfect, represents a leap forward, demonstrating how AI is starting to move beyond simple tasks to approach more complex, human-like problem-solving.

A Call to Action

The rapid advancement of AI, exemplified by models like o1, has far-reaching implications for society. Just as the inventions my grandmother witnessed reshaped the world, AI has the potential to transform how we work, learn, and solve problems.

As we navigate this rapidly evolving landscape, I encourage you to reflect on your own challenges—whether in work, life, or society. What problems seem unsolvable today? As AI continues to grow in its capabilities, how might these tools help you unlock new possibilities and solve problems you once thought impossible?

But this progress must be inclusive. As we make strides in AI, we must also ask how we can ensure these advancements benefit society as a whole, not just a privileged few. AI should not just be a tool for the powerful, but a democratized force that uplifts communities, solves global challenges, and drives positive change.

The excitement and wonder I felt as a child, finally solving that billiard ball puzzle, still resonate with me today. As I watch AI begin to tackle similar puzzles, I am reminded of the potential for breakthroughs. May we, like the curious child solving that riddle, approach the challenges and opportunities of AI with the same spirit of perseverance, knowing that the next breakthrough might be just one small insight away.

Convergent Curiosities
