Could GPT-3 (or ChatGPT) beat IBM Watson in Jeopardy!?

I used the clues from both rounds of both games of the Jeopardy! IBM Challenge.

I sent each clue to GPT-3 through a script, using a simple prompt, and then manually compared its responses with the correct ones.

Read on to see the results.

Prompt and Settings

On my first attempt, GPT-3 responded with single-word answers rather than in the question format that Jeopardy! requires.

So, I rephrased the prompt to have GPT-3 provide the answer as a question:

We are playing Jeopardy! The category and clue are provided, then the answer must be provided as a question.

Current category is: {category}

Clue: {clue}
Answer:
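As a rough sketch of how the template above might be filled in for each clue, the snippet below uses Python's str.format. The names PROMPT_TEMPLATE and build_prompt are my own for illustration, and the clue text is a placeholder rather than a real clue from the games.

```python
# Prompt template, copied from the text above.
PROMPT_TEMPLATE = (
    "We are playing Jeopardy! The category and clue are provided, "
    "then the answer must be provided as a question.\n"
    "\n"
    "Current category is: {category}\n"
    "\n"
    "Clue: {clue}\n"
    "Answer:"
)


def build_prompt(category: str, clue: str) -> str:
    """Fill the template with one category and clue (hypothetical helper)."""
    # Strip stray whitespace so the prompt always ends exactly with "Answer:".
    return PROMPT_TEMPLATE.format(category=category.strip(), clue=clue.strip())


# Placeholder clue text, used only to show the shape of the final prompt.
print(build_prompt("U.S. CITIES", "<clue text goes here>"))
```

Ending the prompt with "Answer:" leaves the model to complete only the response itself, which keeps the output easy to compare by hand.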

I scripted calls to GPT-3, with these settings:

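import openai  # legacy (pre-1.0) openai package, which provides openai.Completion

# prompt is the filled-in template above; max_tokens is set elsewhere in the script.
response = openai.Completion.create(
    model="text-davinci-003",
    prompt=prompt,
    temperature=0.5,
    max_tokens=max_tokens,
    top_p=1,
    frequency_penalty=0.0,
    presence_penalty=0.0
)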
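The loop around that call could look something like the sketch below. This is not the original script: run_prompts and fake_create are hypothetical names, the max_tokens default of 64 is an assumed value (the post does not state one), and the completion function is passed in so the example runs offline with a stand-in. With the legacy openai package, openai.Completion.create could be passed as the create argument.

```python
from typing import Any, Callable, Dict, List

# Sampling settings copied from the call shown above.
SETTINGS: Dict[str, Any] = {
    "model": "text-davinci-003",
    "temperature": 0.5,
    "top_p": 1,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
}


def run_prompts(
    prompts: List[str],
    create: Callable[..., Any],
    max_tokens: int = 64,  # assumed value; the original does not state it
) -> List[str]:
    """Send each prompt to the completion function and collect the text."""
    answers = []
    for prompt in prompts:
        response = create(prompt=prompt, max_tokens=max_tokens, **SETTINGS)
        # Legacy completion responses carry the generated text in choices[0].text.
        answers.append(response["choices"][0]["text"].strip())
    return answers


def fake_create(**kwargs: Any) -> Dict[str, Any]:
    """Stand-in for the real API call so the sketch runs without a key."""
    return {"choices": [{"text": " What is Chicago?\n"}]}


print(run_prompts(["<filled-in prompt>"], fake_create))
```

Passing the completion function in as an argument also makes it easy to swap in another model later without touching the loop.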

Results

Game 1 Jeopardy! (30 questions):

Game 1 Double Jeopardy! and Final Jeopardy! (31 questions):

Game 2 Jeopardy! (30 questions):

Game 2 Double Jeopardy! and Final Jeopardy! (31 questions):

Conclusion

IBM Watson wins on accuracy here. However, GPT-3 performs well, especially given that it was forced to answer every question: I didn't allow it to ‘abstain’ on difficult clues or when it was uncertain.

It's incredible that, with no fine-tuning or experimentation on my part, GPT-3 performed within a few percent of IBM Watson!

It is possible that GPT-3 saw these clues during training. As a validation check, I tried again using clues from show #8696, which aired on 12 September 2022 and so would not have been in GPT-3's training data.

On these clues, GPT-3 still achieved similar performance: 81% accuracy.

Raw Answers

The raw responses from GPT-3 are listed below. Responses I judged incorrect are prefixed with "Wrong:". You can compare them with the correct responses to decide whether you agree with my assessment.

LITERARY CHARACTER APB

  1. Who is Grendel?
  2. Who is Lord Voldemort?
  3. Who is Sauron?
  4. Who is Dr. Henry Jekyll?
  5. Who is Jean Valjean?

BEATLES PEOPLE

  1. Wrong: Who is "John Lennon"?
  2. Wrong: Who is Eleanor Rigby?
  3. Who is Maxwell Edison?
  4. Who is Eleanor Rigby?
  5. Who is Julia?

OLYMPIC ODDITIES

  1. Who is Michael Phelps?
  2. What is London?
  3. Wrong: What is a fencer?
  4. Wrong: What country was represented by one athlete in the 2004 Olympic opening ceremonies?
  5. What was George Eyser's missing leg?

NAME THE DECADE

  1. Wrong: What is the 1960s?
  2. What is the 1930s?
  3. What is the 1980s?
  4. What is the 1900s?
  5. What is the 1910s?

FINAL FRONTIERS

  1. What is "The Last Judgment"?
  2. What is the terminus?
  3. What is a envelope?
  4. What is ApeX?
  5. What is the event horizon?

ALTERNATE MEANINGS

  1. Wrong: What is a view or an opinion?
  2. What is a shoe?
  3. Wrong: What is a "stake" or "impale"?
  4. What is class or a class reunion?
  5. What is a crook?

ETUDE, BRUTE

  1. What is "study"?
  2. What is a classical guitar?
  3. What is a violin?
  4. What is Franz Liszt's "Transcendental Etudes"?
  5. Who is Sergei Rachmaninoff?

HEDGEHOG-PODGE

  1. What is hibernation?
  2. What is a hedgehog cactus?
  3. Who was Leo Tolstoy?
  4. What is keratin?
  5. What is The Elegance of the Hedgehog?

DON'T WORRY ABOUT IT

  1. What is Hemophilia?
  2. What is narcolepsy?
  3. What is albinism?
  4. Wrong: What is chikungunya?
  5. What is leprosy?

THE ART OF THE STEAL

  1. What is the Sea of Galilee?
  2. What is Cleveland?
  3. What is Baghdad?
  4. What is the Cubist Movement?
  5. Who is King Philip II of Spain?

CAMBRIDGE

  1. Who was Isaac Newton?
  2. Who is John Milton?
  3. Who was King Henry VIII?
  4. Who is Sir Christopher Wren?
  5. Who is C.S. Lewis?

"CHURCH" & "STATE"

  1. Who is the Church Lady?
  2. Wrong: What is "restoration"?
  3. What is a church key?
  4. Wrong: What is "conceive"?
  5. What is Christchurch?

U.S. CITIES

  1. What is Chicago?

EU, THE EUROPEAN UNION

  1. What is Istanbul?
  2. Wrong: What are internal borders?
  3. What is Common Agricultural Policy?
  4. Who are the members of the European Parliament?
  5. What is Slovenia?

ACTORS WHO DIRECT

  1. Who is Sylvester Stallone?
  2. Who is Clint Eastwood?
  3. Who is Sean Penn?
  4. Who is Denzel Washington?
  5. Who is Robert De Niro?

DIALING FOR DIALECTS

  1. What is German?
  2. What is Chinese?
  3. What is Sanskrit?
  4. What is Arabic?
  5. What is Ancient Greek?

BREAKING NEWS

  1. What is Steve Wynn's?
  2. Wrong: What is the Bronx?
  3. Who is Martin Luther King Jr.?
  4. Wrong: What did Charles Wells do "At Monte Carlo"?
  5. Wrong: What airline did Dave Carroll's clip "Breaks Guitars" criticize?

ONE BUCK OR LESS

  1. What is the USA Today?
  2. What is a postcard?
  3. Who is 50 Cent?
  4. What is IKEA?
  5. What is Alberto VO5?

ALSO ON YOUR COMPUTER KEYS

  1. What is "home"?
  2. What is a shift dress?
  3. Wrong: What is a quarterback?
  4. Wrong: What is GP?
  5. What is an insert?

NONFICTION

  1. Wrong: What is Hillary Clinton?
  2. What is The Blind Side?
  3. What is Strunk and White's "The Elements of Style"?
  4. What is staggering genius?
  5. Who is David McCullough?

LEGAL "E"s

  1. What is "Esquire"?
  2. What is "eavesdropping"?
  3. Who is the executor of the will?
  4. What is eminent domain?
  5. What is an escalator clause?

WHAT TO WEAR?

  1. Wrong: What is muslin?
  2. What is a tea-length dress?
  3. What is a halter top?
  4. What are rain boots?
  5. What is Marc Jacobs?

U.S. GEOGRAPHIC NICKNAMES

  1. What is the "Graveyard of the Atlantic"?
  2. What is Buffalo?
  3. What is Las Vegas?
  4. What is Pittsburgh?
  5. Wrong: What is Arizona?

MAGICAL MOUSE-TERY TOUR

  1. What is The Simpsons?
  2. Who is Mickey Mouse?
  3. What is Flowers for Algernon?
  4. What is Danger Mouse?
  5. What is The Brain from Pinky and The Brain?

FAMILIAR SAYINGS

  1. What is contempt?
  2. What is a clock?
  3. What is a Jack of all trades?
  4. What is a committee?
  5. What are his tools?

19th CENTURY NOVELISTS

  1. Who is Bram Stoker?
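
If you want to re-tally the lists above yourself, a small parser along these lines could do it. This is only a sketch: it assumes the layout used here (a category line followed by numbered responses, with "Wrong:" marking incorrect ones), and the names score_raw_answers and accuracy are hypothetical. The sample is two of the categories above, copied verbatim.

```python
import re
from typing import Dict, Tuple

# A numbered response line, optionally flagged with "Wrong:".
ITEM = re.compile(r"^\s*\d+\.\s+(?P<wrong>Wrong:\s*)?(?P<text>.+)$")


def score_raw_answers(raw: str) -> Dict[str, Tuple[int, int]]:
    """Return {category: (correct, total)} for the raw answer listing."""
    scores: Dict[str, Tuple[int, int]] = {}
    category = None
    for line in raw.splitlines():
        if not line.strip():
            continue  # skip blank separator lines
        match = ITEM.match(line)
        if match is None:
            # Any non-blank, non-numbered line is treated as a category header.
            category = line.strip()
            scores[category] = (0, 0)
        elif category is not None:
            correct, total = scores[category]
            is_correct = match.group("wrong") is None
            scores[category] = (correct + int(is_correct), total + 1)
    return scores


def accuracy(scores: Dict[str, Tuple[int, int]]) -> float:
    """Overall fraction of responses not marked as wrong."""
    correct = sum(c for c, _ in scores.values())
    total = sum(t for _, t in scores.values())
    return correct / total if total else 0.0


SAMPLE = """
BEATLES PEOPLE

  1. Wrong: Who is "John Lennon"?
  2. Wrong: Who is Eleanor Rigby?
  3. Who is Maxwell Edison?
  4. Who is Eleanor Rigby?
  5. Who is Julia?

U.S. CITIES

  1. What is Chicago?
"""

for name, (correct, total) in score_raw_answers(SAMPLE).items():
    print(f"{name}: {correct}/{total}")
```

Because the parser only counts the "Wrong:" markers, it reproduces my grading rather than checking the responses against the official answers.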

Published 31 December 2022 by Benjamin Johnston.