A Harry Potter Trivia Bot, Powered by GPT-3

Chris Altonji
Jan 9, 2023

This post is about how I built this Harry Potter Trivia Bot: https://potter.altonji.com.

The Harry Potter Trivia Bot in Action

Although the GPT-3 model didn’t read the Harry Potter books during training, it did read the internet, and the internet loves to talk about Harry Potter. So, it turns out, GPT-3 makes a great backend for a decent Harry Potter trivia bot!

The front end is a React app hosted with Azure Static Web Apps.

Component Diagram for the app

The backend is a lightweight Flask app, hosted on Azure App Service, with just two API endpoints. Both are thin wrappers around GPT-3.
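
Stripped down, the whole backend is a prompt template plus one API call per endpoint. Here’s a minimal sketch, assuming the legacy openai Python client and the text-davinci-003 completion model (the helper name and parameter values are illustrative, not the exact production code); the two endpoint sketches further down build on it:

import openai
from flask import Flask, jsonify, request

app = Flask(__name__)

def complete(prompt: str, stop: str, max_tokens: int) -> str:
    # Thin wrapper around the GPT-3 Completions API; returns the raw
    # completion text so each endpoint can post-process it as needed.
    response = openai.Completion.create(
        model="text-davinci-003",  # assumption: any GPT-3 completion model works
        prompt=prompt,
        max_tokens=max_tokens,
        stop=stop,
    )
    return response.choices[0].text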

For the /autocomplete endpoint, I use the GPT-3 prompt:

Complete this list of 5 Harry Potter Trivia Questions.
1. What is at the base of the Whomping Willow?
2. How do first years get from the Hogsmeade train station to the castle?
3. Where is the entrance to the Chamber of Secrets?
4. How is mail delivered in the wizarding world?
5. {{query}}

The cost of each request with this prompt is about 0.15 cents ($0.0015).
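
Wired into the backend, that template plus a newline stop sequence is all the endpoint needs. Here’s a sketch building on the complete() helper above (the {{query}} placeholder becomes a Python format field):

AUTOCOMPLETE_PROMPT = """Complete this list of 5 Harry Potter Trivia Questions.
1. What is at the base of the Whomping Willow?
2. How do first years get from the Hogsmeade train station to the castle?
3. Where is the entrance to the Chamber of Secrets?
4. How is mail delivered in the wizarding world?
5. {query}"""

@app.route("/autocomplete")
def autocomplete():
    query = request.args.get("query", "")
    # GPT-3 continues the user's partially typed question; stopping at the
    # newline keeps it from generating the rest of the numbered list.
    completion = complete(AUTOCOMPLETE_PROMPT.format(query=query),
                          stop="\n", max_tokens=32)
    return jsonify(suggestion=(query + completion).strip())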

For the /answer endpoint, I use the GPT-3 prompt:

Welcome to Harry Potter Trivia! In this game, we will test your knowledge of the Harry Potter series by J.K. Rowling. Are you ready to answer some challenging questions about Hogwarts, the Wizarding World, and all your favorite characters? Let’s get started!
Q: In Book 4, which two teams compete for the Quidditch World Cup?
A: Bulgaria and Ireland
Q: How is mail delivered in the wizarding world?
A: Owl
Q: What do the abbreviations for the two main wizarding exams, O.W.L.s and N.E.W.T.s, stand for?
A: Ordinary Wizarding Level and Nastily Exhausting Wizarding Test
Q: {{query}}?

The cost of each request with this prompt is about 0.25 cents ($0.0025).
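
The /answer endpoint works the same way: the few-shot Q/A examples set the answer format, and a "\nQ:" stop sequence keeps GPT-3 from inventing follow-up questions. Again a sketch on top of the helper above:

ANSWER_PROMPT = """Welcome to Harry Potter Trivia! In this game, we will test your knowledge of the Harry Potter series by J.K. Rowling. Are you ready to answer some challenging questions about Hogwarts, the Wizarding World, and all your favorite characters? Let's get started!
Q: In Book 4, which two teams compete for the Quidditch World Cup?
A: Bulgaria and Ireland
Q: How is mail delivered in the wizarding world?
A: Owl
Q: What do the abbreviations for the two main wizarding exams, O.W.L.s and N.E.W.T.s, stand for?
A: Ordinary Wizarding Level and Nastily Exhausting Wizarding Test
Q: {query}?"""

@app.route("/answer")
def answer():
    query = request.args.get("query", "")
    completion = complete(ANSWER_PROMPT.format(query=query),
                          stop="\nQ:", max_tokens=64)
    # The model replies in the few-shot format, e.g. "\nA: Owl".
    return jsonify(answer=completion.strip().removeprefix("A:").strip())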

I didn’t use any empirical methods to come up with these prompts. I just played around until the bot performed pretty well on Harry Potter questions and pretty poorly on real-world questions (I didn’t want it to be capable of general-purpose question answering).

Extractive Question Answering

I actually wrote an earlier version of this project using the extractive question answering method, which searches for an exact text answer within the first book. You can find that project deployed at https://extractive-potter.altonji.com.

The Extractive Method in Action

You can read more about the extractive question answering method in the original paper: Reading Wikipedia to Answer Open-Domain Questions.

To make the site, I downloaded a copy of the first book and split it into paragraphs of at most 256 words.
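
The splitting itself can be as simple as greedily packing consecutive source paragraphs (a minimal sketch, assuming the book is plain text with blank lines between paragraphs; the exact chunking rules may differ):

def split_into_chunks(text: str, max_words: int = 256) -> list[str]:
    # Greedily pack consecutive paragraphs into chunks of at most max_words
    # words. (A single paragraph longer than max_words would need further
    # splitting, which this sketch omits.)
    chunks, current, count = [], [], 0
    for paragraph in text.split("\n\n"):
        words = paragraph.split()
        if count + len(words) > max_words and current:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.extend(words)
        count += len(words)
    if current:
        chunks.append(" ".join(current))
    return chunks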

Given a user-inputted question, the API follows these steps (sketched in code after the list):

  1. Find the top 5 paragraphs based on overlapping terms between the paragraph and the question, using the BM25 ranking algorithm.
  2. Try to find the answer to the user’s question within each candidate paragraph, using a RoBERTa model fine-tuned on the SQuAD v2 dataset.
  3. Choose the answer to which the model assigns the highest confidence score.
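
Here’s roughly what that pipeline looks like in code (a sketch assuming the rank_bm25 package, the Hugging Face transformers question-answering pipeline, the split_into_chunks helper above, and a hypothetical book1.txt file; tokenization and tie-breaking in the real API may differ):

from rank_bm25 import BM25Okapi
from transformers import pipeline

with open("book1.txt") as f:  # hypothetical filename for the first book
    paragraphs = split_into_chunks(f.read())

bm25 = BM25Okapi([p.lower().split() for p in paragraphs])
qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

def answer_question(question: str) -> str:
    # Step 1: retrieve the top 5 paragraphs by BM25 term overlap.
    top_paragraphs = bm25.get_top_n(question.lower().split(), paragraphs, n=5)
    # Step 2: run the extractive QA model over each candidate paragraph.
    candidates = [qa(question=question, context=p) for p in top_paragraphs]
    # Step 3: return the answer span with the highest confidence score.
    return max(candidates, key=lambda c: c["score"])["answer"]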

Most of the deployment process is the same, except that I also deploy the deepset/roberta-base-squad2 model to Amazon SageMaker on CPU using their Serverless Inference option. This is cheap (around 0.08 cents per question) and fast enough (about 500 ms of latency once warmed up), but it suffers from terrible cold-start latency.
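
For reference, a Serverless Inference deployment of that model looks roughly like this with the SageMaker Python SDK (a sketch; the IAM role, memory size, and framework version pins are illustrative and need to match what your account and the Hugging Face containers support):

from sagemaker.huggingface import HuggingFaceModel
from sagemaker.serverless import ServerlessInferenceConfig

model = HuggingFaceModel(
    env={"HF_MODEL_ID": "deepset/roberta-base-squad2",
         "HF_TASK": "question-answering"},
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder role
    transformers_version="4.26",   # illustrative version pins
    pytorch_version="1.13",
    py_version="py39",
)

predictor = model.deploy(
    serverless_inference_config=ServerlessInferenceConfig(
        memory_size_in_mb=4096,  # illustrative sizing for a base-sized model
        max_concurrency=1,
    ),
)

result = predictor.predict({
    "inputs": {
        "question": "How is mail delivered in the wizarding world?",
        "context": "(a candidate paragraph from the book goes here)",
    }
})
print(result)  # e.g. {'score': ..., 'start': ..., 'end': ..., 'answer': ...}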

Extractive vs Generative

The extractive search process is fully grounded in the books, which makes it a fun way to experience the books themselves. However, on a small test set of questions that I wrote, the extractive method performs worse than the generative approach.

Results of Generative and Extractive methods on a small test set

In addition to being less accurate, the extractive method was a fair amount more work: it took me about 8 hours to implement the generative APIs and around 50 hours to implement the extractive API. The extractive approach is also clunkier to deploy, because it requires keeping the full text of the book in memory and deploying a dedicated model.

To be fair, there are a lot of ways the extractive method could be improved. For example, I could fine-tune the base model on the Harry Potter books themselves before fine-tuning on the SQuAD task. I could also fine-tune the final model on a set of Harry Potter questions. Or I could throw out the SQuAD task entirely and look at models trained on NarrativeQA. But all of this feels like a fair amount of work when the generative version is already doing well enough to impress my friends.

One friend’s reply after playing with https://potter.altonji.com
