SHRDLU - Affine Layer

Blocks world models

Written by Christopher Hesse — December 11^th, 2025

"Isn't it amazing that this was a working system in 1972? Isn't it amazing how AI doesn't seem to have gotten much farther since then?"

— Anonymous employee of Semaphore Corporation — ~November, 1997

SHRDLU (first released in 1968) was an early AI program to parse natural language text commands and manipulate a collection of virtual blocks. It ran on a PDP-6 and would render the interaction on an attached DEC-340 display.

SHRDLU was an impressive demo at the time and helped spark interest in what is now known as "Good Old-Fashioned AI". Below is a demo (with both scripted and interactive modes) for SHRDLU with different backends.

Backends

TWDEMO

TWDEMO is a program that produces the demo shown in this video and still works within a PDP-10 emulator. However, it is a scripted demo with pre-recorded dialog and block movement commands. The demo was created based on actual outputs from SHRDLU but the available SHRDLU source only partially reproduces the demo. If there was a single version of the source code that reproduces the demo, it seems hard to find today.

The above animation will replay TWDEMO, based on the DEMO.FLICK file with a few hacks to make the output look more like the output of TWDEMO when run on a PDP-10 emulator.

CLISP

A student project at Missouri University of Science and Technology translated the available source (in MACLISP) to Common Lisp. The above "CLISP" scripted mode was made with the Windows console version plus some minor crash fixes.

The interactive mode uses the same source, but using Web Embeddable Common Lisp to run it inside the browser. It seems somewhat fragile, so you may need to do an "Empty Cache and Hard Reload" if it breaks.

GPT-*

To see how modern AI compares, we evaluate a few GPT models on the SHRDLU environment. The model is given a system prompt and tools to perform the same actions that SHRDLU performs (MOVETO, GRASP, UNGRASP). We evaluate the following models:

gpt-5-nano: reasoning_effort=medium (default)
gpt-5.1: reasoning_effort=medium
gpt-5-pro: reasoning_effort=high (default)

Note that the model views the world state directly (in JSON format) and does not look at a picture of the world state (similar to the SHRDLU program). The model tends to refer to objects by their names, not just their colors, as the names are provided in the world state and it was not instructed to ignore the names.

An interactive mode is available for this as well if you provide an OpenAI API key, but keep in mind that gpt-5-pro is expensive and slow to run on this environment. This page is a static webpage and your API key is sent directly to the OpenAI API server via client-side javascript.

The evaluations did not provide image inputs in addition to the JSON world state, but these can be enabled with the button "INCLUDE IMAGES". Reasoning output can be shown with the "SHOW REASONING" button, provided your OpenAI API organization is verified.

Evaluation

Each backend was run with the inputs from Winograd's SHRDLU website. Some questions require that previous questions/commands were completed successfully, otherwise they don't make sense. Even if each command is executed successfully, the world state can still diverge between the different recordings.

The eval scoring is a bit ad-hoc due to the divergence (and only run once), but it has a max of 41 points for responding to every request correctly.

backend	score	cost	processing time
twdemo	40	unknown	unknown
clisp	13	~0	~0
gpt-5-nano	13	$0.16	24m
gpt-5.1	32	$15.69	1h14m
gpt-5-pro	38	$94.96	5h26m

clisp seems to have some issues (likely due to the version of the source it was ported from) and isn't particularly close to twdemo.

gpt-5-pro did pretty well, but at high cost and latency.

History

(source)

According to an interview with Terry Winograd (creator of SHRDLU), researchers would make a cool but fragile demo that in theory could then be extended and iteratively improved until it was robust. In practice this second part turned out to be much harder than anticipated.

A few years after SHRDLU was released, the Lighthill report was published and AI winter arrived in the UK. SHRDLU is mentioned in the report, with the conclusion that "Extension of the methods used to a much wider universe of discourse would be opposed violently by the combinatorial explosion." The symbolic approach to building AI declined in popularity after this.

(source)

The name SHRDLU is a reference to "ETAOIN SHRDLU", the equivalent of "QWERTYUIOP" on the typesetting machines of the day. Coincidentally, a sci-fi short story named "ETAOIN SHRDLU" was published in 1942 describing an artifically intelligent typesetting machine that demands humans train it on text data, making it more powerful.

Credits

Thanks to Eric Peterson for reviewing this article and Eric Swenson for answering questions about TWDEMO.

Additional References

High level description of SHRDLU
Terry Winograd's Thesis about SHRDLU
List of recent work related to restoring SHRDLU
MIT AI videos (at least 04 and 26 contain SHRDLU content)
PDP-10 emulator + software that can run SHRDLU demo
DEC-340 page with link to manual
5x7 Monospaced Pixel Font used in the above display
An earlier experiment to see if ChatGPT can match SHRDLU in capability

Subscribe to updates via email