
Blocks world models
"Isn't it amazing that this was a working system in 1972?
Isn't it amazing how AI doesn't seem to have gotten much farther since then?"
SHRDLU (first released in 1968) was an early AI program that parsed natural-language text commands and manipulated a collection of virtual blocks. It ran on a PDP-6 and rendered the interaction on an attached DEC-340 display.
SHRDLU was an impressive demo at the time and helped spark interest in what is now known as "Good Old-Fashioned AI". Below is a demo (with both scripted and interactive modes) for SHRDLU with different backends.
Backends
TWDEMO
The above animation replays TWDEMO, based on the DEMO.FLICK file with a few hacks to make the output look more like the output of TWDEMO when run on a PDP-10 emulator.
CLISP
A student project at Missouri University of Science and Technology translated the available source (in MACLISP) to Common Lisp. The above "CLISP" scripted mode was made with the Windows console version plus some minor crash fixes.
The interactive mode uses the same source, but runs it inside the browser with Web Embeddable Common Lisp. It seems somewhat fragile, so you may need to do an "Empty Cache and Hard Reload" if it breaks.
GPT-*
To see how modern AI compares, we evaluate a few GPT models on the SHRDLU environment. The model is given a system prompt and tools to perform the same actions that SHRDLU performs (MOVETO, GRASP, UNGRASP). We evaluate the following models:
- gpt-5-nano: reasoning_effort=medium (default)
- gpt-5.1: reasoning_effort=medium
- gpt-5-pro: reasoning_effort=high (default)
Note that, similar to the SHRDLU program, the model views the world state directly (in JSON format) and does not look at a picture of it. The model tends to refer to objects by their names, not just their colors, because the names are provided in the world state and it was not instructed to ignore them.
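To make the setup concrete, here is a minimal sketch of what such tools and a JSON world state could look like. Only the action names (MOVETO, GRASP, UNGRASP) come from this article; the parameter names, the world-state layout, the object names, and the `apply_tool` helper are all hypothetical and are not the schema used by the demo.

```python
import json

# Hypothetical tool schemas in a function-calling style.
# Only the three action names are taken from the article;
# every parameter below is an assumption made for illustration.
TOOLS = [
    {"name": "MOVETO",
     "description": "Move the hand to a position.",
     "parameters": {"type": "object",
                    "properties": {"x": {"type": "integer"},
                                   "y": {"type": "integer"},
                                   "z": {"type": "integer"}},
                    "required": ["x", "y", "z"]}},
    {"name": "GRASP",
     "description": "Grasp the named object at the hand position.",
     "parameters": {"type": "object",
                    "properties": {"name": {"type": "string"}},
                    "required": ["name"]}},
    {"name": "UNGRASP",
     "description": "Release the object currently held.",
     "parameters": {"type": "object", "properties": {}}},
]

# Hypothetical world state, serialized as JSON text for the model.
WORLD_JSON = """
{
  "hand": {"position": [0, 0, 0], "holding": null},
  "objects": {
    "B1": {"shape": "block", "color": "red", "position": [1, 1, 0]},
    "B2": {"shape": "pyramid", "color": "green", "position": [2, 3, 0]}
  }
}
"""


def apply_tool(world, name, args):
    """Apply one tool call to the world in place and return a status string."""
    hand = world["hand"]
    if name == "MOVETO":
        hand["position"] = [args["x"], args["y"], args["z"]]
        # A held object travels with the hand.
        if hand["holding"] is not None:
            world["objects"][hand["holding"]]["position"] = list(hand["position"])
        return "ok"
    if name == "GRASP":
        obj = args["name"]
        if hand["holding"] is not None:
            return "error: hand is already holding " + hand["holding"]
        if obj not in world["objects"]:
            return "error: unknown object " + obj
        if world["objects"][obj]["position"] != hand["position"]:
            return "error: hand is not at " + obj
        hand["holding"] = obj
        return "ok"
    if name == "UNGRASP":
        if hand["holding"] is None:
            return "error: hand is empty"
        hand["holding"] = None
        return "ok"
    return "error: unknown tool " + name


# Replay a short, hand-written sequence of tool calls.
world = json.loads(WORLD_JSON)
calls = [
    ("MOVETO", {"x": 1, "y": 1, "z": 0}),
    ("GRASP", {"name": "B1"}),
    ("MOVETO", {"x": 4, "y": 4, "z": 0}),
    ("UNGRASP", {}),
]
results = [apply_tool(world, name, args) for name, args in calls]
print(results)
print(json.dumps(world["objects"]["B1"]))
```

In this sketch the object names (`B1`, `B2`) appear directly in the state, which illustrates why a model reading the JSON would tend to refer to objects by name rather than only by color.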
An interactive mode is available for this as well if you provide an OpenAI API key, but keep in mind that gpt-5-pro is expensive and slow to run on this environment. This page is a static webpage, and your API key is sent directly to the OpenAI API server via client-side JavaScript.
The evaluations provided only the JSON world state, without image inputs, but images can be enabled with the "INCLUDE IMAGES" button. Reasoning output can be shown with the "SHOW REASONING" button, provided your OpenAI API organization is verified.
Evaluation
Each backend was run with the inputs from Winograd's SHRDLU website. Some questions only make sense if previous questions or commands were completed successfully. Even if each command is executed successfully, the world state can still diverge between the different recordings.
The eval scoring is a bit ad hoc because of this divergence, and each backend was only run once, but the maximum is 41 points for responding to every request correctly.
| backend | score | cost | processing time |
| --- | --- | --- | --- |
| twdemo | 40 | unknown | unknown |
| clisp | 13 | ~0 | ~0 |
| gpt-5-nano | 13 | $0.16 | 24m |
| gpt-5.1 | 32 | $15.69 | 1h14m |
| gpt-5-pro | 38 | $94.96 | 5h26m |
clisp seems to have some issues (likely due to the version of the source it was ported from) and isn't particularly close to twdemo.
gpt-5-pro did pretty well, coming within two points of twdemo (38 vs. 40), but at high cost and latency.
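To put the table in perspective, the arithmetic below converts each score to a percentage of the 41-point maximum and derives cost and time per point. This is a rough sketch using only the figures from the table. The per-point metrics are an illustrative choice rather than part of the original evaluation, and the "~0" entries for clisp are treated as exactly zero.

```python
MAX_SCORE = 41

# (backend, score, cost in USD, processing time in minutes)
# None marks values listed as "unknown"; clisp's "~0" is taken as 0.
RESULTS = [
    ("twdemo", 40, None, None),
    ("clisp", 13, 0.0, 0),
    ("gpt-5-nano", 13, 0.16, 24),
    ("gpt-5.1", 32, 15.69, 74),      # 1h14m
    ("gpt-5-pro", 38, 94.96, 326),   # 5h26m
]


def summarize(results, max_score=MAX_SCORE):
    """Return one summary dict per backend with derived metrics."""
    rows = []
    for name, score, cost, minutes in results:
        rows.append({
            "backend": name,
            "percent": round(100 * score / max_score, 1),
            "usd_per_point": None if cost is None else round(cost / score, 2),
            "min_per_point": None if minutes is None else round(minutes / score, 1),
        })
    return rows


for row in summarize(RESULTS):
    print(row)
```

By this measure gpt-5-pro works out to roughly $2.50 and about 8.6 minutes per point, against roughly $0.49 and 2.3 minutes per point for gpt-5.1.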
History
According to an interview with Terry Winograd (creator of SHRDLU), researchers would make a cool but fragile demo that in theory could then be extended and iteratively improved until it was robust. In practice this second part turned out to be much harder than anticipated.
A few years after SHRDLU was released, the Lighthill report was published and AI winter arrived in the UK. SHRDLU is mentioned in the report, with the conclusion that "Extension of the methods used to a much wider universe of discourse would be opposed violently by the combinatorial explosion." The symbolic approach to building AI declined in popularity after this.
The name SHRDLU is a reference to "ETAOIN SHRDLU", the equivalent of "QWERTYUIOP" on the typesetting machines of the day. Coincidentally, a sci-fi short story named "ETAOIN SHRDLU" was published in 1942 describing an artificially intelligent typesetting machine that demands humans train it on text data, making it more powerful.
Credits
Thanks to Eric Peterson for reviewing this article and Eric Swenson for answering questions about TWDEMO.
Additional References