Maybe ChatGPT has some pre-frontal cortex problems
A non-serious experiment in applying human psychological tests to AI
People have been complaining that ChatGPT has been degrading with each new version. This sounds like cognitive decline! Let’s administer some tests that might detect incipient dementia.
The clock drawing test (CDT) is simple: draw a clock face, add the numbers, and set the hands to a specific time, usually something like “10 past 11.”
It’s not about creating a masterpiece but capturing what your brain does when asked to translate an abstract concept—time—into something visual.
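For reference, the "right answer" to the test is just arithmetic. Here is a minimal sketch of where the hands should point, measured in degrees clockwise from the 12. The function name `hand_angles` is my own choice for illustration.

```python
def hand_angles(hour: int, minute: int) -> tuple[float, float]:
    """Return (hour_hand, minute_hand) angles in degrees, clockwise from 12."""
    # The minute hand covers 360 degrees in 60 minutes: 6 degrees per minute.
    minute_angle = minute * 6.0
    # The hour hand covers 360 degrees in 12 hours: 30 degrees per hour,
    # plus 0.5 degrees of drift for every minute past the hour.
    hour_angle = (hour % 12) * 30.0 + minute * 0.5
    return hour_angle, minute_angle


hour_angle, minute_angle = hand_angles(11, 10)
print(hour_angle, minute_angle)  # 335.0 60.0
```

So for "10 past 11" the minute hand sits on the 2 and the hour hand sits just past the 11.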
Patients with hemispatial neglect (hemineglect) may crowd all twelve numbers onto the right side (or all onto the left). Visual/spatial problems might show up as numbers scattered inside and outside the clockface. Drawing circles instead of numbers might point to a verbal problem. Pull-to-stimulus responses (for example, pointing the minute hand at the 10 because the instruction said “10 past”) or perseveration responses might point to Alzheimer’s.
So what problems does ChatGPT have?
Generate a picture
Prompt: Draw a clockface with the time being 10 after 11.
When I asked it what time the first image showed, it replied 10:10. In an agentic AI setup, some sort of supervisory process could have spotted the mistake and sent it back for another try.
Hmmm…
Beautifully photorealistic, and completely wrong at a fundamental level.
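The supervisory process suggested above could look something like the loop below. This is only a sketch under assumptions: `generate` and `read_time` are hypothetical stand-ins for "ask a model to draw" and "ask a model what time the drawing shows", not real API calls, and the loop is only as good as the reader it relies on.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def supervise(generate: Callable[[str], T],
              read_time: Callable[[T], str],
              prompt: str,
              expected: str,
              max_attempts: int = 3) -> Optional[T]:
    """Retry `generate` until `read_time` reports the expected time."""
    for attempt in range(1, max_attempts + 1):
        candidate = generate(prompt)
        seen = read_time(candidate)
        if seen == expected:
            return candidate
        # Feed the discrepancy back into the next attempt.
        prompt = f"{prompt}\nAttempt {attempt} showed {seen}, not {expected}. Try again."
    return None  # gave up: no attempt passed the check


# Stub demo: a fake generator that gets it right on the second try.
# The "image" here is just a string, and the "reader" returns it unchanged.
attempts = iter(["10:10", "11:10"])
result = supervise(lambda p: next(attempts), lambda img: img, "Draw 11:10", "11:10")
print(result)  # 11:10
```

The catch is the checker itself: if the reader also misreads the clock, the loop will either accept a wrong drawing or reject a right one.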
Dumbing down
The gpt-4o-mini model isn’t as capable as gpt-4o: it is presumably much smaller and was probably knowledge-distilled from gpt-4o. Here’s what it generated.
I asked Anthropic Claude (the biggest competitor to OpenAI’s ChatGPT) what parts of the brain could be damaged to cause these problems. It said:
The pattern shown here, with preserved number placement but impaired time-setting, often suggests frontal-subcortical pathway involvement more than focal damage to a single region. This is commonly seen in early neurodegenerative conditions or in vascular cognitive impairment.
Programming
gpt-4o was trained on a lot of program code, so it’s not surprising that it knows how to write programs to do things that it itself can’t do.
Prompt: Write a program that outputs an image of a clockface with the time being 10 after 11.
Flawless! I could ask it to put numbers on the dial, but I don’t think it will have any problems with that. No sign of cognitive problems or brain damage there.
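The generated program isn't reproduced here. For a rough idea of what such a program has to do, the following is a minimal sketch of my own (not gpt-4o's output) that uses only the standard library and writes a greyscale PGM file. The image size, hand lengths and the file name `clock.pgm` are all assumptions.

```python
import math

SIZE = 201           # image width and height in pixels (assumed)
CENTRE = SIZE // 2
RADIUS = 90


def hand_tip(angle_deg: float, length: float) -> tuple[float, float]:
    """Point at `angle_deg` clockwise from 12 o'clock, in pixel coordinates."""
    a = math.radians(angle_deg)
    # Image y grows downwards, so "up" is minus cos.
    return CENTRE + length * math.sin(a), CENTRE - length * math.cos(a)


def draw_clock(hour: int, minute: int) -> list[list[int]]:
    """Return a SIZE x SIZE grid: 255 is white background, 0 is black ink."""
    img = [[255] * SIZE for _ in range(SIZE)]

    def plot(x: float, y: float) -> None:
        img[round(y)][round(x)] = 0

    # Clock rim: sample the circle densely enough to leave no gaps.
    for step in range(3600):
        plot(*hand_tip(step / 10, RADIUS))

    minute_angle = minute * 6.0
    hour_angle = (hour % 12) * 30.0 + minute * 0.5
    # Short hour hand, long minute hand.
    for angle, length in ((hour_angle, 0.5 * RADIUS), (minute_angle, 0.8 * RADIUS)):
        tip_x, tip_y = hand_tip(angle, length)
        for i in range(201):  # walk from the centre to the tip
            t = i / 200
            plot(CENTRE + t * (tip_x - CENTRE), CENTRE + t * (tip_y - CENTRE))
    return img


def write_pgm(img: list[list[int]], path: str) -> None:
    """Write the grid as a binary PGM (P5) greyscale image."""
    with open(path, "wb") as f:
        f.write(f"P5\n{SIZE} {SIZE}\n255\n".encode("ascii"))
        for row in img:
            f.write(bytes(row))


write_pgm(draw_clock(11, 10), "clock.pgm")
```

Nothing here needs spatial intuition: once the two angles are known, the rest is trigonometry, which may be why the programming route works where the image route fails.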
Draw using an SVG
The text of an SVG image is a pretty close model of what you might say if you were telling someone what to draw, so this feels like I’m testing the language model on something more akin to language.
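For a sense of how closely SVG text tracks spoken drawing instructions ("draw a circle, add twelve notches, draw a short hand and a long hand"), here is a minimal sketch that builds a correct 11:10 clock face as SVG. It is an illustration, not the model's output; the function name `clock_svg`, the sizes and the class names are my assumptions.

```python
import math


def clock_svg(hour: int, minute: int, size: int = 200) -> str:
    """Return SVG markup for a clock face showing hour:minute."""
    c = size / 2          # centre of the face
    r = 0.45 * size       # radius of the face

    def point(angle_deg: float, length: float) -> tuple[float, float]:
        # Angles are clockwise from 12 o'clock; SVG y grows downwards.
        a = math.radians(angle_deg)
        return c + length * math.sin(a), c - length * math.cos(a)

    def line(angle_deg: float, inner: float, outer: float, width: int, cls: str) -> str:
        x1, y1 = point(angle_deg, inner)
        x2, y2 = point(angle_deg, outer)
        return (f'<line class="{cls}" x1="{x1:.1f}" y1="{y1:.1f}" '
                f'x2="{x2:.1f}" y2="{y2:.1f}" stroke="black" stroke-width="{width}"/>')

    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" height="{size}" '
        f'viewBox="0 0 {size} {size}">',
        f'<circle cx="{c}" cy="{c}" r="{r}" fill="white" stroke="black" stroke-width="2"/>',
    ]
    # Twelve hour notches, one every 30 degrees.
    parts += [line(h * 30, 0.9 * r, r, 2, "tick") for h in range(12)]
    # Short, thick hour hand and long, thin minute hand.
    parts.append(line((hour % 12) * 30 + minute * 0.5, 0, 0.5 * r, 4, "hour"))
    parts.append(line(minute * 6, 0, 0.8 * r, 2, "minute"))
    parts.append("</svg>")
    return "\n".join(parts)


svg = clock_svg(11, 10)
print(svg.count('class="tick"'))  # 12
```

Saving `svg` to a file with an `.svg` extension and opening it in a browser should show the result.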
Prompt: Create an SVG of a clockface with the time being 10 after 11.
I asked it what the image was showing — considering that same supervisory process — and it said 2:10 (which it wasn’t: it was probably more like 1:15). It then corrected it to something that looks kinda correct for the hand positions:
I asked it “how many notches are there on the outside of the clock, and how many should there be?”
Then it tried to correct it, and produced this:
Using Claude as a psychologist again, it said:
Several neurological conditions could produce this pattern of clock-drawing difficulties, with the primary candidates being:
Early to moderate Alzheimer's Disease (AD)
The simplification of the clock face (using only quarter-hour marks)
Difficulty with proper time setting while maintaining basic clock structure
The preserved ability to improve with practice (as seen across attempts) but retention of core deficits
Frontotemporal Dementia (FTD)
Problems with planning and organization
Difficulty with number placement
Preserved basic geometric abilities (circle drawing)
Vascular Cognitive Impairment
Particularly if there are small vessel disease or strategic infarcts
Can show improvement with practice while maintaining core deficits
Variable performance across cognitive domains
The improvement across attempts while maintaining certain consistent errors is particularly interesting, as it suggests:
Preserved procedural learning
Ongoing executive function difficulties
Intact basic motor skills
Specific rather than global cognitive impairment
Vision model redux
What if we told it to hand-draw something? Ultimately, it’s still an image, but does it give a different answer?
Prompt: Handdraw a picture of a clockface with the time being 10 after 11.
When I asked it what time the picture showed, it replied 2:10, which is again wrong. So it looks like ChatGPT can’t read analogue clocks yet. It seems to just guess, and it usually guesses the same time.
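Reading a clock is the drawing arithmetic run backwards. As a minimal sketch, assuming we already had the two hand angles in degrees clockwise from the 12 (which is exactly the part the vision model struggles with), the time falls out like this. The function name `read_clock` is hypothetical.

```python
def read_clock(hour_angle: float, minute_angle: float) -> str:
    """Read a time from hand angles in degrees clockwise from 12."""
    # The minute hand moves 6 degrees per minute.
    minute = round(minute_angle / 6) % 60
    # The hour hand drifts 0.5 degrees per minute; remove the drift
    # before rounding to the nearest 30-degree hour position.
    hour = round((hour_angle - minute * 0.5) / 30) % 12
    return f"{hour or 12}:{minute:02d}"  # show hour 0 as 12


print(read_clock(335, 60))  # 11:10
print(read_clock(305, 60))  # 10:10
print(read_clock(65, 60))   # 2:10
```

All three readings share the same minute hand position. Only the hour hand separates the requested 11:10 from the 10:10 and 2:10 answers that ChatGPT gave.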
ASCII art
Prompt: Create an ascii art of a clockface with the time being 10 after 11.
12
. .
11 1
. .
10 2
. .
9 3
. .
6
|
|
o
- |
|
I didn’t even try to recover from that!
The o1 model said:
12
/ \
11 / \ 1
/ \
10 --+ +-- 2
| \ (M)
9 | * (centre)
| (H)\
8 --+ +-- 4
\ /
7 \ / 5
\ /
\ /
6
I asked Anthropic Claude to analyse the first image, and it said:
Image 1 shows significant spatial disorganization:
Numbers are placed irregularly around the perimeter
There's no clock face drawn, just floating numbers
The vertical lines at the bottom suggest confusion about the task
Numbers are sequentially ordered but poorly spaced
What could be going on with this patient?
The patient's drawing (Image 1) shows several concerning features that may indicate cognitive impairment:
Poor planning and spatial organization
Inability to properly conceptualize the clock face
Difficulty with visual-spatial representation
Problems with executive functioning in terms of planning and organization
These findings could be consistent with various conditions including:
Early-stage dementia
Mild cognitive impairment
Neurological conditions affecting visuospatial processing
Executive function deficits
Is image 1 what you would expect to see from a patient with a damaged pre-frontal cortex?
The pattern shown in Image 1 is quite consistent with prefrontal cortex (PFC) dysfunction. Here's why:
The prefrontal cortex is crucial for:
Executive functioning
Planning and organization
Working memory
Sequential processing
Abstract thinking
In this clock drawing, we can see specific deficits that align with PFC damage:
Planning Deficits:
While the numbers are in correct sequential order (showing preserved numeric knowledge)
The patient couldn't plan the spatial layout to accommodate all numbers evenly
The vertical lines at the bottom suggest an inability to plan how to complete the task
Organization Problems:
Numbers are floating without a containing circle
Spatial relationships between numbers are irregular
No clear strategy for number placement
Abstract Thinking Difficulties:
The patient seems to understand numbers but struggles with the abstract concept of representing time
Unable to translate the mental image of a clock into a drawing
The vertical lines might indicate confusion about how to represent the clock hands
Preserved Elements:
Numbers are in correct sequence (1-12)
Numbers are legible
Basic understanding of clock components is present
This pattern differs from what we typically see in other conditions:
Posterior parietal damage (which would show more spatial distortion)
Temporal lobe issues (which might show number confusion)
Global dementia (which often shows more comprehensive deficits)
The preservation of numerical sequence while failing at spatial organization and planning is particularly characteristic of prefrontal dysfunction. Would you like me to explain more about how specific PFC regions contribute to these different aspects of clock drawing?
Conclusion
I don’t have a simple and clear conclusion. I just have some thoughts that I can draw (pardon the pun) from this experiment.
AI is now at the point that we can administer psychological tests. That’s a big step.
If you are a psychology student trying to make the switch to AI, you could open up your textbooks and start giving AI psych tests, write it up, and see what happens. Almost anything you try will never have been done before.
AI fails cognitive tests in ways that we can perhaps map onto human cognitive test failures, although the mapping is not very clear. There are some patterns: it looks a bit like dementia sometimes, and a bit like pre-frontal cortex damage somewhat more often.
Does that mean we have an AI model for dementia or PFC damage now?
AI is not at the point that we can give it any human task and expect it to be done correctly and reliably. In the same way that giving a task to a patient suffering from early-stage dementia (or with pre-frontal cortex damage) might or might not work, AI might or might not succeed.
Can we build supervision trees out of AI components to get better results? I don’t know. Can we build supervision trees out of patients with early-stage dementia? (I am definitely not going to do that experiment — just think of the application for human research ethics approval for that experiment!)
Maybe we should deliberately try to build a pre-frontal cortex to give AI some kind of executive function, and make it more human.
But maybe that’s a really genuinely terrible idea! What would AI want to do if it had the ability to decide what it wanted? Instead, maybe that’s an avenue for AI governance and legislation: it should be illegal to connect something like a pre-frontal cortex to an AI. Or illegal to try to create something akin to a pre-frontal cortex.
We could train up models specifically to be able to pass the clock drawing test. It would tell us nothing about the state of AI.
When we apply these human-based disease tests and the AI comes up short, we laugh at the silliness of the test. AI isn’t human, so passing or failing a psych test is irrelevant. So why do we look at AI milestones (“can AI pass this graduate level maths test?”), which are essentially psychometric tests, and ask whether we have achieved AGI yet?