Is it true that AI can diagnose autism?

Short answer: No.

Slightly longer and probably more accurate answer: Not yet.

There’s been some buzz lately about articles with headlines like “AI Can Diagnose Childhood Autism From Eye Photos With ‘100% Accuracy’” and…well, no, it can’t. I’m going to explain why I believe the research should never have passed peer review. The research is flawed and the reporting is irresponsible. There is potential for AI to eventually learn to diagnose autism, but that’s not what’s happening in this study.

What Did The Study Try to Do?

The study was trying to teach an AI Deep Ensemble Model how to diagnose autism through retinal scans—images of the inside of the eye. The researchers had the AI model look at scans of 479 autistic children and adolescents and 479 typically developing children who were matched for things like age, sex, and socioeconomic background.

A Deep Ensemble Model means they had several individual AI models looking at the same thing. You can think of it like a tech version of the old parable about blind men touching an elephant. In the story, one man holds the tail and says an elephant is like a rope. Another touches the leg and says an elephant is like a tree. Yet another touches the trunk and says it’s like a snake while another touches a tusk and says it is like a spear. They’re all right, but each only has a piece of the answer.

The Deep Ensemble does something similar: each AI model looks at the retinal scans from its own perspective, and then their separate conclusions are combined, typically by averaging or voting, to narrow in on the “full elephant” of the images they are analyzing. Deep Ensemble is meant to provide a more accurate result because the mistakes of any one model tend to be outweighed by the others.
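To make the combining step concrete, here is a minimal sketch of one common way ensembles reach a verdict: each model reports a probability and the ensemble averages them. The five probabilities are made up for illustration, and I am assuming simple averaging with a 0.5 cutoff; this is not the study's actual code or output.

```python
from statistics import mean

def ensemble_predict(member_probs, threshold=0.5):
    """Average the members' probabilities and turn the average into a label."""
    avg = mean(member_probs)
    label = "ASD" if avg >= threshold else "TD"
    return avg, label

# Hypothetical outputs from five independently trained models for one image:
# each number is that model's estimated probability that the image is "ASD".
member_probs = [0.9, 0.7, 0.4, 0.8, 0.6]
avg, label = ensemble_predict(member_probs)
print(f"average probability: {avg:.2f}, ensemble label: {label}")
```

Notice that the one doubtful model (0.4) gets outweighed by the other four. That is the strength of an ensemble, but it only helps when the models are making different mistakes. If they have all latched onto the same misleading cue, averaging them changes nothing.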

The result of their study was…well, in my opinion, too good to be true. The AI model correctly identified 100% of the subjects who had been diagnosed autistic and 100% of the subjects who had been identified as typically developing. But I’m going to explain to you why I believe the AI was not learning how to identify autism but rather how to replicate the biases of the medical system that had diagnosed/not diagnosed those 958 subjects.

Was Their Objective Plausible?

First off, though, I want to say that I don’t think the researchers were off track. I do believe that, eventually, retinal scans could be used to identify autism. It is already possible for doctors and researchers to see signs of Alzheimer’s and Parkinson’s in retinal scans.

The retina is the “back wall” of the inside of the eye. You probably learned about it in school with a metaphor like it being the movie screen onto which the eye projects the light we see. Slightly off to one side of the center of the retina is a spot called the optic disc, but you probably know it by the more common name of “blind spot” if you’re aware of it at all. Our brains cover up that blind spot in a typical eye by filling in the gap with what surrounds it and what we just saw there. Our eyes move around, and the blind spot moves around. Our brains convince us we see everything, and cover up our perception of the blind spot in our visual field.

That optic disc blind spot is the place where nerves leave our eye and connect with our brain to send signals from the retina to the visual cortex located at the back of our head. And the nerves that pass through the optic disc are among the earliest parts of our nervous system to form! The retina and optic nerve grow out of the neural tube, the same embryonic structure that becomes the brain, in the first months of our life inside the uterus. They are some of the oldest nerve structures in our body, and can show things about the foundations of our brains.

Since autism includes structural and functional differences in the brain when compared to the majority of brains, it is theoretically possible that some of those differences could be observed in a retina scan. And if researchers can figure out how to identify autism in a retinal scan, it could be a good thing for autistic people and our families. It could make diagnosis faster, more accurate, and much more accessible, especially for marginalized people. Even though I believe this particular study is only useful as an example of what not to do, I want researchers to keep trying, because they are looking at ways that AI could become a very helpful assistant in the diagnostic process.

Why Do I Call It a Failed Study?

The moment I saw those headlines trumpeting 100% success, I started looking more closely at the study. Why was I so suspicious? 

The subjects for the study were identified by doctors, researchers, and caregivers studying the observed behavior of children and adolescents. The AI deep ensemble looked at biological evidence. The correlation between observed behavior and biological structures should not have been perfect.

Why? Because human doctors have biases and miss diagnosing some people. I am willing to believe that the AI could have identified the autistic subjects with 100% accuracy, because the researchers cherry-picked their autistic subjects and only used those who scored high on diagnostic tools designed to rank the “severity” of autism. (I put “severe” in quotes because I hate the non-autistic-led campaign of talking about “severe” or “profound autism” and also because I don’t believe the effects of autism on a person’s life are something so clear cut as a false binary of “severe vs not.” Additionally, the subjects’ experience of their own life was never examined: all the diagnostic ranking was done by professionals and caregivers with no input from the autistic children and adolescents themselves.)

For an example of how badly even trained professionals can misstep, consider Dr. Tony Attwood, a world-recognized professional in diagnosing and studying autism. His own son is autistic, yet Dr. Attwood did not realize it until his daughter, who works in special education, told him that she thought her brother was probably on the spectrum.

Will Attwood was thirty years old before his famous diagnostician father realized that his son’s lifelong struggles with substance abuse and jail were the result of living with unrecognized autism. Will Attwood has since published a book about his experiences with late-diagnosed autism and prison, hoping to help other autistic adults who find themselves behind bars. The story of Will Attwood may seem extreme, but it is very common at this phase in our collective understanding of autism for autistic people to slip through the diagnostic cracks.

If the AI Deep Ensemble Model were actually being trained to recognize autism, it should have discovered at least one autistic person hidden in that cohort of 479 children and adolescents who had been assumed to be typically developing. But, instead, it agreed with the lack of diagnosis for 100% of those subjects.
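How strong that expectation is depends on how often autism goes unrecognized, and the study does not give that number. Here is a rough back-of-the-envelope sketch. The missed-diagnosis rates below are placeholders I picked for illustration, not figures from the study, and the calculation assumes each child's chance of being an undiagnosed autistic person is independent of the others.

```python
def prob_at_least_one(cohort_size, missed_rate):
    """Chance that a cohort contains at least one undiagnosed autistic person,
    assuming each member independently has probability `missed_rate`."""
    return 1.0 - (1.0 - missed_rate) ** cohort_size

# The rates are hypothetical; 479 is the size of the comparison group in the study.
for rate in (0.001, 0.005, 0.01):
    p = prob_at_least_one(479, rate)
    print(f"assumed missed-diagnosis rate {rate:.1%}: {p:.1%} chance of at least one")
```

The higher the true rate of missed diagnoses, the harder it is to believe that a model reading real biological signals would agree with every single "not autistic" label.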

Why? Because the data was flawed. This is a direct quote from the study: 

“When obtaining retinal photographs of patients with ASD, caregivers accompanied them to ensure comfort and stability. The photography sessions for patients with ASD took place in a space dedicated to their needs, distinct from a general ophthalmology examination room. This space was designed to be warm and welcoming, thus creating a familiar environment for patients. Retinal photographs of typically developing (TD) individuals were obtained in a general ophthalmology examination room.”

A human examining the pictures might not notice that they were taken in different rooms. Or they might see that the images were from different rooms but not pay any attention to that information, focusing instead on the actual retinal images.

But AI picks up on anything that consistently differs between the two groups of images. The fact that all the images of autistic eyes were taken in a different room from the images of those subjects who had not been diagnosed autistic means this study is useless and should never have passed peer review. This is a failed study because the data is flawed.

I don’t know how the AI recognized the room difference, but it is obvious to me that it did, because the results are “too perfect” and the AI is simply learning to replicate the biases of the researchers. It’s as if one of the Deep Ensemble AI went outside the elephant’s enclosure and found a braille book about elephants and was convincing enough to the other AI models to cause them all to shape their observations around that room difference.

Are External Cues a Known Issue With AI?

Yes, absolutely. It is well-known in AI research that studies must control for these sorts of external cues, as they can confound the results of a study. 

One example of external cues comes from a dermatology and AI study conducted by Dr. Roberto Novoa of Stanford University. Dr. Novoa’s early attempts at training AI to recognize cancer had results similar to the current retinal scan-autism study: the AI appeared to be as accurate as doctors… until Novoa noticed that they were just training the AI to recognize rulers! When a dermatologist thinks a lesion might be cancerous, they photograph it with a ruler (a basic wooden ruler like you probably used in grade school) to indicate size. The AI learned to equate rulers with cancer, so the study had to be redesigned.

You might think taking retinal images in a different room is too subtle for AI to pick up on, but the whole point of using an AI Deep Ensemble model is to pick up on cues so subtle a human observer might not notice. This retinal scan study needs to be re-done with all subjects being photographed in the room specially prepared to help autistic subjects stay calm. So long as the photographs came from different rooms (and probably different equipment) this study is not ready to pass peer review or be published.
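Here is a toy simulation of how a room difference can produce a "perfect" score. Everything in it is made up: the two features, the sizes of the effects, and the very simple one-cutoff classifier standing in for the Deep Ensemble. It is a sketch of the general problem, not a reconstruction of the study.

```python
import random

def make_subject(is_autistic, room, rng):
    """Return two made-up features for one imaginary retinal photo."""
    # Feature 0: a weak, noisy "biological" signal (hypothetical effect size).
    biology = rng.gauss(0.3 if is_autistic else 0.0, 1.0)
    # Feature 1: a photo artifact that depends only on the room, never on biology.
    artifact = (1.0 if room == "special" else 0.0) + rng.uniform(-0.1, 0.1)
    return (biology, artifact)

def accuracy(X, y, feature, cutoff):
    """Fraction of subjects whose label matches 'feature value >= cutoff'."""
    hits = sum((x[feature] >= cutoff) == label for x, label in zip(X, y))
    return hits / len(y)

def fit_stump(X, y):
    """Find the single feature and cutoff that best separate the training labels."""
    best_acc, best_feature, best_cutoff = 0.0, 0, 0.0
    for feature in range(2):
        for cutoff in (x[feature] for x in X):
            acc = accuracy(X, y, feature, cutoff)
            if acc > best_acc:
                best_acc, best_feature, best_cutoff = acc, feature, cutoff
    return best_feature, best_cutoff

rng = random.Random(0)
labels = [True] * 200 + [False] * 200

# Training photos: the room is perfectly tied to the diagnosis.
train = [make_subject(a, "special" if a else "general", rng) for a in labels]
feature, cutoff = fit_stump(train, labels)
train_acc = accuracy(train, labels, feature, cutoff)

# Re-test with everyone photographed in the same room.
test = [make_subject(a, "special", rng) for a in labels]
test_acc = accuracy(test, labels, feature, cutoff)

print(f"feature used: {feature}, train accuracy: {train_acc:.2f}, "
      f"same-room accuracy: {test_acc:.2f}")
```

In this toy setup the classifier ignores the weak biological feature and keys entirely on the room artifact, because that is the easiest route to a perfect training score. Once every subject is photographed in the same room, the artifact tells it nothing and its accuracy falls to roughly a coin flip.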

So there’s the long version of my short answer: No, the AI is not identifying autism… yet.

And every reporter out there trumpeting that 100% statistic and declaring a major breakthrough in diagnosing autism has just shown themselves to be untrustworthy science reporters, more interested in sensational headlines than careful, replicable science.

The hand and white sleeve of a doctor with brown skin, who is using a magnifying glass to look into a robot child's eyes. In Cubist style. Generated by DALL-E.