By now, ChatGPT’s work-easing capabilities are no secret, with people using it to perform a host of job tasks, from writing emails to debugging programming code. But we authors—with our backgrounds in software, speech-language pathology, and special education—wondered if the tool’s much-hyped artificial intelligence (AI) could also be used clinically, to produce language activities for speech sessions.
Of relevance to speech-language pathologists, ChatGPT is trained on massive datasets, from which it learns concepts, recognizes relationships, and develops sophisticated answers to questions (see sources). Developed by the company OpenAI and (to date) available for free, the tool can, for instance, differentiate the two meanings of the word “bark” based on the context of a query.
To see if its capacity extends to therapy materials, we tested its performance developing language activities for Maria, a middle-schooler who has difficulty differentiating “sk” from “sh” (saying “shirt” as “skirt”). To summarize our findings: Yes, ChatGPT can create applicable materials for a student like Maria, but only when an SLP’s expertise and smarter prompts drive their creation.
Let’s take a closer look.
We began our quest to develop Maria’s therapy materials by feeding ChatGPT the following prompt: Create a three-paragraph story, about a girl named Maria, using a high percentage of words that start with “sh” and “sk.”
Here’s an excerpt from the response:
Maria, the adventurous skier, set out on a shimmer [sic], snow-covered slope. She skillfully skidded down the mountain, her skis slicing through the powdery snow. The sun shone brightly, casting long shadows on the pristine landscape as she swooshed and swayed, showcasing her sharp turns and swift movements. She felt a rush of exhilaration as she approached a steep descent, her heart pounding with anticipation.
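One quick way to check whether a generated story really contains a “high percentage” of target words is to count them. The following is a minimal sketch in Python, offered as an illustration only: the function name target_share is hypothetical, and it matches on spelling alone, so it cannot tell whether a word is actually pronounced with the target sound.

```python
import re

def target_share(text, onsets=("sh", "sk")):
    """Return the words that start with a target spelling and their share of all words."""
    # Lowercase the text and pull out alphabetic words (apostrophes allowed inside words).
    words = re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())
    hits = [w for w in words if w.startswith(tuple(onsets))]
    fraction = len(hits) / len(words) if words else 0.0
    return hits, fraction

# One sentence from the excerpt above.
sample = ("She skillfully skidded down the mountain, "
          "her skis slicing through the powdery snow.")
hits, fraction = target_share(sample)
print(hits)                # ['she', 'skillfully', 'skidded', 'skis']
print(f"{fraction:.0%}")   # 31% (4 of 13 words)
```

A count like this only supplements clinical review; it says nothing about reading level or content.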
What worked …
SLPs could use this generated story in several ways:
Ask the student to read the story aloud as a baseline or a probe.
Have the student underline the “sk” and “sh” clusters in words and then read the story aloud while attending to the letter shapes.
Dictate a paragraph from the story and ask the student to write while attending to the target letter shapes.
To introduce the story to Maria, we underlined the targeted “sk” and “sh” clusters in words. Maria said, “Ah, this is so cool! I’ve never skied before.” She enjoyed both the subject matter and reading a story about herself. We could easily have generated another story on a different topic that interests her (such as puppies or basketball) by rephrasing the prompt and hitting the “regenerate” button.
… and what didn’t work
We immediately noticed ChatGPT had mistakenly used the adjective “shimmer,” instead of “shimmering,” to describe the slope. Accordingly, we corrected the error before our session with Maria. There’s a dance between what you give to ChatGPT and what it gives you. You will likely need to edit the output.
When we shared the story with Maria, she struggled to read it aloud, so we determined the reading level was too challenging. Our next move was to rephrase the prompt to include “a middle-school child” and test whether this added condition would adjust the reading level. We now asked for a three-paragraph story for a middle-school child who is having trouble distinguishing SK from SH while reading. ChatGPT responded:
Once upon a time in the small town of Skillington, there lived a young boy named Sam. Sam was a cheerful and curious middle schooler, but he had a unique challenge. He often found it tricky to tell the difference between words that started with “sk” and “sh” when he read. This made reading a bit confusing for him, and sometimes [he] mixed up words like “skin” and “shin,” which led to funny misunderstandings with his friends. Sam practiced saying words with “sk” and “sh,” improved his skills, and was soon well on his way to becoming a reading champion of Skillington Middle School.
This result was also problematic because it made clinical assumptions about reading. It implied that if Sam has difficulty distinguishing between “sk” and “sh” in text, he also has a hard time producing those sounds. We didn’t want to suggest that to Maria. Here is where clinical expertise is critical to constructing skillful prompts and interpreting results.
Further testing the tool’s potential applications and limitations for Maria’s treatment, we tried another prompt. We knew that ChatGPT can generate words according to number of syllables; placement of sounds in initial, medial, and final positions; and specific shapes (such as consonant-vowel-consonant-consonant). So we asked for 10 words that have SH or SK in the medial position in the word.
The response: “Ashtray, dishware, taskbar, mosquito, whisker, risky, briskly, mushroom, mistake, whiskey.”
Although most of this list technically satisfies the criteria (“mistake” contains neither target), there are notable issues, among them that some words are inappropriate, obscure, or contain alternate spellings of the target sound. Here is a working list of this and other limitations we identified, along with some suggestions for addressing them to make materials usable:
Limitation: Developmentally inappropriate semantics. Words such as “whiskey” and “ashtray” are clearly not appropriate for children. Suggested adjustment: Use SLP experience to improve the prompt. Construct a prompt that contains more specific instructions; for example, “Give me 10 words that have SH or SK in the medial position in the word. Make the words appropriate for children.”
Limitation: Orthography not distinguished from phonology. In the list of words with “sk” and “sh” clusters, “mosquito” contains the “sk” cluster when produced verbally—but it is spelled with “qu,” and is an incorrect teaching word for the student who is not attending to orthographic representations of “s” clusters. Suggested adjustment: Use SLP experience to specify parameters. Include keywords in the prompt to further specify the results, such as “give me 10 words with the ‘sk’ sound that are also spelled with ‘sk.’”
Limitation: Insufficient understanding of language disorders and treatment. ChatGPT results may contain incomplete or inappropriate treatment strategies or implications not rooted in evidence. This is because the algorithm can’t perform complex reasoning about individual communication disorders due to limitations in the training data (see sources). The technology is still in development regarding communication disorders. Suggested adjustment: Use SLP experience to improve the prompt. With the free version of ChatGPT, provide as much information as possible in the prompt and apply clinical judgment to the output. With the subscription-based ChatGPT-4, you can upload files, such as research studies and developmental norms, and use them to adjust a prompt (just make sure the documents contain no protected health information, in keeping with HIPAA). An enterprise version of ChatGPT will soon allow organizations to create their own specialized versions of ChatGPT, with customized knowledge. The tool is evolving rapidly. However, to date, naming a specific communication disorder or disability in a prompt does not direct the technology to produce more relevant results.
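The first two limitations above lend themselves to a simple screening pass before a word list reaches a student. The Python sketch below is an illustration under stated assumptions: the function name screen and the NOT_FOR_CHILDREN set are hypothetical, the blocked words are just the two examples named above, and the check looks only at spelling.

```python
# The word list ChatGPT returned for the medial-position prompt.
response = ["Ashtray", "dishware", "taskbar", "mosquito", "whisker",
            "risky", "briskly", "mushroom", "mistake", "whiskey"]

# Hypothetical clinician-maintained set of words to exclude.
NOT_FOR_CHILDREN = {"ashtray", "whiskey"}

def screen(words, targets=("sh", "sk"), blocked=NOT_FOR_CHILDREN):
    """Split words into those kept and those flagged, with a reason for each flag."""
    kept, flagged = [], {}
    for word in words:
        w = word.lower()
        if w in blocked:
            flagged[w] = "not appropriate for children"
        # w[1:-1] drops the first and last letters, so a match here means the
        # target is spelled out and is neither word-initial nor word-final.
        elif not any(t in w[1:-1] for t in targets):
            flagged[w] = "target not spelled in medial position"
        else:
            kept.append(w)
    return kept, flagged

kept, flagged = screen(response)
print(kept)
for word, reason in flagged.items():
    print(f"{word}: {reason}")
```

On this list the sketch keeps “dishware,” “taskbar,” “whisker,” “risky,” “briskly,” and “mushroom,” and flags “mosquito” and “mistake” on spelling grounds. A spelling check cannot confirm pronunciation (“mishap,” for example, is spelled with “sh” but does not contain the “sh” sound), so the kept words still need an SLP’s review.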
Another client—14-year-old Anthony, who has cerebral palsy and developmental delay in multiple areas including language—is working on becoming more intelligible, with longer utterances. We asked ChatGPT to generate words of increasing complexity by grade level.
The prompt yielded results that were too complex for the grade levels listed, especially for a child with a developmental or language disorder. For example, it responded with “banana, elephant, butterfly, octopus, giraffe” for kindergarten; for grade 6, it offered “interpretation, extravaganza, superstition, accommodation, responsibility.”
Limitation: Lack of developmental norms. Without subscription-enabled customization, which is not yet available, the current free version of ChatGPT does not appropriately incorporate skill levels and developmental stages into responses.
Suggested adjustment: Incorporate SLP expertise to improve the prompt. Remove the “grade level” parameter from the prompt, which could be too specialized for the training data. For example, “Generate a list of 10 picturable concrete words of increasing syllable length, from two to five syllables.”
A more technical workaround is to install a ChatGPT plug-in that lets you upload a PDF with developmental norms (for speech sound development, for example, or standardized reading levels). Providing this information can help ChatGPT better support you.
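Clinicians who reuse the same kinds of prompts may find it handy to keep the refinements suggested in this list (an audience constraint, a spelling constraint) in a small template. Here is a minimal Python sketch, assuming a hypothetical helper named build_prompt; the wording of the optional spelling sentence is our own and is only one of many ways to phrase it.

```python
def build_prompt(count, targets, position, audience=None, spelled_with_target=False):
    """Assemble a word-list prompt from clinician-chosen constraints."""
    sounds = " or ".join(targets)
    parts = [f"Give me {count} words that have {sounds} "
             f"in the {position} position in the word."]
    if audience:
        # Audience constraint, to steer away from words such as "whiskey."
        parts.append(f"Make the words appropriate for {audience}.")
    if spelled_with_target:
        # Spelling constraint, to steer away from words such as "mosquito."
        parts.append("Only include words in which the sound is also "
                     "spelled with those letters.")
    return " ".join(parts)

prompt = build_prompt(10, ["SH", "SK"], "medial", audience="children")
print(prompt)
# Give me 10 words that have SH or SK in the medial position in the word. Make the words appropriate for children.
```

The template only assembles text to paste into ChatGPT; the choice of constraints, and the review of whatever comes back, remain clinical decisions.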
Our informal testing indicates that ChatGPT is easy and fun to use and can support SLPs’ clinical work. The tool can generate skill-building therapy materials and save clinicians time and money, provided clinicians use clinical judgment, become skilled at creating prompts, and know ChatGPT’s limitations. After all, you, not the tool, are the expert. For a deeper dive on constructing prompts, consider taking courses on prompt development, such as Coursera’s free class in Prompt Engineering (see sources).