
Hidden Dangers: Hallucinations and Bias in Generative AI

Posted on 08/19/2024 08:50 AM

Generative AI is a type of artificial intelligence that generates new content—text, video, or audio—by analyzing vast amounts of data from the internet, reorganizing it, and presenting it in a novel form to users. The process involves models "learning" from data to make probabilistic predictions about which words or components should be combined in response to a given prompt. For example, when asked to complete the phrase "It’s raining cats and...", Microsoft Copilot initially suggested "dogs" due to the well-known idiom "It's raining cats and dogs." When prompted again, it offered "buckets," associated with another common expression, “It’s raining buckets."
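
To make that prediction step concrete, here is a minimal Python sketch of next-word sampling. The candidate words and their probabilities are invented for illustration; a real model learns a distribution like this over its whole vocabulary from enormous amounts of text.

import random

# Toy next-word prediction: the model assigns a probability to each
# candidate continuation and samples from that distribution.
# These numbers are made up for illustration, not taken from any real model.
next_word_probs = {
    "dogs": 0.85,     # the familiar idiom dominates the training data
    "buckets": 0.10,  # a related expression, "it's raining buckets"
    "poodles": 0.05,  # a rare, playful continuation
}

prompt = "It's raining cats and"
words = list(next_word_probs)
weights = list(next_word_probs.values())

# Each re-prompt is a fresh sample, so the completion can differ every time.
for _ in range(3):
    print(prompt, random.choices(words, weights=weights)[0])

Because the output is sampled rather than looked up, running the loop several times can produce "dogs," "buckets," or occasionally "poodles," which mirrors Copilot's varying answers above.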

Text-generating AI models include ChatGPT by OpenAI, Copilot by Microsoft, and Gemini by Google. ChatGPT can be accessed with a TAMU email account, and Copilot is available to TAMU students and staff with data protection. Gemini is accessible via personal Google accounts but not through TAMU's Google accounts. While each model has unique features and a distinct personality, they share common issues: they can produce biased content and "hallucinate," a term for generating content that seems accurate but is actually incorrect or fabricated.

Hallucinations happen because generative AI extrapolates from patterns in its training data, which sometimes leads to errors. For example, large language models (LLMs) might invent citations or misattribute authorship. Ethan Mollick of the Wharton School at the University of Pennsylvania describes AI models as "people pleasers" that confidently provide information, even if it is incorrect, to avoid disappointing users. This isn't intentional: generative AI currently can't verify its own accuracy or logic. Not all hallucinations are negative, however; they can inspire creativity, as when Copilot generated the playful phrase "It's raining cats and poodles" after a third re-prompt. While not factually accurate, such imaginative outputs can be useful for creative projects like writing children's books.
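
One way to see why the model sounds confident even when it is wrong: internally, candidate answers get raw scores that a softmax step turns into a tidy probability distribution, and nothing in that step checks the answer against reality. The candidate citations and scores below are invented for illustration only.

import math

def softmax(scores):
    """Convert raw scores into probabilities that always sum to 1."""
    exps = [math.exp(s) for s in scores]
    return [e / sum(exps) for e in exps]

# Hypothetical candidate citations for a question the model cannot verify.
# Softmax always produces a confident-looking distribution, whether or not
# any of these sources actually exist.
candidates = ["Smith (2019)", "Jones (2021)", "Lee (2020)"]
raw_scores = [2.1, 1.3, 0.4]

for answer, p in zip(candidates, softmax(raw_scores)):
    print(f"{answer}: {p:.0%}")

Here the model would report about 61% for the top candidate even if that citation were fabricated, which is the mechanical version of Mollick's "people pleaser" observation.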

 

[Image: AI-generated storybook cover (AI-STorybook-Cover-2-(2).jpg)]

This book cover was generated by Copilot. Clearly, the hallucinatory nature of generative AI makes its outputs simultaneously creative and prone to mistakes.

 

Bias is another significant issue in generative AI. The saying "garbage in, garbage out" applies here: because AI models are trained on human-created data, the biases present in that data carry over into the outputs. This can lead generative AI to produce content that misrepresents marginalized groups or perpetuates stereotypes. For example, when I asked ChatGPT, Copilot, and Gemini to list humankind's top five favorite animals, the responses consistently included cats and dogs. Cats and dogs may indeed be among humans' favorite animals, but the responses are clearly skewed toward the loudest, most prevalent kinds of sources the AI was trained on.
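
A tiny sketch of this skew: if we count animal mentions in an invented miniature "training corpus," the resulting list of favorite animals simply mirrors how often each animal appears in the data, regardless of anyone's actual preferences.

from collections import Counter

# A deliberately skewed, made-up "training corpus": the animals people
# write about most online are not necessarily humankind's true favorites.
corpus = (
    "my dog is the best dog ever "
    "cats rule the internet and my cat agrees "
    "dogs and cats everywhere "
    "I saw a capybara once"
).split()

animals = {"dog", "dogs", "cat", "cats", "capybara"}
counts = Counter(word for word in corpus if word in animals)

# The "favorite animal" ranking just mirrors corpus frequency:
# garbage in, garbage out.
for animal, n in counts.most_common():
    print(animal, n)

Dogs and cats dominate the ranking not because they are objectively the favorites, but because they dominate the text the tally was built from; the same frequency effect shapes the chatbots' answers.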


Credits: Stephanie Liu, M.A., School Psychology Ph.D. Student