Retail Banking Institute

Certified International Retail Banker Certificate

Artificial Intelligence

Hallucinations

Hallucinations occur when an AI system such as an LLM generates incorrect or valueless answers or information. The risk is that it normally presents such output in a well-constructed, well-structured answer, sometimes in an authoritative fashion. As we’ve noted, it generally won’t say ‘I don’t know’.

Such hallucinations are not intentional lies. They are a byproduct of how the models are built and of how they process and predict information: a system such as a language model generates incorrect or nonsensical information and presents it as valid or factual.

How do hallucinations happen, and what causes them? Some models are built with too much emphasis on looking and feeling human. Remember the Turing Test? A model that lacks a proper balance between seeming human and being grounded in good, rich data is prone to hallucinations. Too much emphasis on seeming human, combined with poor or limited data, very little training effort, and bad supervision, will produce poor results. We also need to remember that we are dealing with Narrow AI, so we must respect the subjects that the model was trained on.

LLMs are built to generate outputs based on the patterns that occur most often in their training data, such as the statistical probability of a certain word appearing after another. They have no actual understanding of the output and no capacity to check its veracity. Again, LLMs can provide very authoritative answers that are in fact wrong.
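To make the idea of "the most probable next word" concrete, here is a minimal sketch in Python. It is not how a real LLM is built: it is a toy word-pair counter, and the training text, the function names, and the `MODEL` variable are all invented for illustration. It simply shows that a system driven by frequency will return the most common continuation, with no check on whether that continuation is true.

```python
from collections import Counter, defaultdict

# Tiny hypothetical training text; a real LLM learns from vastly more data.
CORPUS = (
    "the bank approved the loan . "
    "the bank approved the loan . "
    "the bank rejected the claim ."
)

def train_bigrams(text):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    words = text.split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def most_likely_next(counts, word):
    """Return the most frequent follower of `word` and its estimated probability."""
    followers = counts.get(word, Counter())
    total = sum(followers.values())
    if total == 0:
        return None, 0.0  # the word was never seen in training
    best, freq = followers.most_common(1)[0]
    return best, freq / total

MODEL = train_bigrams(CORPUS)
print(most_likely_next(MODEL, "bank"))
```

In this toy example, asking what follows "bank" returns "approved" with a probability of about 0.67, simply because that pairing is the most frequent in the training text. The sketch has no way of knowing whether any particular loan was actually approved.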

The model itself cannot tell whether the quality of its data is bad. Training limitations, such as a lack of quality or quantity in the data available for training, will create inaccuracies. The model will simply extrapolate an answer from that poor-quality source material.

As we have mentioned, AI doesn’t have any idea of the world and has no cognitive capacity. All it does is draw, very quickly and powerfully, on the most common patterns found in large amounts of data in order to respond to a prompt. Bad and vague prompts also create hallucinations, as the model will try to fill the gaps in the question or request in order to provide an answer. Because the models are built to always provide an output, even when they lack sufficient knowledge or confidence, the result can be overgeneralisation or even invention.

We must always remember that we live in a Narrow AI world, where a clearly defined object, a clearly defined subject of learning, and vast, good-quality data are essential for valuable outputs. For example, when we prompted Microsoft Copilot with “language models are trained to produce text that is plausible”, it replied:

“Indeed, language models like me are designed to generate coherent and contextually relevant text. Our training data consists of vast amounts of text from various sources, allowing us to learn patterns, grammar, and semantics. While we strive for plausibility, it’s essential to remember that our responses are based on statistical patterns rather than true understanding or consciousness.”

We asked AI for some examples of hallucinations to contextualise:

Fabricated Facts
User: “Who invented the first airplane?”
AI Response: “The first airplane was invented by Nikola Tesla in 1899.”
Reality: The Wright brothers invented the first airplane in 1903, not Tesla.

Fake Citations
Example: An AI generates a citation for a nonexistent research paper.
User: “Provide a research paper on machine learning.”
AI Response: “According to Smith, J. (2020), ‘Deep Learning Techniques for Big Data’, published in the Journal of Advanced AI, deep learning models are highly scalable.”
Reality: This paper and journal do not exist.

Incorrect Calculations
User: “What is 17 times 23?”
AI Response: “17 times 23 is 401.”
Reality: The correct answer is 391.

Invented Historical Events
User: “Tell me about the Treaty of Atlantis in 1900.”
AI Response: “The Treaty of Atlantis was signed in 1900 to end conflicts between several island nations in the Atlantic.”
Reality: There is no historical record of a “Treaty of Atlantis”.

Hallucinated Abilities
User: “Can you solve this quantum physics equation?”
AI Response: Provides a detailed-looking but entirely incorrect solution, using made-up physics terminology.
Reality: The response has no scientific basis.

Misinterpretation of Context
User: “What are some good recipes for apple pie?”
AI Response: “To make apple pie, you'll need bananas, sugar, and flour.”
Reality: Bananas are not typically used in apple pie recipes.

Hallucinations in Conversational Contexts
User: “What movies has Tom Cruise won Oscars for?”
AI Response: “Tom Cruise won Oscars for Mission Impossible and Top Gun.”
Reality: Tom Cruise has never won a competitive Oscar, although he has been nominated.

Misinformation in Critical Applications
In healthcare, an AI providing diagnostic suggestions might state:
AI Response: “A high fever and rash are definitive signs of bacterial pneumonia.”
Reality: While a fever and rash may indicate other conditions, they are not definitive signs of bacterial pneumonia.

Key Takeaways

AI hallucinations can range from minor factual errors to critical misinformation in sensitive domains like healthcare, law, or education. The example above about the ‘Treaty of Atlantis’ shows how an authoritative voice combined with making up details delivers what seems like a plausible answer.

These hallucinations occur because AI models predict text based on patterns in their training data, without a true understanding of the information or the ability to verify facts. To minimise hallucinations, we need to understand how AI works, including its capabilities and weaknesses, and then try to maximise the former and minimise the latter. You must clearly define the object and subject that the model you are building will deal with. Focus on providing as much data on the subject as possible, and be careful about the quality and veracity of the data provided to the model.

It's important to monitor the model output closely, constantly sampling the answers and outputs, adjusting the prompts, and enriching the quality and quantity of the data. On top of these basic and important ways of minimising hallucinations, there are other actions that reinforce the effort. Let’s look at those next.

1 Fact check. Sample outputs and reconcile them with reliable alternative sources.

2 Write quality prompts. Prompts must be clear and concise, provide context, and avoid ambiguity, so as to reduce misinterpretation.

3 Use human supervision. It’s important that AI is a tool to assist, not replace, human judgment, especially in critical areas.

4 Always improve training. Where possible, incorporate more and better data into the model. Build self-regulated systems that indicate the confidence levels or identify when an output is speculative.
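The four actions above can be combined into a simple review routine. The Python sketch below is an illustration only, under stated assumptions: `TRUSTED_ANSWERS` stands in for whatever reliable alternative source an institution uses, the `confidence` field is a hypothetical score that a given AI system may or may not expose, and the function names and the 0.8 threshold are invented for the example.

```python
import random

# Hypothetical trusted reference answers, standing in for a reliable source.
TRUSTED_ANSWERS = {
    "What is 17 times 23?": "391",
    "Who invented the first airplane?": "The Wright brothers",
}

def check_output(output, min_confidence=0.8):
    """Classify one AI output as 'ok', 'mismatch', 'unverified' or 'low_confidence'."""
    reference = TRUSTED_ANSWERS.get(output["question"])
    if reference is None:
        return "unverified"        # no reliable source to reconcile against
    if output["answer"].strip().lower() != reference.lower():
        return "mismatch"          # contradicts the trusted source
    if output["confidence"] < min_confidence:
        return "low_confidence"    # matches, but the system marked it speculative
    return "ok"

def sample_for_review(outputs, sample_size, seed=None):
    """Randomly sample outputs and return the ones a human should review."""
    rng = random.Random(seed)
    sample = rng.sample(outputs, min(sample_size, len(outputs)))
    flagged = []
    for output in sample:
        status = check_output(output)
        if status != "ok":
            flagged.append((output["question"], status))
    return flagged

OUTPUTS = [
    {"question": "What is 17 times 23?", "answer": "401", "confidence": 0.95},
    {"question": "Who invented the first airplane?",
     "answer": "The Wright brothers", "confidence": 0.90},
    {"question": "Tell me about the Treaty of Atlantis in 1900.",
     "answer": "It was signed in 1900.", "confidence": 0.85},
]

print(sample_for_review(OUTPUTS, sample_size=3, seed=1))
```

Note that the wrong answer "401" is flagged even though its confidence score is high. This is why reconciling outputs with a reliable source, and passing flagged items to a human reviewer, matters more than trusting how sure the system appears to be.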

Hallucinations can be dangerous. Below are just a few examples.

Meta Platforms:
Incident: In July 2024, Meta's AI assistant incorrectly stated that an attempted assassination of former President Donald Trump did not occur.
Cause: This error was attributed to the AI's tendency to produce false information, known as hallucinations.
Impact: The incident highlighted challenges in managing AI accuracy, especially concerning real-time events.

GSK (GlaxoSmithKline):
Issue: The pharmaceutical giant faced persistent problems with AI hallucinations in applications like scientific literature review, genomic analysis, and drug discovery.
Approach: GSK employed strategies such as test-time compute scaling to improve the accuracy and reliability of its generative AI systems.

Amazon:
Challenge: Before launching an AI-enabled version of Alexa, Amazon needed to address the hallucination problem inherent in AI models.
Consideration: Ensuring the AI provides accurate and reliable information is crucial for maintaining user trust and safety.

ElevenLabs:
Controversy: The company's AI voice-cloning software was misused to generate controversial statements mimicking celebrities and public figures, raising ethical concerns.
Response: ElevenLabs implemented safeguards and identity verification to mitigate potential abuse of its technology.

CNET:
Situation: In January 2023, it was revealed that CNET had been using an undisclosed internal AI tool to write at least 77 stories.
Outcome: After the news broke, CNET posted corrections to 41 of these stories, indicating issues with the AI-generated content's accuracy.

In the banking industry, cases of institutions facing hallucinations are very rarely publicly mentioned or documented. Banking is about trust, and acknowledging problems with the quality of outputs or the accuracy of answers provided by a particular bank can undermine a fundamental value: confidence. Banks must nevertheless identify and recognise such issues. We know of cases, driven by improperly supervised AI solutions, that have affected customer service, trade operations, and investment decisions. The financial sector is acutely aware of the potential risks associated with AI-generated inaccuracies, and the main banks are continuously implementing measures to mitigate these risks and ensure the reliability of AI systems.

A very important consideration is human supervision of the accuracy and appropriateness of the behaviour of AI solutions in customer treatment. Human supervision is also important in sampling operations and deals, and in staying active in moments of market turbulence to avoid “bot overreactions”.
