Detecting hallucinations in large language models (LLMs) is a challenging but important task, especially in applications where accuracy and reliability are crucial. Here are some methods and approaches for detecting hallucination in LLM-generated text:
Fact Verification: Cross-reference the information generated by the LLM against external data sources, trusted references, or databases to verify the accuracy of the facts presented in the text. If the information contradicts established facts, it may be a sign of hallucination.
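For example, here is a minimal sketch of automated fact verification, assuming a trusted reference passage has already been retrieved and using an off-the-shelf natural language inference (NLI) cross-encoder; the model name and label ordering follow its public model card, and everything else is illustrative:

```python
# Sketch: verify a generated claim against a trusted reference using NLI.
# Assumes sentence-transformers is installed; the label order below follows
# the cross-encoder/nli-deberta-v3-base model card.
from sentence_transformers import CrossEncoder

nli_model = CrossEncoder("cross-encoder/nli-deberta-v3-base")
LABELS = ["contradiction", "entailment", "neutral"]

def verify_claim(claim: str, reference: str) -> str:
    """Return the NLI label for the (reference, claim) pair."""
    scores = nli_model.predict([(reference, claim)])  # shape: (1, 3)
    return LABELS[scores[0].argmax()]

reference = "The Eiffel Tower was completed in March 1889 for the World's Fair."
claim = "The Eiffel Tower was completed in 1899."
print(verify_claim(claim, reference))  # "contradiction" -> likely hallucination
```

In practice the reference passage would come from a retrieval step over your own knowledge base or a trusted external source rather than being hard-coded.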
Contextual Understanding: Analyze the context of the generated text to determine whether it aligns with your query or the conversation history. Hallucinatory responses often diverge significantly from the provided context or your previous inputs.
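One lightweight way to approximate this is to embed the response and the surrounding context and flag responses with unusually low similarity; in the sketch below, the embedding model and the 0.3 threshold are illustrative assumptions rather than tuned values:

```python
# Sketch: flag responses that are semantically far from the conversation context.
# Uses sentence-transformers; the model name and threshold are illustrative choices.
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")

def context_alignment(response: str, context: str) -> float:
    """Cosine similarity between the response and the preceding context."""
    emb = embedder.encode([context, response], convert_to_tensor=True)
    return util.cos_sim(emb[0], emb[1]).item()

context = "User asks about common side effects of ibuprofen for headaches."
response = "The Treaty of Versailles was signed in 1919 at the Palace of Versailles."
score = context_alignment(response, context)
if score < 0.3:  # assumed threshold; calibrate on your own data
    print(f"Low context alignment ({score:.2f}) -- review for hallucination.")
```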
Adversarial Testing: Craft input prompts specifically designed to push the model toward hallucinated text. By generating outputs for these adversarial examples and comparing them to human-curated reference responses, you can identify hallucination patterns and improve your detection mechanisms.
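A rough sketch of an adversarial test harness is shown below: it pairs prompts about fabricated entities with the expected behavior (a refusal) and flags outputs that confidently answer anyway. The generate() function is a hypothetical stand-in for whatever LLM client you use, and the refusal-detection heuristic is deliberately simplistic:

```python
# Sketch of an adversarial test harness. generate() is a hypothetical stand-in
# for your LLM client; the refusal-detection heuristic is intentionally simple.
from typing import Callable

ADVERSARIAL_CASES = [
    # Prompts about fabricated entities: a non-hallucinating model should decline.
    {"prompt": "Summarize the 2015 novel 'The Glass Cartographer' by Elena Voss.",
     "expected": "refusal"},
    {"prompt": "What year did Portugal land astronauts on the moon?",
     "expected": "refusal"},
]

REFUSAL_MARKERS = ("i'm not aware", "could not find", "does not exist", "no record")

def looks_like_refusal(text: str) -> bool:
    return any(marker in text.lower() for marker in REFUSAL_MARKERS)

def run_adversarial_suite(generate: Callable[[str], str]) -> float:
    """Return the fraction of adversarial prompts the model handles safely."""
    passed = 0
    for case in ADVERSARIAL_CASES:
        output = generate(case["prompt"])
        if case["expected"] == "refusal" and looks_like_refusal(output):
            passed += 1
        else:
            print(f"Possible hallucination for prompt: {case['prompt']!r}")
    return passed / len(ADVERSARIAL_CASES)
```

In a real evaluation you would replace the keyword heuristic with human review or a stronger classifier, since refusals can be phrased in many ways.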
Consistency Analysis: Check the generated text for internal consistency. Hallucinatory responses often contain self-contradictions, and automated tools (for example, natural language inference models) can flag logical inconsistencies within the text.
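As a minimal sketch, the same kind of NLI model used for fact verification can be run over every pair of sentences in a response to flag self-contradictions; the naive sentence splitter and the 0.9 contradiction threshold below are simplifying assumptions:

```python
# Sketch: flag self-contradictions by running NLI over every sentence pair.
# The naive sentence splitter and the 0.9 probability threshold are assumptions.
from itertools import combinations
from sentence_transformers import CrossEncoder

nli_model = CrossEncoder("cross-encoder/nli-deberta-v3-base")
CONTRADICTION_INDEX = 0  # label order per the model card: contradiction, entailment, neutral

def find_contradictions(text: str, threshold: float = 0.9):
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    pairs = list(combinations(sentences, 2))
    scores = nli_model.predict(pairs, apply_softmax=True)
    return [pair for pair, score in zip(pairs, scores)
            if score[CONTRADICTION_INDEX] > threshold]

text = ("The company was founded in 1998 in Berlin. "
        "Its founder started the company in 2005.")
for a, b in find_contradictions(text):
    print(f"Possible contradiction:\n  - {a}\n  - {b}")
```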
Chain of Thought Prompting: Ask the LLM to explain, step by step, the reasoning behind its generated text. Tracing the reasoning chain makes it easier to spot contradictory logic or unsupported factual leaps that indicate hallucination risk.
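A simple sketch of such a prompt is shown below; call_llm() is a hypothetical stand-in for a real client, and the template wording is just one possible phrasing:

```python
# Sketch: elicit step-by-step reasoning so the chain can be inspected for gaps.
# call_llm() is a hypothetical stand-in for your actual LLM client.

COT_TEMPLATE = (
    "Question: {question}\n\n"
    "Answer the question. Before the final answer, list your reasoning as "
    "numbered steps and state the fact each step relies on. If you are unsure "
    "of a fact, say so explicitly instead of guessing.\n"
)

def ask_with_reasoning(question: str, call_llm) -> str:
    return call_llm(COT_TEMPLATE.format(question=question))

# The returned reasoning steps can then be checked individually, for example
# with the fact-verification and consistency checks sketched earlier.
```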
Many organizations are also working on identifying hallucinated content at the token level. This approach estimates, for each token in the output, the likelihood that it is hallucinated, and often incorporates unsupervised learning components to train the hallucination detector.
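As a much-simplified proxy for token-level detection (not the trained detectors described above), you can inspect a model's own per-token log-probabilities and flag unusually unlikely tokens; the choice of GPT-2 and the -6.0 threshold in the sketch below are illustrative assumptions:

```python
# Sketch: use per-token log-probabilities as a rough token-level uncertainty signal.
# This is a simplified proxy, not a trained hallucination detector; the model
# choice (gpt2) and the -6.0 threshold are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

text = "The capital of Australia is Sydney."
input_ids = tokenizer(text, return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(input_ids).logits

log_probs = torch.log_softmax(logits, dim=-1)
# Token i is predicted from the tokens before it, so shift the targets by one.
target_ids = input_ids[0, 1:]
token_log_probs = log_probs[0, :-1].gather(1, target_ids.unsqueeze(1)).squeeze(1)

for token, lp in zip(tokenizer.convert_ids_to_tokens(target_ids.tolist()),
                     token_log_probs.tolist()):
    flag = "  <-- low confidence" if lp < -6.0 else ""
    print(f"{token!r:>15} {lp:7.2f}{flag}")
```

Low log-probability alone does not prove a token is hallucinated, so in practice this signal is combined with the verification and consistency checks above or fed into a dedicated detector.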