Guide · 10 min read

How AI Generates Fake Documents and How to Detect Them

Generative AI models (GANs, diffusion, LLMs) fabricate payslips, IDs and bank statements that fool the human eye. Learn how they work and what detection methods actually stop them in 2026.

CheckFile Team

Generative AI has fundamentally changed document fraud. Models that once required specialist training now fabricate convincing payslips, passports and bank statements in minutes, for the price of a coffee. Understanding how this works, and which detection techniques actually catch it, is now a core competency for anyone running KYC, credit underwriting or tenancy checks in 2026.

This article is for informational purposes only. Regulatory requirements evolve: consult the FCA, your legal counsel or a specialist compliance adviser for guidance specific to your organisation.

How generative AI fabricates fake documents

Generative AI does not produce forgeries by copying existing images. It learns the statistical structure of real documents and generates new instances, with all the characteristic graphical details, without reproducing any identifiable original.

Generative adversarial networks (GANs)

GANs pit two neural networks against each other: a generator that produces document images and a discriminator that tries to tell them apart from genuine documents. Training on thousands of authentic and forged pairs iteratively sharpens the generator until the discriminator can no longer tell the two apart. Applied to document fraud, GANs produce visually coherent identity cards, passports and driving licences, with typefaces, security background patterns and machine-readable zones (MRZ) included. The key weakness: GANs leave spatial frequency artefacts (oscillations characteristic of high-gradient zones) that calibrated detectors can pick up.
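A visually coherent MRZ is not necessarily an arithmetically valid one. The sketch below shows the ICAO Doc 9303 check-digit rule (repeating weights 7, 3, 1), which a verifier can apply to MRZ fields once they have been read by OCR. The function names are hypothetical, and the sample values are the widely published ICAO specimen values. This is a minimal illustration, not a full MRZ parser.

```python
def mrz_check_digit(field: str) -> int:
    """Compute the ICAO Doc 9303 check digit for one MRZ field."""
    weights = (7, 3, 1)
    total = 0
    for i, ch in enumerate(field):
        if ch in "0123456789":
            value = int(ch)
        elif "A" <= ch <= "Z":
            value = ord(ch) - ord("A") + 10  # A=10 ... Z=35
        elif ch == "<":
            value = 0  # filler character counts as zero
        else:
            raise ValueError(f"invalid MRZ character: {ch!r}")
        total += value * weights[i % 3]
    return total % 10


def mrz_field_is_consistent(field: str, check_digit: str) -> bool:
    """True when the printed check digit matches the computed one."""
    if len(check_digit) != 1 or check_digit not in "0123456789":
        return False
    return mrz_check_digit(field) == int(check_digit)


# ICAO specimen document number and date of birth
print(mrz_field_is_consistent("L898902C3", "6"))
print(mrz_field_is_consistent("740812", "2"))
```

A passing check digit only shows that the field is internally consistent. A forger who knows the rule can compute it too, so this is a cheap first filter rather than proof of authenticity.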

Diffusion models and consumer-grade tools

Diffusion models (Stable Diffusion, DALL·E, Midjourney) now dominate high-resolution image generation. Their ability to follow precise text prompts ("UK passport, male, born 1990, 35×45 mm photo, OCR-B typeface") makes them well-suited to on-demand forgery production. Since 2023, specialised tools distributed on illicit forums have combined these models with pre-populated PDF templates for the main European and UK official documents. The result: a deliverable forgery in under an hour, for under £80, with no graphic design experience required.

LLMs combined with document templates

Large language models (LLMs) such as GPT-4 and their open-source equivalents generate the textual content of fraudulent documents: names plausible for the claimed nationality, valid-looking addresses, tax reference numbers that pass Luhn-algorithm checks, and salary figures consistent with a fictitious collective agreement. Combined with layout software (LibreOffice, Adobe Acrobat), they enable batch production of fake payslips or bank statements.
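To see why a passing Luhn check proves so little, here is a minimal sketch of the algorithm together with a helper that computes the check digit for any prefix. The function names are hypothetical, and whether a given identifier scheme actually uses Luhn is an assumption to confirm against the issuing authority's specification.

```python
def luhn_is_valid(number: str) -> bool:
    """Return True if the digits in `number` satisfy the Luhn checksum."""
    digits = [int(c) for c in number if c in "0123456789"]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def luhn_check_digit(prefix: str) -> int:
    """Compute the digit that makes `prefix` + digit pass the Luhn check."""
    digits = [int(c) for c in prefix if c in "0123456789"]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 0:  # these positions are doubled once the check digit is appended
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return (10 - total % 10) % 10


print(luhn_is_valid("79927398713"))
print(luhn_check_digit("7992739871"))
```

Because the check digit can be computed for any prefix in a few lines, a checksum pass is a necessary condition at best. It says nothing about whether the number was ever issued.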

According to the ACFE 2024 Report to the Nations, only 37% of document fraud is detected through direct human review. Manual review is structurally insufficient against these techniques.

Document types most frequently forged using AI in 2026

| Document type | Dominant AI technique | Characteristic detection signal |
|---|---|---|
| Passport / national ID | GAN or diffusion model | Frequency artefacts, non-standard typefaces |
| Payslip | LLM + PDF template | Contribution/net pay mismatch, software metadata |
| Bank statement | LLM + template | Invalid IBAN, inconsistent value dates, missing BIC |
| Diploma / certificate | Diffusion + editing | Suspicious vector seal, unofficial typography |
| Supplier invoice | LLM + template | Invalid VAT number, suspiciously round amounts |
| Proof of address | LLM + operator logo | Unofficial header, address inconsistent with postal database |
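As an illustration of one signal from the table, the contribution/net pay mismatch on a payslip can be checked with simple arithmetic once the amounts have been extracted. This is a minimal sketch: the function name, the string inputs and the one-penny tolerance are assumptions, and real payslips may include items (benefits in kind, advances) that this does not model.

```python
from decimal import Decimal


def payslip_net_mismatch(gross, deductions, stated_net, tolerance=Decimal("0.01")):
    """Return True when gross minus deductions does not match the stated net pay.

    Amounts are passed as strings and handled as Decimal to avoid
    floating-point rounding on currency values.
    """
    expected_net = Decimal(gross) - sum((Decimal(d) for d in deductions), Decimal("0"))
    return abs(expected_net - Decimal(stated_net)) > tolerance


# Consistent payslip: 2500.00 - (300.00 + 150.50) = 2049.50
print(payslip_net_mismatch("2500.00", ["300.00", "150.50"], "2049.50"))
# Inflated net pay
print(payslip_net_mismatch("2500.00", ["300.00", "150.50"], "2149.50"))
```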

How detection methods work

No single detection technique is sufficient. Reliability comes from combining several layers.

Forensic analysis: ELA and metadata

Error Level Analysis (ELA) reveals JPEG compression inconsistencies: modified zones exhibit a different compression level from the rest of the image. An AI-generated document typically shows one of two suspicious patterns: either excessively regular error levels (synthetic image) or disparate compression islands (copy-paste operations).
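The two patterns described above can be sketched as a scoring step over per-block error levels. This example assumes the error levels have already been computed elsewhere (typically by re-saving the JPEG at a known quality and differencing it against the original), which is not shown here. The function names, the 3.5 cut-off on the modified z-score and the `min_spread` value are illustrative assumptions, not calibrated settings.

```python
from statistics import median


def flag_ela_outliers(error_levels, threshold=3.5):
    """Return (row, col) of blocks whose error level deviates strongly from the rest.

    Uses the modified z-score based on the median absolute deviation (MAD),
    which is robust to the outliers we are trying to find.
    """
    flat = [v for row in error_levels for v in row]
    med = median(flat)
    mad = median(abs(v - med) for v in flat)
    outliers = []
    for r, row in enumerate(error_levels):
        for c, v in enumerate(row):
            if mad == 0:
                is_outlier = v != med  # any deviation stands out in a flat image
            else:
                is_outlier = 0.6745 * abs(v - med) / mad > threshold
            if is_outlier:
                outliers.append((r, c))
    return outliers


def is_suspiciously_uniform(error_levels, min_spread=0.05):
    """Return True when error levels barely vary across the whole image."""
    flat = [v for row in error_levels for v in row]
    return max(flat) - min(flat) < min_spread


grid = [
    [2.0, 2.1, 1.9, 2.0],
    [2.2, 2.0, 1.8, 2.1],
    [2.0, 9.0, 2.1, 1.9],  # one block compressed differently
    [2.0, 2.2, 2.0, 1.9],
]
print(flag_ela_outliers(grid))
print(is_suspiciously_uniform(grid))
```

Flagged blocks are candidates for closer inspection, not verdicts: legitimate rescans and resaves also shift error levels.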

Metadata analysis (EXIF for images, XMP/producer for PDFs) exposes the software chain used: an identity document whose metadata lists "Adobe Photoshop 2024" as the creation tool, or a PDF whose creation date predates the declared issue date, are immediately actionable signals.
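The two metadata signals named above can be expressed as simple rules once the metadata has been extracted. The sketch below assumes extraction has already happened and produced a plain dictionary; the key names (`creator`, `producer`, `software`, `creation_date`), the list of editing tools and the signal labels are all hypothetical.

```python
from datetime import date, datetime

# Illustrative list only; a real deployment would maintain its own
EDITING_TOOLS = ("photoshop", "gimp", "illustrator")


def metadata_signals(metadata: dict, declared_issue_date: date) -> list:
    """Return a list of signal labels raised by a document's metadata."""
    signals = []
    tool_text = " ".join(
        str(metadata.get(key, "")) for key in ("creator", "producer", "software")
    ).lower()
    for name in EDITING_TOOLS:
        if name in tool_text:
            signals.append(f"editing_software:{name}")

    created = metadata.get("creation_date")
    if isinstance(created, datetime):
        created = created.date()  # compare on calendar dates only
    if isinstance(created, date) and created < declared_issue_date:
        signals.append("created_before_issue_date")
    return signals


suspect = {"software": "Adobe Photoshop 2024", "creation_date": date(2025, 1, 10)}
print(metadata_signals(suspect, declared_issue_date=date(2025, 2, 1)))
```

Metadata is easy to strip or rewrite, so an empty result here means only that this layer found nothing.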

In 2024, ENISA (the EU Agency for Cybersecurity) identified AI-assisted document fraud as one of the leading emerging threats in its annual threat landscape report.

Machine learning-based deepfake detection

Convolutional neural networks (CNNs) trained on corpora of authentic and forged documents detect spatial frequency artefacts invisible to the human eye: oscillations characteristic of GANs at letter edges, pixel-grid irregularities typical of diffusion models, the absence of natural sensor noise in synthetically generated facial photographs.

CheckFile deploys an additional layer of AI-generation signals as a complement to existing structural controls. This approach augments traditional document coherence checks and is calibrated to the client's sector risk level.

Physical security feature verification

Official documents contain security features whose digital simulation remains imperfect: diffractive holograms, offset rosette printing, kinetic-effect inks, microprinting. When captured via webcam or scanner, these features produce distinctive optical signatures. Forgeries replicate them graphically but without the physical dimension, a gap that certified UV/infrared scanners detect immediately.

Cross-document data validation

The most effective detection combines document analysis with verification of the data the document contains: a document number that does not exist in the national registry, a date of birth inconsistent with the tax reference format, an employer whose company number does not match the declared collective agreement, an IBAN whose country code differs from the stated bank. These cross-checks are impossible to perform manually at scale, and they form the core of a modern automated document verification solution.
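One of the cross-checks above, the IBAN test, can be sketched in a few lines. The checksum follows the standard mod-97 rule for IBANs; the function names, the signal labels and the idea of passing the stated bank's country as a two-letter code are assumptions made for this illustration. Per-country length and format rules are deliberately left out.

```python
def iban_checksum_is_valid(iban: str) -> bool:
    """Validate an IBAN with the mod-97 rule (country-specific lengths not checked)."""
    s = iban.replace(" ", "").upper()
    if len(s) < 5 or not (s.isascii() and s.isalnum()):
        return False
    rearranged = s[4:] + s[:4]  # move country code and check digits to the end
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)  # A=10 ... Z=35
    return int(numeric) % 97 == 1


def iban_signals(iban: str, stated_bank_country: str) -> list:
    """Return signal labels for an IBAN checked against the stated bank's country."""
    signals = []
    if not iban_checksum_is_valid(iban):
        signals.append("iban_checksum_invalid")
    if iban.replace(" ", "").upper()[:2] != stated_bank_country.upper():
        signals.append("iban_country_mismatch")
    return signals


# Widely published example IBAN
print(iban_signals("GB82 WEST 1234 5698 7654 32", "GB"))
print(iban_signals("GB82 WEST 1234 5698 7654 32", "FR"))
```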

For a detailed comparison of detection methods applied to synthetic identity documents, see our analysis on deepfake document detection.


The UK and EU regulatory framework

UK Money Laundering Regulations 2017

Regulated firms in the UK (banks, insurers, estate agents, accountants, solicitors) are required under the Money Laundering, Terrorist Financing and Transfer of Funds (Information on the Payer) Regulations 2017 (MLR 2017) to verify the identity of clients using reliable, independent source documents. The JMLSG Guidance (Part I, Chapter 5) explicitly requires firms to assess the risk that documents presented may be fraudulent, including digitally fabricated ones.

FCA expectations on digital onboarding

The FCA's PS21/19 policy statement on strong customer authentication and related guidance clarifies that firms must ensure their remote identity verification processes are robust against the risk of synthetic or manipulated documents. The obligation is on the firm to demonstrate that controls are proportionate to the risk, not simply that a document was accepted in good faith.

UK Fraud Act 2006 and criminal liability

Producing or using a fraudulent document is an offence under Section 2 of the Fraud Act 2006, carrying up to 10 years' imprisonment. Using AI as the generation mechanism does not constitute a defence: intent to deceive is the relevant element.

The EU AI Act (reference for UK-facing firms)

For organisations operating in both UK and EU markets, the EU AI Act (Regulation (EU) 2024/1689), in force since 1 August 2024, classifies remote biometric identification systems as high-risk (Annex III, point 1). Article 50 imposes a disclosure obligation on AI-generated synthetic content. Firms deploying these systems within the EU must comply with robustness and risk management requirements (Articles 9–15) as of 2 August 2026.

According to PwC's Global Economic Crime Survey 2024, 47% of organisations surveyed globally experienced fraud in the past two years, with document fraud representing the fastest-growing attack vector in financial services.

Questions practitioners actually ask

Compliance forums regularly surface the following questions from KYC and onboarding teams:

"Can an AI-generated payslip fool automated OCR systems?" Yes, in many cases. LLMs generate consistent salary figures, employer names and deduction calculations that pass basic OCR extraction. The distinguishing factors lie in metadata, statistical salary distributions, and cross-referencing employer registration data, not in visual inspection.

"What are the most reliable visual signs of an AI-generated document?" A counterintuitive answer: excessive image quality. AI-generated facial photographs often display perfect clarity, unnatural facial symmetry, and an absence of camera sensor noise. However, current models have largely corrected these artefacts, so automated forensic controls are the only reliable defence at scale.

To build your team's ability to identify visual warning signs, see our guide on training teams to spot AI-generated documents.

Implementing effective detection in three steps

Step 1 – Map your exposure. An online loan application submitted without in-person verification carries structurally higher fraud risk than a document delivered face-to-face. Risk analysis must quantify volume, collection channel and the criticality of the downstream decision.

Step 2 – Deploy a multi-layer solution. Effective detection combines forensic analysis (ELA, metadata), specialised ML models, security feature verification and cross-data validation. Single-technology solutions (OCR alone, or metadata analysis alone) miss second-generation fakes.

Step 3 – Document procedures for audit. The FCA expects firms to demonstrate a clear, documented process for identity verification decisions: who decides, on what basis, using which tool, within what timeframe. This documentation is required for supervisory reviews.
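Steps 2 and 3 can be sketched together: combine the scores from each detection layer into one decision, then write a record that captures who decided, on what basis, with which tool and how quickly. Everything here is an illustrative assumption, including the layer names, weights, thresholds, field names and tool name. It does not describe CheckFile's implementation or any regulator-mandated format.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class LayerResult:
    name: str      # e.g. "forensic", "ml_model", "security_features", "cross_data"
    score: float   # 0.0 = no signal, 1.0 = strongly suspicious
    weight: float  # relative trust placed in this layer


def decide(results, review_at=0.3, reject_at=0.7, escalate_at=0.9):
    """Weighted average of layer scores, with escalation on any single strong signal."""
    total_weight = sum(r.weight for r in results)
    if total_weight <= 0:
        raise ValueError("at least one layer with positive weight is required")
    risk = sum(r.score * r.weight for r in results) / total_weight
    if risk >= reject_at:
        decision = "reject"
    elif risk >= review_at or any(r.score >= escalate_at for r in results):
        decision = "manual_review"
    else:
        decision = "accept"
    return decision, risk


def audit_record(results, decided_by, tool, received_at, decided_at):
    """Serialise the decision and its basis as a JSON audit entry."""
    decision, risk = decide(results)
    return json.dumps({
        "decision": decision,
        "risk": round(risk, 3),
        "basis": {r.name: r.score for r in results},
        "decided_by": decided_by,
        "tool": tool,
        "received_at": received_at.isoformat(),
        "decided_at": decided_at.isoformat(),
        "elapsed_seconds": (decided_at - received_at).total_seconds(),
    }, sort_keys=True)


layers = [
    LayerResult("forensic", 0.0, 3.0),
    LayerResult("ml_model", 0.0, 3.0),
    LayerResult("security_features", 0.0, 3.0),
    LayerResult("cross_data", 0.95, 1.0),  # one layer fires strongly
]
print(audit_record(
    layers, decided_by="analyst-042", tool="example-verifier 1.0",
    received_at=datetime(2026, 1, 5, 9, 0, tzinfo=timezone.utc),
    decided_at=datetime(2026, 1, 5, 9, 0, 30, tzinfo=timezone.utc),
))
```

The escalation rule is the design point: a weighted average alone would let three quiet layers outvote one strong cross-data signal, which is exactly the case a single-technology solution misses.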

Discover how CheckFile integrates these detection layers into a workflow compatible with your existing KYC systems, as a complement to your existing controls. See our pricing for an estimate based on your document volume.

For a complete overview of document verification best practices, consult the document verification guide.

Frequently Asked Questions

What exactly is an AI-generated document?

An AI-generated document is a file whose content, in whole or in part, has been produced by a generative AI model (GAN, diffusion model or LLM). It may be an entirely fictitious document or a genuine document with specific fields replaced or modified. Current models produce outputs that are indistinguishable from genuine documents by visual inspection alone.

Is producing fake documents with AI illegal in the UK?

Yes. Producing or using a fraudulent document constitutes an offence under Section 2 of the Fraud Act 2006, carrying up to 10 years' imprisonment. The fact that AI was used to generate the document does not constitute a defence: criminal intent is the relevant element. Forgery of specified instruments (passports, driving licences) carries additional penalties under the Forgery and Counterfeiting Act 1981.

Are free detection tools sufficient for professional use?

Free online tools generally perform basic forensic analysis (ELA, metadata) that catches crude forgeries. They are insufficient for fakes produced by current-generation diffusion models, which generate minimal detectable artefacts. For professional contexts with regulatory obligations, a specialised solution that is regularly updated against evolving forgery techniques is required.

Which document types carry the highest fraud risk in the UK?

Payslips for credit applications, bank statements for tenancy checks and driving licences for age verification represent the highest-volume targets in the UK market. These documents combine high decision-making stakes with predominantly remote submission channels, creating optimal conditions for AI-assisted forgery.

How does cross-document validation improve detection rates?

Cross-document validation checks the internal consistency of data across multiple documents in a dossier, for example verifying that an employer's company registration number matches the declared collective agreement, or that an IBAN's country code corresponds to the stated bank. Because it relies on logical inconsistencies rather than image artefacts, this layer catches forgeries that pass visual and forensic analysis. See CheckFile's verification platform for implementation details.


This article is provided for informational purposes only. Regulatory requirements vary by sector and jurisdiction. Consult the FCA, JMLSG guidance or a qualified compliance professional for advice specific to your organisation. For a broader overview, see the document verification guide.

For where this fits in the CheckFile offering, see our AI and deepfake detection approach.
