Detect PDF Metadata Tampering: How to Spot an Altered Document
How to detect a tampered PDF document through metadata analysis: forensic techniques, tools, red flags, and UK compliance requirements for document verification teams.

Detecting a tampered PDF starts with its metadata: creation timestamp, producing software, modification history, and XMP fields can expose alteration within seconds, with no visual inspection required. PDF metadata forensics, once the preserve of court-appointed examiners, is now a frontline tool for KYC, lending, and compliance teams facing rising document fraud.
According to the ACFE 2024 Report to the Nations, 37% of occupational frauds are detected through internal controls, with automated file property analysis accounting for a growing share of those detections. Most falsified documents carry detectable metadata inconsistencies; the challenge is knowing where to look.
What PDF Metadata Tampering Is
PDF metadata tampering means deliberately altering the information embedded in a PDF file to conceal that the document's content has been changed. These fields, invisible when a PDF is opened normally, record how and when the file was produced.
A PDF contains two distinct metadata layers:
The Information Dictionary (referenced by the /Info entry in the file trailer) stores human-readable fields: author, title, keywords, creating application (/Creator), PDF-writing software (/Producer), creation date (/CreationDate), and last modification date (/ModDate).
XMP metadata (Extensible Metadata Platform, ISO 16684 standard) is an XML block embedded in the file stream. It mirrors the /Info dictionary but with finer granularity, including revision history (xmpMM:History), the document's unique identifier (xmpMM:DocumentID), and the instance identifier (xmpMM:InstanceID), which is regenerated on every save.
A falsified document frequently shows inconsistencies between these two layers, because basic editing tools only update one of them.
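A minimal sketch of that cross-layer comparison, assuming the /Info date string (for example /CreationDate) and the corresponding XMP value (for example xmp:CreateDate) have already been extracted as text. The function names are illustrative, and treating a missing time zone as UTC is an assumption made here for simplicity.

```python
import re
from datetime import datetime, timedelta, timezone

# PDF date string: D:YYYYMMDDHHmmSS followed by Z or +HH'mm' / -HH'mm'.
_PDF_DATE = re.compile(
    r"D:(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:(Z)|([+-])(\d{2})'?(\d{2})?'?)?"
)


def parse_pdf_date(value: str) -> datetime:
    """Parse an /Info date string into a timezone-aware datetime."""
    match = _PDF_DATE.match(value.strip())
    if match is None:
        raise ValueError(f"not a PDF date string: {value!r}")
    defaults = (0, 1, 1, 0, 0, 0)  # used for omitted month, day, time parts
    parts = [int(g) if g else d for g, d in zip(match.groups()[:6], defaults)]
    sign, hours, minutes = match.group(8), match.group(9), match.group(10)
    offset = timedelta(0)  # assumption: no offset given means UTC
    if sign:
        offset = timedelta(hours=int(hours), minutes=int(minutes or 0))
        if sign == "-":
            offset = -offset
    return datetime(*parts, tzinfo=timezone(offset))


def parse_xmp_date(value: str) -> datetime:
    """Parse an ISO 8601 XMP date; naive values are assumed to be UTC."""
    parsed = datetime.fromisoformat(value.strip().replace("Z", "+00:00"))
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def layers_disagree(info_date: str, xmp_date: str, tolerance_seconds: int = 1) -> bool:
    """True when the /Info and XMP timestamps differ by more than the tolerance."""
    delta = abs(parse_pdf_date(info_date) - parse_xmp_date(xmp_date))
    return delta.total_seconds() > tolerance_seconds
```

A disagreement is a prompt for closer inspection rather than proof of tampering, since some legitimate tools also write only one of the two layers.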
How Fraudsters Alter PDF Metadata
The most common tampering techniques use freely available software, which explains why metadata anomalies appear across such a wide range of forged documents.
Direct /Info dictionary editing: hex editors or Python libraries like PyPDF2 can overwrite date and author fields without leaving any visible trace in the rendered document. A payslip edited today can have its creation date set back six months so that it matches the pay period it claims to cover.
Digital reprinting: the forged PDF is printed to a new PDF file via a virtual printer, wiping the original metadata and generating fresh dates consistent with the falsification date. This technique produces recompression artefacts detectable by Error Level Analysis (ELA).
Adobe Acrobat or LibreOffice edits: opening and resaving a document in these applications automatically updates /ModDate and records the software as /Producer. This leaves an involuntary trace: the modification date becomes later than the creation date declared on the document.
XMP field manipulation: more sophisticated actors also modify XMP metadata to align both layers. But the xmpMM:InstanceID, a UUID generated on every save, changes with each modification, and its format can betray which software was used.
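As a rough illustration of the XMP identifiers mentioned above, the sketch below pulls xmpMM:DocumentID and xmpMM:InstanceID out of a PDF's raw bytes. It assumes the XMP packet is stored uncompressed, and it uses a regular expression rather than a full XML parse; the function names are hypothetical.

```python
import re


def xmp_identifiers(pdf_bytes: bytes) -> dict:
    """Return the first xmpMM:DocumentID / xmpMM:InstanceID found, or None."""
    found = {}
    for name in ("DocumentID", "InstanceID"):
        # Matches both the element form  <xmpMM:X>value</xmpMM:X>
        # and the attribute form         xmpMM:X="value"
        pattern = rb"xmpMM:" + name.encode() + rb"(?:>|=[\"'])\s*([^<\"']+)"
        match = re.search(pattern, pdf_bytes)
        found[name] = match.group(1).decode("ascii", "replace").strip() if match else None
    return found


def ids_differ(ids: dict):
    """Compare the two identifiers, ignoring prefixes such as 'uuid:'.

    Returns None when either identifier is missing.
    """
    doc, inst = ids.get("DocumentID"), ids.get("InstanceID")
    if doc is None or inst is None:
        return None
    return doc.rsplit(":", 1)[-1].lower() != inst.rsplit(":", 1)[-1].lower()
```

The prefix left in the extracted value (for example `uuid:`) is worth recording too, since the identifier format is one of the hints about which software wrote the file.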
Techniques for Detecting Altered Metadata
Forensic analysis of a suspect PDF combines several complementary checks. Layering metadata, file structure, and cross-document consistency checks is the most reliable way to identify tampered PDF documents.
Extracting and Verifying Raw Metadata
The first step is extracting all metadata from the file. ExifTool (exiftool.org) is the reference tool: it reads /Info, XMP, and EXIF metadata in a single pass and reports each layer separately, which makes cross-layer inconsistencies easy to spot.
Key red flags to look for:
| Red Flag | What It Indicates | Likely Interpretation |
|---|---|---|
| /ModDate later than /CreationDate | Document re-saved after creation | Content likely modified |
| /Producer differs from /Creator | Document converted or printed to PDF | Content potentially rewritten |
| XMP InstanceID ≠ DocumentID | At least one post-creation save | Revision after original production |
| Empty /Info fields, XMP populated | Selective metadata scrubbing | Concealment attempt |
| Creation date before 1993 | Impossible value (PDF invented in 1993) | Falsified metadata |
| Timezone offset inconsistent with issuer location | An offset other than +00:00 (GMT) or +01:00 (BST) for a document issued by a UK bank | Possibly produced outside declared jurisdiction |
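Some of these red flags are simple enough to express as rules. The sketch below covers three of them (modification after creation, a pre-1993 creation date, and an unexpected UTC offset), assuming the dates have already been parsed into timezone-aware `datetime` objects. The function name and the `expected_offsets` parameter are illustrative; which offsets count as expected for a given issuer is for the caller to decide.

```python
from datetime import datetime, timedelta, timezone


def red_flags(creation: datetime, modified: datetime, expected_offsets) -> list:
    """Return human-readable flags for one document's date metadata.

    creation, modified: timezone-aware datetimes from /CreationDate and /ModDate.
    expected_offsets:   collection of timedelta UTC offsets considered normal
                        for the issuer (an assumption supplied by the caller).
    """
    flags = []
    if modified > creation:
        flags.append("ModDate later than CreationDate")
    if creation.year < 1993:  # the PDF format did not exist before 1993
        flags.append("CreationDate before 1993")
    for label, stamp in (("CreationDate", creation), ("ModDate", modified)):
        if stamp.utcoffset() not in expected_offsets:
            flags.append(f"{label} has unexpected UTC offset")
    return flags
```

Each flag is a reason to look closer, not a verdict; the interpretation column in the table still needs human judgement.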
Structural Analysis of the PDF File
Beyond metadata, the PDF's internal structure reveals its history. The PDF format supports incremental updates: an editor can append each modification to the file as a new revision without erasing previous ones. pdfid.py (blog.didierstevens.com) and QPDF (qpdf.sourceforge.io) expose the number of revisions (cross-reference sections) and help identify which objects were modified.
A legitimate payslip or bank statement typically carries a single revision: the original generation by the payroll software or banking system. Multiple revisions, especially ones affecting text or image objects, are a strong indicator of tampering.
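A crude way to estimate the number of revisions without any third-party tool is to count the `startxref ... %%EOF` trailers in the raw bytes, since each appended revision ends with one. This is only a sketch with hypothetical function names: linearised ("fast web view") PDFs normally contain two such trailers even when unedited, so the example reports that case separately.

```python
import re


def count_revisions(pdf_bytes: bytes) -> int:
    """Count 'startxref <offset> %%EOF' trailers, one per saved revision."""
    return len(re.findall(rb"startxref\s+\d+\s+%%EOF", pdf_bytes))


def is_linearized(pdf_bytes: bytes) -> bool:
    """Linearised files carry a /Linearized dictionary near the start."""
    return b"/Linearized" in pdf_bytes[:2048]


def revision_summary(pdf_bytes: bytes) -> dict:
    """Bundle the count with the linearisation caveat for a human reviewer."""
    revisions = count_revisions(pdf_bytes)
    linearized = is_linearized(pdf_bytes)
    # One trailer is the baseline; linearised files get an allowance of two.
    baseline = 2 if linearized else 1
    return {
        "revisions": revisions,
        "linearized": linearized,
        "extra_revisions": max(0, revisions - baseline),
    }
```

A non-zero `extra_revisions` value suggests the file was saved again after it was first written; a dedicated tool is still needed to see which objects changed.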
Cryptographic Hash Verification
Some official documents include an electronic signature or timestamp compliant with eIDAS Regulation (EU) No 910/2014, which remains enforceable in the UK under the Electronic Identification and Trust Services for Electronic Transactions Regulations 2016. Signature verification immediately reveals any post-creation content change: any alteration to the signed data invalidates the cryptographic signature.
For documents without signatures, the SHA-256 hash can be compared against a reference copy, where one is available via the issuing portal. HMRC, for instance, offers a Check a Document service for tax-related official correspondence, allowing document authenticity confirmation independent of metadata analysis.
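The hash comparison itself needs nothing beyond the Python standard library. The following sketch assumes a reference SHA-256 digest, in hexadecimal, has been obtained from the issuer; the function names are illustrative.

```python
import hashlib
import hmac


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_reference(path: str, reference_hex: str) -> bool:
    """True when the file's digest equals the reference (case-insensitive)."""
    return hmac.compare_digest(sha256_of_file(path), reference_hex.strip().lower())
```

A match shows the file is byte-for-byte identical to the reference; a mismatch only shows that the bytes differ, which can also happen when a genuine document is re-saved or re-downloaded in a different form.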
ELA (Error Level Analysis)
ELA detects areas of a digital document that have undergone different recompression than the surrounding content. Applied to PDFs containing images, such as identity photos or scanned pages, it reveals retouched zones with a precision the naked eye cannot match.
In practice: a salary figure replaced in a scanned payslip shows a slightly different compression error level from surrounding areas, even after multiple recompressions. Tools like FotoForensics and their equivalents built into document verification platforms automate this analysis.
Forensic Tool Comparison
| Tool | Primary Use | Free | API-Integrable |
|---|---|---|---|
| ExifTool | Metadata extraction | Yes | Yes (command line) |
| pdfid.py | PDF structure analysis | Yes | Yes (Python) |
| QPDF | Incremental revision reading | Yes | Yes |
| pdf-parser.py (Didier Stevens) | Raw PDF objects | Yes | Yes |
| Autopsy + PDF Parser | Judicial forensics | Yes | No (GUI) |
| CheckFile Platform | Automated multi-layer analysis | No | Yes (REST API) |
Regulatory Requirements in the UK
UK firms subject to the Money Laundering, Terrorist Financing and Transfer of Funds (Information on the Payer) Regulations 2017 (MLR 2017) must apply customer due diligence measures appropriate to the risk. The FCA's Financial Crime Guide makes clear that document verification must go beyond visual inspection where digital documents are submitted remotely.
The FCA issued Dear CEO letters in 2024 and 2025 reminding regulated firms that reliance on self-certified digital documents without forensic controls constitutes inadequate CDD. Firms that accept PDFs without metadata verification face regulatory risk, particularly in mortgage lending, account opening, and KYB processes.
For employment and right-to-work checks, the Home Office guidance on digital identity specifies that certified identity service providers must include document authenticity controls, which in practice means metadata and structural analysis for PDF submissions.
The Information Commissioner's Office (ICO) also notes that personal data verification processes must be proportionate and documented, reinforcing the need for a traceable forensic audit trail when processing identity documents.
Cross-Document Consistency Checks
Verifying a single PDF's metadata in isolation is insufficient. The most reliable detection approach cross-checks several documents issued by the same source: an employer consistently produces payslips with the same payroll software, the same exact company name, the same fonts. Variations between consecutive payslips (a changed /Producer, a different font, a different software version) are strong tamper indicators.
This cross-document validation approach is now central to automated document verification systems. It moves beyond traditional OCR by incorporating structural and metadata consistency across the files in a single dossier.
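A simple version of this cross-document check can be sketched as a majority vote over one metadata field. The example assumes each document in a dossier has already been reduced to a small dictionary with hypothetical `name` and `producer` keys; the same pattern would apply to fonts or software versions.

```python
from collections import Counter


def producer_outliers(documents: list) -> list:
    """Return the names of documents whose producer differs from the majority.

    documents: list of dicts with 'name' and 'producer' keys (illustrative
    structure, not a format defined by any particular tool).
    """
    counts = Counter(doc["producer"] for doc in documents)
    if not counts:
        return []
    # In a tie, most_common keeps the value that was seen first.
    majority, _ = counts.most_common(1)[0]
    return [doc["name"] for doc in documents if doc["producer"] != majority]
```

With only two or three documents the majority is weak evidence, so an outlier here is best treated as a reason to request further documents rather than a conclusion.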
Specialised platforms like CheckFile implement this multi-layer logic (structural, metadata, and cross-document) to automatically flag dossiers showing anomalies. As a complement to your existing controls, AI-generation signal analysis on suspect documents adds a further detection layer.
For a broader overview of forensic document analysis techniques, the article on document forensics tools and AI comparison covers the full landscape of solutions available in 2026.
Real Patterns in Document Fraud Cases
Compliance professionals on specialist forums (r/compliance, r/fintech, r/banking) consistently report the same recurring patterns:
Reconstructed payslips: the most common case involves a payslip exported from payroll software (Sage, Xero, QuickBooks), converted to PDF, then edited to adjust salary figures. The /Producer chain reveals the sequence: Sage Payroll → Adobe Acrobat → Microsoft Print to PDF.
Edited bank statements: a bank generates the statement with proprietary software. The fraudster opens it in a PDF editor, changes transaction amounts, and resaves. The /ModDate becomes later than the declared statement date, and the XMP UUID changes.
Modified tax documents: more sophisticated, because some actors regenerate the entire document. However, HMRC digital documents carry a unique document reference verifiable through official channels, making full substitution immediately detectable.
Integrating Metadata Analysis into Your Verification Workflow
For compliance teams processing large document volumes, manual metadata analysis is impractical. Several approaches allow this control to scale:
Document analysis APIs: platforms like CheckFile expose REST APIs that accept a PDF and return a document risk score incorporating metadata analysis, file structure checks, and cross-document consistency scoring.
Batch analysis scripts: for technical teams, ExifTool combined with a Python script can process hundreds of PDFs per hour and automatically flag anomalies. The core rule: any document whose /ModDate is later than its declared production date warrants human review.
Document reception checklist: for teams without technical tooling, a simple procedure is to systematically request the original digital file (not a print-scan copy) and check its creation date via file properties, accessible in any PDF reader under File → Properties → Description.
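The batch-script approach could look roughly like the sketch below. It assumes ExifTool is installed and on the PATH, that it is called with its JSON (`-j`) and group-name (`-G1`) options, and that the /Info modification date then appears under the key `PDF:ModifyDate` in ExifTool's default date format. The declared production date is assumed to come from elsewhere, for example the date printed on the document. All function names are illustrative.

```python
import json
import subprocess
from datetime import date, datetime


def run_exiftool(paths: list) -> list:
    """Run ExifTool on the given files and return one dict per file.

    Assumes the 'exiftool' executable is available on PATH.
    """
    result = subprocess.run(
        ["exiftool", "-j", "-G1", *paths],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def parse_exiftool_date(value: str):
    """Parse ExifTool's default date format; return None if unreadable."""
    for fmt in ("%Y:%m:%d %H:%M:%S%z", "%Y:%m:%d %H:%M:%S"):
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    return None


def needs_review(record: dict, declared: date) -> bool:
    """Apply the core rule: ModDate later than the declared production date.

    A missing or unreadable ModDate is not flagged by this rule; other
    checks would need to cover that case.
    """
    modified = parse_exiftool_date(str(record.get("PDF:ModifyDate", "")))
    return modified is not None and modified.date() > declared
```

In use, the records returned by `run_exiftool` would be paired with each document's declared date and anything for which `needs_review` returns `True` routed to a human reviewer.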
For broader document fraud detection capabilities, the AI-powered document detection page explains how CheckFile integrates these signals into its verification pipeline.
Also see the complete document verification guide for a full overview of methods by document type.
Frequently Asked Questions
Can you detect a tampered PDF without specialist software?
Yes, partially. Any PDF reader (Adobe Reader, macOS Preview) displays basic metadata under File → Properties. A modification date later than the declared issue date is an immediate red flag. For complete analysis (incremental revisions, XMP consistency), free tools like ExifTool are needed.
Can metadata be altered in a completely undetectable way?
It is difficult, but possible for a skilled actor. An advanced fraudster can align both metadata layers (Info and XMP), strip revision history, and regenerate the document UUID. However, residual artefacts such as compression level, font version, and PDF object structure typically remain detectable through thorough forensic analysis.
What is the legal standing of PDF metadata analysis in UK courts?
Metadata forensic analysis can constitute admissible evidence in UK civil and criminal proceedings, particularly in forgery cases under the Fraud Act 2006 and the Forgery and Counterfeiting Act 1981. Chain of custody (analysis traceability) must be documented. For maximum legal weight, analysis should be performed by a qualified digital forensics expert.
Do UK banks verify metadata when documents are submitted online?
UK banks subject to MLR 2017 have a legal obligation to apply risk-proportionate due diligence. In practice, more advanced institutions use automated document analysis platforms that include metadata verification. FCA guidance following the 2024-2025 Dear CEO letters has accelerated adoption of technical document controls across the sector.
How do I verify the authenticity of a document issued by a government body?
Many UK government documents include verifiable codes. HMRC tax documents carry a unique reference; Companies House filings are verifiable online; DBS certificates can be checked via the DBS online checking service. Where such verification services exist, they should be used alongside, not instead of, metadata analysis.
For where this fits in the CheckFile offering, see our AI and deepfake detection approach.