Preflight & Automated Validation

DNXT Publisher Suite · Preflight Engine

Stop Discovering
Errors After
You Submit

47 automated validation rules. A self-correcting auto-fix engine. A 30-second preflight check that replaces two days of manual QC — and eliminates agency technical rejections before they happen.

DNXT Preflight — Submission Validation Report
IND-2024-0391 · Modules 2–5 · 47 Rules · 40 Passed
PDF/A-1b Compliance
Passed · 214 files
Font Embedding
Passed · All embedded
XML Schema (eCTD 3.2.2)
Passed · Well-formed
Broken Hyperlinks
3 found · Auto-fixable
Metadata — Author Field
4 files missing
Folder Structure (ICH M8)
Passed · Compliant

⚡ Auto-Fix Available — 7 issues resolvable automatically
Links · Metadata · Color profiles — no manual intervention required
Validation complete · 40 / 47 rules passed · 23s
47
Automated Validation
Rules Executed
<30s
Auto-Fix Engine
Resolves Issues In
0
Agency Technical
Rejections Post-Preflight
100%
Pre-Submission
Dossier Coverage
Built For Real Workflows

Who This Is Built For

DNXT Preflight is not a generalized QC tool: it addresses the specific, daily frustrations of the people closest to regulatory submissions.

📋
Director, Regulatory Affairs
Mid-Size Pharma / NDA/BLA Sponsor

You're two days out from an NDA submission deadline and your team is manually checking 300-file dossier packages against FDA technical specifications — searching for non-embedded fonts, verifying every hyperlink resolves correctly, confirming eCTD folder naming conventions match the backbone. When an error surfaces after the gateway submission, you lose weeks to a technical rejection and a resubmission clock that now affects your PDUFA date. You've lived this. DNXT eliminates it entirely by running every check automatically before the package is assembled.

  • Eliminates the 2-day manual QC process before every submission
  • Zero technical rejections from FDA or EMA gateway validation
  • Clear, actionable preflight report your team can hand to the agency without embarrassment
  • Confidence to submit on schedule, not when QC is "probably done"
⚙️
VP Regulatory Operations
CRO / Full-Service Regulatory Partner

You manage submissions across 12 concurrent programs for 8 sponsors — each with different agency targets, different eCTD versions, and different internal document standards. Your team's bottleneck isn't writing; it's QC. Senior regulatory scientists spend 30–40% of their time on mechanical checks that should never require a human. A single missed namespace declaration in an xml-stylesheet processing instruction caused a client's EU Type II variation to be rejected last quarter. That's the kind of error that costs you a client relationship. DNXT's 47-rule engine catches it automatically.

  • Scales QC across concurrent submissions without adding headcount
  • Frees senior staff from mechanical checks to focus on content strategy
  • Consistent validation quality regardless of which team member runs the check
  • Protects client relationships by guaranteeing technical compliance pre-delivery
🔬
Regulatory Technology Lead / CIO
Biotech / Emerging Pharma

Your regulatory team is lean — often one or two people managing submissions for a pipeline that suddenly has two INDs and an EMA Scientific Advice package running simultaneously. You don't have the staff bandwidth for multi-day manual QC cycles. More critically, you don't have the infrastructure to catch errors that only surface at the agency gateway — and the 3-month delay that follows puts your Series B narrative at risk. You need preflight validation that runs automatically in CI, surfaces results before anyone even opens the submission package, and fixes what it can without human intervention.

  • API-driven validation integrates directly into existing build and release workflows
  • Async background queue — preflight runs while the team continues working
  • Auto-fix engine handles font, metadata, and link issues without manual steps
  • Investor-defensible submission timeline with no technical rejection risk
Under The Hood

How It Works

From submission package upload to a clean, actionable validation report — here is exactly what happens at each stage of the preflight engine.

1
Package Ingestion & File Inventory

When a submission package is uploaded or assembled within DNXT, the preflight engine immediately indexes every file in the dossier — PDF, XML, STF leaf nodes, and structural files — building a complete manifest with file paths, MIME types, and byte signatures. This inventory stage runs in under 3 seconds for packages up to 5GB and forms the canonical reference map that every subsequent validation rule operates against. No file is invisible to the engine: orphaned documents, duplicate filenames with case variations, and zero-byte files are all surfaced during this phase.
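The inventory stage can be pictured as a directory walk that records path, size, MIME type, and a content hash for every file, flagging zero-byte files and case-variant duplicate names along the way. This is an illustrative sketch of the idea, not DNXT's actual implementation; all function and field names here are hypothetical.

```python
# Illustrative sketch of the file-inventory stage: walk the submission root,
# record path, size, MIME type, and content hash, and flag zero-byte files
# and case-variant duplicate names. Names are hypothetical, not DNXT's API.
import hashlib
import mimetypes
from collections import defaultdict
from pathlib import Path

def build_manifest(root: str) -> dict:
    root_path = Path(root)
    files, issues = [], []
    seen_casefold = defaultdict(list)  # lowercased relative path -> originals
    for p in sorted(root_path.rglob("*")):
        if not p.is_file():
            continue
        rel = p.relative_to(root_path).as_posix()
        size = p.stat().st_size
        files.append({
            "path": rel,
            "size": size,
            "mime": mimetypes.guess_type(rel)[0] or "application/octet-stream",
            "sha256": hashlib.sha256(p.read_bytes()).hexdigest(),
        })
        if size == 0:
            issues.append(("zero-byte", rel))
        seen_casefold[rel.casefold()].append(rel)
    for variants in seen_casefold.values():
        if len(variants) > 1:  # e.g. Report.pdf vs report.pdf
            issues.append(("case-duplicate", variants))
    return {"files": files, "issues": issues}
```

The case-folded grouping is what surfaces `Report.pdf` vs `report.pdf` collisions that a case-insensitive filesystem would otherwise hide.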

2
Parallel Rule Dispatch to Async Queue

The 47 validation rules are organized into four domain groups — PDF compliance, XML schema integrity, folder structure, and hyperlink resolution — and dispatched in parallel to DNXT's async processing queue. Each rule group runs as an independent worker, meaning PDF font analysis and XML schema validation happen simultaneously rather than sequentially. This parallel architecture is why the total validation time stays under 30 seconds even for packages containing hundreds of files. The queue is non-blocking: regulatory staff can continue working on other tasks, and a notification is pushed when results are ready.
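A minimal sketch of this fan-out, with a thread pool standing in for DNXT's async queue and trivial stand-in rule groups (the names and result shapes are illustrative):

```python
# Sketch of the parallel rule dispatch: each domain group runs as an
# independent worker and the findings are merged when all complete.
# A thread pool stands in for DNXT's async queue.
from concurrent.futures import ThreadPoolExecutor

def run_preflight(package, rule_groups):
    """Run each rule group concurrently and merge their results by name."""
    with ThreadPoolExecutor(max_workers=len(rule_groups)) as pool:
        futures = {name: pool.submit(fn, package)
                   for name, fn in rule_groups.items()}
        return {name: fut.result() for name, fut in futures.items()}

# Each stand-in worker returns (passed, findings) for its domain.
RULE_GROUPS = {
    "pdf": lambda pkg: (True, []),
    "xml": lambda pkg: (True, []),
    "folders": lambda pkg: (True, []),
    "links": lambda pkg: (False, ["3 broken hyperlinks"]),
}
```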

3
PDF Compliance Analysis

Each PDF is parsed at the object level — not just visually rendered — to extract font dictionaries, color space definitions, embedded file streams, and document metadata XMP packets. The engine checks for PDF/A-1b conformance per ISO 19005-1, verifies that all fonts are fully embedded (not subset-embedded in ways that can cause rendering failures on agency viewer systems), flags non-sRGB color profiles, and validates that document-level metadata fields required by agency guidelines are present and populated. Failures include the exact file path, page number, and the specific ISO clause violated — not just "PDF issue found."
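The font-embedding portion of this check can be modeled against the PDF font descriptor: an embedded font carries a /FontFile, /FontFile2, or /FontFile3 stream, and subset fonts are conventionally tagged with a six-uppercase-letter prefix on the font name. A simplified sketch operating on descriptor-like dicts follows; real object-level parsing would use a PDF library, and this models only one rule out of many.

```python
# Simplified model of the font-embedding check. Each font is represented by
# a dict mirroring the relevant PDF font-descriptor keys: an embedded font
# carries a /FontFile, /FontFile2, or /FontFile3 stream, and a subset font
# is conventionally named like 'ABCDEF+TimesNewRoman'.
FONT_FILE_KEYS = ("/FontFile", "/FontFile2", "/FontFile3")

def classify_font(descriptor: dict) -> str:
    """Return 'embedded', 'subset', or 'not-embedded' for one descriptor."""
    if not any(k in descriptor for k in FONT_FILE_KEYS):
        return "not-embedded"
    name = descriptor.get("/FontName", "")
    # Subset fonts carry a six-uppercase-letter tag followed by '+'.
    if len(name) > 7 and name[6] == "+" and name[:6].isupper():
        return "subset"
    return "embedded"
```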

4
XML Schema Validation & Namespace Resolution

Every XML file in the submission — backbone, STF manifests, study tagging files — is parsed against the applicable DTD or XSD schema for the declared eCTD version (3.2.2 or 4.0), with namespace resolution verified against the authoritative schema repositories for FDA, EMA, PMDA, and Health Canada. The engine identifies malformed namespace declarations, incorrect element ordering that is schema-valid but agency-invalid, missing required attributes, and encoding issues such as byte order marks that corrupt XML parsers on agency systems. Schema errors are reported with the XPath location of the violating element, making them directly actionable for the regulatory scientist without requiring an XML specialist.
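A reduced, standard-library sketch of the well-formedness and BOM portion of this stage is below. Full DTD/XSD validation against the eCTD schemas needs a schema-aware parser (lxml, for example) and is out of scope here; the sketch only shows how a parse failure becomes an actionable finding with a line and column location.

```python
# Reduced sketch of the well-formedness pass: flag a UTF-8 byte order mark
# and turn any parse error into a finding with its line/column location.
import xml.etree.ElementTree as ET

def check_well_formed(name: str, data: bytes) -> list:
    findings = []
    if data.startswith(b"\xef\xbb\xbf"):
        findings.append({"file": name, "rule": "utf8-bom",
                         "detail": "byte order mark present"})
    try:
        ET.fromstring(data)
    except ET.ParseError as e:
        line, col = e.position
        findings.append({"file": name, "rule": "well-formed",
                         "detail": f"parse error at line {line}, column {col}"})
    return findings
```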

5
Folder Structure & Naming Convention Audit

The dossier folder hierarchy is validated against ICH M8 eCTD specifications and agency-specific technical conformance requirements, including FDA's Technical Rejection Criteria and EMA's eCTD validation criteria document. This includes verifying that folder names match the permitted character set (no spaces, special characters, or uppercase letters where prohibited), that module numbering corresponds to declared lifecycle operations, and that file naming conventions for sequences and leaf documents are consistent with the backbone XML declarations. Discrepancies between what the XML backbone references and what actually exists on disk are flagged as broken references — a common source of technical rejections that manual QC frequently misses.
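A representative naming check from this stage can be sketched as a per-component pattern match. The exact permitted character set and length limits vary by agency guide; the rule below (lowercase letters, digits, hyphens, one extension dot) is an illustrative approximation of the eCTD conventions, not the full ICH M8 rule set.

```python
# Illustrative naming audit: every path component must match a simplified
# eCTD-style pattern (lowercase letters, digits, hyphens, optional single
# extension). Real agency rules add length limits and reserved names.
import re

NAME_RE = re.compile(r"^[a-z0-9][a-z0-9-]*(\.[a-z0-9]+)?$")

def audit_names(paths: list[str]) -> list[str]:
    """Return the relative paths that contain a non-conforming component."""
    bad = []
    for path in paths:
        if any(not NAME_RE.match(part) for part in path.split("/")):
            bad.append(path)
    return bad
```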

6
Hyperlink Traversal & Resolution

Every internal hyperlink within every PDF — cross-document references from the Summary of Clinical Evidence to individual clinical study reports, for example — is extracted and resolved against the file inventory built in Step 1. The engine follows relative URI paths as they would be resolved by a PDF viewer operating from the submission root, identifying links that resolve to missing files, links that reference the correct filename but with incorrect case (a common failure on case-sensitive agency systems), and links where the target file exists but the named destination anchor within it is absent. External URLs are flagged but not followed. Each broken link is reported with its source document, source page, link text, and the resolved target path — so your team knows exactly which cross-reference to fix.
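The resolution logic can be sketched as: resolve each relative link target against its source document's directory, then look the result up in the Step 1 manifest, distinguishing exact hits, case-only mismatches, and missing targets. Function and label names here are illustrative.

```python
# Sketch of link resolution: relative targets are resolved from the source
# document's directory (as a PDF viewer would, rooted at the submission),
# then classified against the file manifest.
import posixpath

def resolve_link(source_doc: str, target: str, manifest: set[str]) -> str:
    """Classify one internal link as 'ok', 'case-mismatch', or 'missing'."""
    resolved = posixpath.normpath(
        posixpath.join(posixpath.dirname(source_doc), target))
    if resolved in manifest:
        return "ok"
    lowered = {p.casefold() for p in manifest}
    if resolved.casefold() in lowered:
        return "case-mismatch"  # passes on Windows, fails on the gateway
    return "missing"
```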

7
Auto-Fix Execution & Final Report Generation

Issues classified as auto-fixable — broken hyperlinks where the correct target can be inferred from the file manifest, missing metadata fields that can be derived from the submission record, font embedding failures where source font data is available in the DNXT document store — are resolved by the auto-fix engine in a single pass without any manual intervention. After fixes are applied, a second lightweight validation pass confirms resolution. The final output is a structured report in both human-readable HTML and machine-readable JSON: total rules executed, pass/fail/warning counts, detailed findings per domain, auto-fix actions applied, and a signed submission-readiness attestation that can be included in the package cover letter or retained as audit evidence.
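The machine-readable half of the final report can be sketched as a simple aggregation of findings into counts plus the findings themselves. The field names below are illustrative, not DNXT's actual report schema.

```python
# Sketch of the JSON report: aggregate findings into pass/fail/warning
# counts and serialize. Field names are illustrative.
import json
from collections import Counter

def build_report(submission_id: str, findings: list, total_rules: int) -> str:
    counts = Counter(f["severity"] for f in findings)
    failed = counts["fail"]
    report = {
        "submission": submission_id,
        "rules_executed": total_rules,
        "passed": total_rules - failed,
        "failed": failed,
        "warnings": counts["warn"],
        "auto_fixed": [f for f in findings if f.get("auto_fixed")],
        "findings": findings,
    }
    return json.dumps(report, indent=2)
```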

Platform Capabilities

What the Preflight Engine Covers

Six interlocking validation domains that together achieve complete pre-submission coverage — so nothing reaches the agency gateway with an error that DNXT could have caught first.

📄

PDF Compliance Validation

DNXT validates every PDF against ISO 19005-1 (PDF/A-1b) conformance requirements — the standard explicitly required by FDA's Center for Drug Evaluation and Research for electronic submissions. Validation operates at the binary object level, not by rendering: font embedding completeness, color space declarations (sRGB enforcement, CMYK prohibition), document outline integrity, and XMP metadata packet validity are each evaluated independently. When a PDF fails, the report specifies the exact object ID and conformance clause violated — information your document management team can act on immediately rather than re-opening and manually inspecting the file.

🗂️

XML Schema Validation

Every XML file in your submission — eCTD backbone, STF manifests, study tagging files, and regional appendices — is validated against the authoritative DTD or XSD schema for the declared specification version. DNXT maintains current schema versions for FDA (eCTD 3.2.2, eCTD v4.0), EMA, PMDA, TGA, and Health Canada, and applies the correct schema automatically based on the declared namespace. Beyond schema conformance, the engine checks for encoding issues including UTF-8 BOM insertion, Windows-specific CRLF line endings in elements that require LF-only, and namespace prefix collisions that produce technically valid XML which nonetheless fails agency parsers — subtle errors that schema validators alone do not catch.

📁

Folder Structure & Naming Audit

eCTD submissions follow exacting folder structure requirements defined in ICH M8 and agency-specific Technical Conformance Guides — rules that are easy to violate during compilation when files come from multiple teams and systems. DNXT validates that the complete dossier hierarchy matches the expected module, section, and leaf structure for the target agency; that folder and file names conform to permitted character sets and length limits; and that every leaf file referenced in the backbone XML physically exists at the declared path with matching case. Cross-referencing the backbone against the actual filesystem catches the single most common source of FDA Technical Rejection Criteria failures — a reference in XML to a file that doesn't exist where it's declared to exist.

🔗

Hyperlink Validation & Traversal

In a well-structured eCTD submission, hyperlinks are navigation infrastructure — reviewers move from the integrated summary directly to the study reports and from study reports to individual data tables. A broken link forces a reviewer to hunt manually, which creates friction and reviewer dissatisfaction even when the data is present. DNXT extracts all internal PDF hyperlinks using the same URI resolution algorithm as Acrobat Reader, resolves each against the submission file manifest, and checks not only that the target file exists but also that the named destination anchor within it is defined. Case-sensitivity mismatches — "StudyReport.pdf" linked from a document that expects "studyreport.pdf" — are detected and flagged separately, since these pass on Windows but fail on the Linux-based agency gateway systems.

🏷️

Metadata Completeness Checks

Document metadata — author, title, subject, creation date, and application fields embedded in PDF XMP packets — is increasingly scrutinized by agencies as part of document authenticity review and is required to be populated correctly under FDA's eCTD guidance. DNXT checks all required metadata fields across every PDF in the submission, flags documents where required fields are empty, and identifies cases where fields contain default values (e.g., "Microsoft Word - Document1") that indicate metadata was never properly set. For XML documents, the engine validates that document-level metadata elements in the backbone correspond to declared sequences and lifecycle operations. Metadata findings are auto-fixable where the correct values are derivable from the DNXT submission record, requiring no manual file editing.
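The core of this check can be sketched as: every required field must be non-empty and must not contain a known tool-default value such as the "Microsoft Word - Document1" title mentioned above. The field and pattern lists below are illustrative stand-ins.

```python
# Sketch of the metadata completeness check: required fields must be
# populated and must not contain recognizable tool defaults. The field
# and pattern lists are illustrative.
REQUIRED_FIELDS = ("author", "title", "subject")
DEFAULT_PATTERNS = ("microsoft word -", "untitled", "document1")

def check_metadata(doc_path: str, meta: dict) -> list:
    findings = []
    for field in REQUIRED_FIELDS:
        value = (meta.get(field) or "").strip()
        if not value:
            findings.append({"file": doc_path, "field": field,
                             "issue": "missing"})
        elif any(p in value.lower() for p in DEFAULT_PATTERNS):
            findings.append({"file": doc_path, "field": field,
                             "issue": "default-value"})
    return findings
```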

Auto-Fix Engine

Detection without resolution is only half the problem solved. DNXT's auto-fix engine categorizes every validation finding by fixability: issues where the correct resolution is deterministic are resolved automatically without any manual step. Broken hyperlinks where the target file exists but the path casing is wrong are corrected by rewriting the link action in the PDF cross-reference table. Missing XMP metadata fields are populated from the DNXT submission record. PDFs failing color space requirements have their ICC profiles remapped. Font embedding gaps are resolved by re-embedding from the DNXT font library. Each auto-fix operation is logged with a before/after comparison, and a second validation pass confirms the fix resolved the finding. The result: regulatory staff review a clean package, not a snag list.
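The fix-log-revalidate loop described above can be sketched as a dispatch table: each finding type maps to a deterministic fix function, every applied fix is logged with a before/after pair, and the rule is re-run to confirm resolution. All names here are hypothetical.

```python
# Sketch of the auto-fix loop: deterministic fixers keyed by rule, a
# before/after log for each applied fix, and a revalidation pass that
# confirms the finding is actually resolved.
def auto_fix(findings, fixers, revalidate):
    log, unresolved = [], []
    for f in findings:
        fixer = fixers.get(f["rule"])
        if fixer is None:
            unresolved.append(f)  # no deterministic fix; needs a human
            continue
        before = f["value"]
        after = fixer(before)
        if revalidate(f["rule"], after):
            log.append({"rule": f["rule"], "before": before, "after": after})
        else:
            unresolved.append(f)
    return log, unresolved
```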