Eight layers deep. Every specimen traceable. Nothing approximated.
GUID-linked, cryptographically auditable, EBP-standard multi-omics — built for pharmaceutical AI, agritech pipeline integration, and LLM foundation model training.
Eight layers of data. One specimen. Unbroken provenance.
Every data product originates from a physical, GPS-logged, FPIC-consented specimen. Each is assigned a permanent GUID at collection — linking every downstream layer to the exact plant, location, and legal authorisation.
Specimen Provenance & GPS Metadata
GPS-logged (±3m), FPIC-consented, permit-governed physical herbarium voucher. GUID assigned at point of collection: the legal and scientific anchor for every other layer.
GPS ±3m · FPIC + PIC documentation · Permit reference · GUID at collection · Nagoya-compliant
Reference Genome (WGS)
Chromosome-level whole-genome assembly from long-read sequencing. Intraspecies variation panels from multiple individuals per species where population permits.
Merqury QV ≥40 · BUSCO ≥90% · Chromosome-level · Long-read · EBP standard
Transcriptome (RNA-Seq)
Tissue- and condition-specific expression profiling. Stress-response and drought-tolerance profiles with biosynthetic gene cluster identification.
RNA-Seq · Tissue-specific · Stress-response profiles · BGC annotation · DESeq2
Metabolome (LC-MS/MS Targeted)
Targeted alkaloid, terpenoid, and flavonoid panels. Full feature maps per specimen with compounds annotated against public spectral databases.
LC-MS/MS targeted · Alkaloids · Terpenoids · Flavonoids · Spectral DB annotation · Unknown-feature lists
Proteome (LC-MS/MS)
Protein expression profiling linked to transcriptomic data. AlphaFold-compatible structure predictions for uncharacterised enzymes endemic to Malagasy species.
LC-MS/MS proteomics · AlphaFold format · Binding domain analysis · Novel enzyme characterisation
Epigenome (ATAC-Seq / BS-Seq)
Chromatin accessibility and methylation profiling. Stress-condition epigenetic modifications that show how species adapted to Madagascar’s extreme edaphic conditions.
ATAC-Seq · Whole-genome BS-Seq · Regulatory elements · Methylation profiles · Stress-condition data
Microbiome (16S / ITS)
Rhizosphere and endophytic profiling from matched soil and plant tissue. Ecological context unavailable from any laboratory culture collection.
16S rRNA bacterial · ITS fungal · Rhizosphere characterisation · Endophyte profiling
Ecological & Ethnobotanical Context
Validated ethnobotanical use records, IUCN status, and edaphic soil chemistry from collection site. Transforms raw omics into interpretable biological intelligence.
IUCN status · Ethnobotanical records · Soil chemistry · pH & mineral analysis · Distribution modelling
Reference-grade genomes. No shortcuts.
The IsoGentiX genome programme operates to Earth BioGenome Project (EBP) standards - the most rigorous benchmarks in the field. Every genome assembly is independently quality-assessed before entering the platform. This is not aspirational: it is a contractual requirement written into every data access agreement.
The EBP quality thresholds mark the minimum standard at which a genome assembly can be reliably interpreted by AI/ML models and used for drug target identification. Below these thresholds, gaps and errors in the assembly create false signals, which is precisely the noise pharmaceutical AI pipelines are built to eliminate.
When a data partner licenses an IsoGentiX genome, they receive a quality certificate alongside the data. Every assembly that does not pass is resequenced - not released.
Merqury QV ≥40 (≤1 error per 10,000 bp)
BUSCO ≥90% (conserved gene presence)
Chromosome-level assembly target
Long-read sequencing platform
Intraspecies variation: Where population size and permit conditions allow, multiple individuals per species are sequenced to capture edaphic and population-level genetic variation - providing the statistical depth that single-specimen databases cannot offer.
EBP-standard. Merqury QV ≥ 40. BUSCO ≥ 90%. Contractual quality guarantee on every genome assembly delivered.
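As a rough illustration of how the two thresholds translate into a release gate, the sketch below converts a Phred-scaled QV into an expected per-base error rate and applies the QV ≥ 40 and BUSCO ≥ 90% cut-offs. The class, function, and field names are hypothetical and the scores are invented for the example; this is not the platform's actual QA code.

```python
from dataclasses import dataclass

QV_MIN = 40.0      # Merqury QV threshold quoted in the text
BUSCO_MIN = 90.0   # BUSCO completeness threshold (%) quoted in the text


def qv_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled consensus quality value to a per-base error rate."""
    return 10 ** (-qv / 10)


@dataclass(frozen=True)
class AssemblyQC:
    specimen_id: str            # hypothetical field names
    merqury_qv: float
    busco_complete_pct: float


def qc_decision(qc: AssemblyQC) -> tuple[str, list[str]]:
    """Return ("pass" | "hold", reasons); an assembly must clear both thresholds."""
    reasons = []
    if qc.merqury_qv < QV_MIN:
        reasons.append(f"QV {qc.merqury_qv:.1f} below {QV_MIN:.0f}")
    if qc.busco_complete_pct < BUSCO_MIN:
        reasons.append(f"BUSCO {qc.busco_complete_pct:.1f}% below {BUSCO_MIN:.0f}%")
    return ("hold" if reasons else "pass", reasons)


if __name__ == "__main__":
    # Illustrative scores only, not real measurements.
    example = AssemblyQC("IGX-2026-00847", merqury_qv=38.5, busco_complete_pct=93.0)
    print(qv_to_error_rate(QV_MIN))   # error rate implied by QV 40
    print(qc_decision(example))       # held because QV is below threshold
```

A QV of 40 corresponds to an error rate of 10^-4, which is where the "1 error per 10,000 bp" figure comes from.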
One permanent identifier. Every layer. Unbreakable chain.
Every specimen is assigned a Globally Unique Identifier (GUID) at the moment of physical collection. This GUID travels through every analytical layer - from the herbarium voucher to the final metabolite profile - creating an unbroken, machine-queryable chain of custody that is simultaneously a legal compliance record and a scientific data integrity mechanism.
Specimen IGX-2026-00847 - GUID linkage example
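One way to picture the linkage is as a set of per-layer records that all carry the same specimen identifier, so that a single lookup returns the whole chain in layer order. The sketch below is a minimal in-memory model under that assumption; the layer keys, class names, and `payload_ref` values are hypothetical placeholders, not the platform's schema.

```python
from collections import defaultdict
from dataclasses import dataclass

# The eight layers described above, in order (key names are assumptions).
LAYERS = (
    "provenance", "genome", "transcriptome", "metabolome",
    "proteome", "epigenome", "microbiome", "ecology",
)


@dataclass(frozen=True)
class LayerRecord:
    specimen_id: str   # the identifier assigned at collection
    layer: str         # one of LAYERS
    payload_ref: str   # hypothetical pointer to the stored data file or table


class SpecimenIndex:
    """Groups layer records by specimen identifier."""

    def __init__(self) -> None:
        self._by_specimen = defaultdict(dict)

    def add(self, record: LayerRecord) -> None:
        if record.layer not in LAYERS:
            raise ValueError(f"unknown layer: {record.layer}")
        self._by_specimen[record.specimen_id][record.layer] = record

    def trace(self, specimen_id: str) -> list[LayerRecord]:
        """Return every record linked to the specimen, in layer order."""
        found = self._by_specimen.get(specimen_id, {})
        return [found[name] for name in LAYERS if name in found]

    def missing_layers(self, specimen_id: str) -> list[str]:
        """List layers that have no record yet for this specimen."""
        found = self._by_specimen.get(specimen_id, {})
        return [name for name in LAYERS if name not in found]


if __name__ == "__main__":
    index = SpecimenIndex()
    index.add(LayerRecord("IGX-2026-00847", "metabolome", "placeholder/metabolome-ref"))
    index.add(LayerRecord("IGX-2026-00847", "provenance", "placeholder/voucher-ref"))
    for rec in index.trace("IGX-2026-00847"):
        print(rec.layer, rec.payload_ref)
    print(index.missing_layers("IGX-2026-00847"))
```

Keeping the identifier on every record, rather than only on the voucher, is what lets a query start from any layer and still reach the specimen.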
For Pharmaceutical Partners
Query by compound family or target class - then trace every hit back to its specimen, its genome, its collection location, and its benefit-sharing terms. No ambiguity. No data hygiene backlog.
For Agritech Partners
Access trait-linked genomic regions with substrate and microbiome context attached. Filter by edaphic condition: laterite, karst, ultramafic. Every result includes the ecological provenance needed to contextualise gene expression in crop-improvement models.
For AI / LLM Partners
Clean, GUID-structured multi-modal biological data in machine-readable formats. Every training sample carries legal provenance metadata - addressing the Nagoya compliance gap that currently prevents most AI platforms from using biological datasets at scale.
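The partner queries described above share one pattern: filter on a scientific attribute, then read provenance off the same result. The sketch below shows that pattern on an in-memory list, assuming each hit carries its specimen identifier, substrate, permit reference, and benefit-sharing reference. All field names and values are hypothetical placeholders, and the real interface is a permissioned API whose shape is not specified here.

```python
from dataclasses import dataclass
from typing import Optional

# Edaphic conditions named in the text as filter options.
SUBSTRATES = {"laterite", "karst", "ultramafic"}


@dataclass(frozen=True)
class Hit:
    specimen_id: str
    species: str
    substrate: str
    compound_family: str
    permit_ref: str            # hypothetical provenance fields
    benefit_sharing_ref: str


def query(hits: list[Hit], *, substrate: Optional[str] = None,
          compound_family: Optional[str] = None) -> list[Hit]:
    """Filter hits; each result keeps its provenance fields attached."""
    if substrate is not None and substrate not in SUBSTRATES:
        raise ValueError(f"unknown substrate: {substrate}")
    return [
        h for h in hits
        if (substrate is None or h.substrate == substrate)
        and (compound_family is None or h.compound_family == compound_family)
    ]


if __name__ == "__main__":
    # Placeholder records for illustration only.
    catalogue = [
        Hit("SPEC-A", "Species A", "ultramafic", "alkaloid", "PERMIT-A", "ABS-A"),
        Hit("SPEC-B", "Species A", "laterite", "alkaloid", "PERMIT-B", "ABS-B"),
        Hit("SPEC-C", "Species B", "ultramafic", "terpenoid", "PERMIT-C", "ABS-C"),
    ]
    for hit in query(catalogue, substrate="ultramafic", compound_family="alkaloid"):
        print(hit.specimen_id, hit.permit_ref, hit.benefit_sharing_ref)
```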
An immutable audit trail from field to API endpoint.
Every data transaction - collection, processing, quality assurance, licensed access - is recorded as an immutable event on a distributed ledger. This is not a marketing claim. It is the technical architecture required to demonstrate Nagoya Protocol compliance to regulators, to satisfy pharmaceutical due-diligence requirements, and to provide AI training data that can withstand legal scrutiny of its provenance.
Field Collection
GUID minted at point of collection. GPS, permit reference, and FPIC documentation hashed and recorded.
Laboratory Processing
Each analytical step (DNA extraction, sequencing run, LC-MS acquisition) logged with instrument ID, operator, and timestamp.
Quality Assurance
QV and BUSCO scores recorded. Assemblies below threshold flagged and held. Pass/fail decision and rationale recorded.
Benefit Sharing
Monetary and non-monetary benefit-sharing disbursements linked to licence events, recorded against the audit trail and reportable to Madagascar's Ministry of Environment and Sustainable Development (MEDD).
Data Access
Every licensed query logged with partner ID and scope. Permissioned API returns provenance metadata with every data response.
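The five stages above can be pictured as entries in an append-only log in which each entry commits to the hash of the one before it, so that altering any earlier event invalidates everything after it. The sketch below is a single-process stand-in for that idea using SHA-256 from the Python standard library. It is not the distributed ledger itself; the class name, stage labels, and event fields are assumptions made for illustration.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, payload: dict) -> str:
    """Hash the previous entry's hash together with a canonical JSON payload."""
    body = json.dumps({"prev": prev_hash, "payload": payload},
                      sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


class AuditLog:
    """Append-only, hash-chained event log (in-memory sketch)."""

    def __init__(self) -> None:
        self.entries = []

    def append(self, stage: str, specimen_id: str, details: dict) -> dict:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        payload = {"stage": stage, "specimen_id": specimen_id, "details": details}
        entry = {"prev": prev, "payload": payload, "hash": _digest(prev, payload)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash; any edited or reordered entry breaks the chain."""
        prev = GENESIS
        for entry in self.entries:
            if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["payload"]):
                return False
            prev = entry["hash"]
        return True


if __name__ == "__main__":
    log = AuditLog()
    # Field values are placeholders, not real permit or instrument data.
    log.append("field_collection", "IGX-2026-00847", {"permit_ref": "PLACEHOLDER"})
    log.append("laboratory_processing", "IGX-2026-00847", {"step": "dna_extraction"})
    log.append("quality_assurance", "IGX-2026-00847", {"decision": "pass"})
    print(log.verify())  # chain is intact
    log.entries[0]["payload"]["details"]["permit_ref"] = "EDITED"
    print(log.verify())  # tampering with the first event is detected
```

A production ledger would add signatures, replication across independent parties, and access control; the hash chain only demonstrates why a recorded event cannot be quietly rewritten.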
The same species on different soils is a different dataset.
Madagascar's extraordinary geological diversity - laterite plateau, tsingy limestone, ultramafic substrates, quartzitic spiny desert - means that the same plant species can exhibit dramatically different gene expression profiles, metabolite production, and stress-response mechanisms depending on where it grows.
Most public genomic databases treat a species as a single entity. IsoGentiX captures intraspecies variation at the edaphic level: where population size and permit conditions allow, we sequence multiple individuals per species from different substrate types. This produces variation panels that are directly relevant to crop resilience modelling and pharmaceutical lead diversification.
The microbiome layer (Layer 7) amplifies this: rhizosphere community composition from ultramafic substrates is profoundly different from that on laterite. These microbiome-substrate-plant interactions are entirely absent from any existing public or commercial dataset.
Unlocking nature's intelligence.
Founding Partners receive first-mover domain exclusivity, direct input into the species prioritisation schedule, and data access that begins before public database availability.