The single most important thing to understand about multi-omics

Individual omics data types — a genome sequence, a metabolite profile, a transcriptome — are useful in isolation. But the questions that matter most in drug discovery and crop science cannot be answered by a single data type. They require knowing how a biological system works across multiple levels simultaneously.

Why does this plant produce compound X in its leaves but not its roots? Which gene family is responsible for its unusual alkaloid chemistry? How does its biosynthetic pathway respond when the plant is under water stress? Those questions require, at minimum, a genome (what genes exist), a transcriptome (which genes are switched on and in what tissues), and a metabolome (what compounds are actually being produced).

Multi-omics is the practice of generating and integrating all of those data types from the same specimen — so that every inference you draw is anchored in the same biological individual.

The eight layers and what each one contributes

Reference Genome Assembly
Captures: the complete DNA sequence of the organism, assembled to chromosome level (EBP standard: Merqury QV ≥40, BUSCO ≥90%).
Enables: gene discovery; biosynthetic pathway mapping; evolutionary comparison; CRISPR target identification.

Transcriptomics
Captures: tissue-specific RNA-seq, showing which genes are actively expressed in leaf, root and flower, and under what conditions.
Enables: identifying which genes are producing the chemistry you observe; mapping biosynthetic gene clusters to metabolite output.

NIR Metabolite Fingerprints
Captures: near-infrared spectra collected in the field for every specimen, giving a rapid bulk chemical fingerprint of the tissue.
Enables: chemical triage at scale, flagging chemically interesting specimens before lab analysis; population-level chemical screening.

Targeted LC-MS/MS Metabolomics
Captures: high-resolution mass spectrometry to identify and quantify alkaloids, terpenoids and flavonoids.
Enables: structural identification of bioactive compounds; concentration profiles; novel scaffold discovery.

Phenotypic Traits
Captures: morphological and phenological data (leaf morphology, growth habit, reproductive timing) recorded to the Darwin Core standard.
Enables: linking genotype to observable phenotype; trait mapping and genomic selection in agritech.

Soil & Substrate Chemistry
Captures: XRF elemental analysis of collection-site soil (30+ elements); substrate type, pH and moisture.
Enables: contextualising metabolite and stress-response profiles; discovery of genes involved in edaphic adaptation.

Cryopreserved Germplasm
Captures: living biological material stored at −80°C and in liquid nitrogen at −196°C.
Enables: future wet-lab validation; gene expression studies; synthetic biology; a physical chain of custody for provenance.

AI-Ready Metadata & Provenance
Captures: FAIR-compliant metadata; blockchain-verified collection provenance; SHA-256 data integrity hashes; steganographic watermarks.
Enables: legal defensibility of commercial use; chain of custody for Nagoya Protocol compliance; data integrity certification.
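The SHA-256 integrity certification listed for the provenance layer can be sketched with Python's standard `hashlib` module. This is a minimal illustration, not the platform's implementation: it assumes each data layer is stored as a file, and the helper names `sha256_of_file` and `verify_layer` and the file name are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks so
    large files (e.g. raw reads or assembly FASTA) need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_layer(path: Path, expected_hex: str) -> bool:
    """Hypothetical check: True if the file still matches the digest
    recorded when the layer was registered."""
    return sha256_of_file(path) == expected_hex.lower()

# Usage: register a layer file, then re-verify it later.
with tempfile.TemporaryDirectory() as tmp:
    layer = Path(tmp) / "specimen_layer.bin"  # hypothetical file name
    layer.write_bytes(b"abc")
    recorded = sha256_of_file(layer)          # stored at registration
    intact = verify_layer(layer, recorded)    # True
    layer.write_bytes(b"abd")                 # simulate a one-byte change
    tampered = verify_layer(layer, recorded)  # False
```

Because any change to the file, even a single byte, yields a different digest, a stored digest lets anyone later confirm that a data layer is byte-for-byte the one that was registered.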

Why genome sequence alone is not enough

A genome assembly tells you what genes a plant has. It does not tell you which genes are producing the compounds you are interested in. It does not tell you whether those genes are expressed in the tissue type you would harvest commercially. It does not tell you whether expression is constitutive or stress-induced. And it does not confirm that the compound your biosynthetic pathway predicts is actually present in the living organism at detectable concentrations.

This gap between genomic prediction and metabolic reality is a well-documented problem in natural product drug discovery. Many biosynthetic gene clusters that look promising in genome analysis produce trace quantities of target metabolites, or produce them only under specific environmental conditions, or produce related but structurally different compounds from the ones predicted.

Transcriptomics bridges the gap between genome and metabolome by telling you which genes are actually switched on in which tissues. Targeted metabolomics confirms what compounds are present and at what concentrations. Soil chemistry helps explain why the same species growing in different locations can produce markedly different chemical profiles.
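The filtering logic described here, keeping only gene clusters that are both expressed and backed by a detected product, can be sketched as follows. Everything in it is hypothetical: the `ClusterEvidence` record, the TPM and concentration cut-offs, and the cluster and compound names are illustrative assumptions, not values from a real pipeline.

```python
from dataclasses import dataclass, field

@dataclass
class ClusterEvidence:
    cluster_id: str                # hypothetical biosynthetic gene cluster ID
    predicted_product: str         # compound the genome annotation predicts
    gene_tpm: dict[str, float] = field(default_factory=dict)  # TPM per cluster gene in one tissue

def supported_clusters(
    clusters: list[ClusterEvidence],
    detected: dict[str, float],    # LC-MS/MS concentrations, e.g. ug/g dry weight
    min_tpm: float = 5.0,          # assumed expression cut-off
    min_conc: float = 0.1,         # assumed quantification limit
) -> list[str]:
    """Keep clusters whose genes are all expressed AND whose predicted product is detected."""
    kept = []
    for c in clusters:
        expressed = bool(c.gene_tpm) and all(t >= min_tpm for t in c.gene_tpm.values())
        present = detected.get(c.predicted_product, 0.0) >= min_conc
        if expressed and present:
            kept.append(c.cluster_id)
    return kept

clusters = [
    ClusterEvidence("BGC_A", "alkaloid_1", {"g1": 40.0, "g2": 22.5}),  # expressed, product found
    ClusterEvidence("BGC_B", "alkaloid_2", {"g3": 0.2, "g4": 1.1}),    # silent in this tissue
    ClusterEvidence("BGC_C", "alkaloid_3", {"g5": 18.0}),              # expressed, product absent
]
leaf_lcms = {"alkaloid_1": 3.4, "alkaloid_2": 0.0}
print(supported_clusters(clusters, leaf_lcms))  # ['BGC_A']
```

Only BGC_A survives: BGC_B looks promising in the genome but is not switched on, and BGC_C is active but its predicted product is not detected, which is exactly the genome-to-metabolome gap described above.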

"The genome is the blueprint. The transcriptome is the assembly line operating at this moment. The metabolome is what actually came off the line."

Specimen-level integration: why it matters

Multi-omics data that is not integrated at specimen level — where genome data is from one individual, transcriptomics from another, and metabolomics from a third — produces correlations that may be artefactual. Natural populations of plants show significant individual variation. If you are drawing inferences about the relationship between a gene and a compound, those inferences are only as valid as the individual-level linkage in your data.

Specimen-level integration means that every data layer for a given individual is tagged with the same Globally Unique Identifier (GUID). When you query the relationship between a transcriptomic profile and a metabolite concentration, you are asking about one individual organism, not an average across a population. This is a prerequisite for generating mechanistic hypotheses rather than merely population-level correlations, which is precisely what AI-driven drug discovery platforms need.
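In data terms, GUID tagging turns integration into a join on a shared key. The sketch below assumes random UUID4 strings as GUIDs and in-memory dictionaries per layer; the real GUID scheme and storage are not specified here, and the helper names, gene and compound keys are hypothetical.

```python
import uuid
from typing import Any

def mint_guid() -> str:
    """Mint a random UUID4 string as a specimen GUID.
    Assumption: the platform's actual GUID scheme is not specified."""
    return str(uuid.uuid4())

def join_layers(layers: dict[str, dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Inner-join data layers on specimen GUID.

    `layers` maps a layer name to {guid: measurement}. Only GUIDs present
    in every layer survive, so each output row describes one individual."""
    if not layers:
        return {}
    shared = set.intersection(*(set(rows) for rows in layers.values()))
    return {guid: {name: rows[guid] for name, rows in layers.items()} for guid in shared}

# Usage with three hypothetical specimens; specimen c has no transcriptome yet.
a, b, c = mint_guid(), mint_guid(), mint_guid()
layers = {
    "transcriptome": {a: {"GENE_X_TPM": 120.0}, b: {"GENE_X_TPM": 3.5}},
    "metabolome": {a: {"compound_X_ug_g": 41.2}, b: {"compound_X_ug_g": 0.8},
                   c: {"compound_X_ug_g": 12.0}},
}
linked = join_layers(layers)  # contains a and b only
```

An inner join is the conservative choice: a specimen missing any layer is excluded rather than silently paired with data from another individual.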

The AI model training implication

Large language models and graph neural networks trained on biological data learn relationships between features. If your genome data and metabolome data come from different individuals, the model learns population-level averages — which reduces its capacity to learn the precise mechanistic relationships that drive drug discovery insight. Specimen-level GUID linkage is not a data management nicety; it is a structural requirement for generating training data that produces high-value model outputs.

What multi-omics looks like in practice: finding an anticancer scaffold

Suppose a pharmaceutical researcher wants to identify new alkaloid scaffolds with potential anticancer activity among Madagascar's Apocynaceae, the plant family that includes the rosy periwinkle (Catharanthus roseus), the source of vincristine.

With single-layer data, for example a genome-only approach, the workflow looks like this: sequence genomes, computationally predict which biosynthetic gene clusters might produce alkaloids, synthesise the predicted compounds, and test them against cancer cell lines. Hit rates are typically low, many predictions prove wrong, and synthetic chemistry is expensive.

With specimen-level multi-omics, the workflow is fundamentally different. NIR field fingerprints identify which specimens in a population have the most unusual chemical profiles. Targeted LC-MS/MS confirms the structural identity and concentration of specific alkaloids in those specimens. Transcriptomics on the same specimens maps which gene clusters are active and producing the target compounds in those individuals. The genome assembly provides the full biosynthetic pathway, with gene annotations ready for CRISPR editing or heterologous expression. Soil chemistry indicates whether the interesting chemistry is driven by unusual edaphic conditions, allowing targeted re-collection from similar environments.
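The NIR triage step, picking the specimens whose fingerprints deviate most from the rest of the population, might look like this in its simplest form. The scoring rule (mean squared per-band z-score) is an assumption chosen for clarity; practical NIR workflows usually apply spectral preprocessing and a multivariate method such as PCA first. The specimen IDs and three-band spectra are made up.

```python
from statistics import mean, pstdev

def novelty_scores(spectra: dict[str, list[float]]) -> dict[str, float]:
    """Score each specimen by its mean squared z-score across NIR bands.

    `spectra` maps specimen ID -> absorbance values, all on the same band grid."""
    ids = list(spectra)
    n_bands = len(spectra[ids[0]])
    mus = [mean(spectra[i][b] for i in ids) for b in range(n_bands)]
    # Guard against flat bands (zero spread) to avoid division by zero.
    sds = [pstdev([spectra[i][b] for i in ids]) or 1.0 for b in range(n_bands)]
    return {
        i: mean(((spectra[i][b] - mus[b]) / sds[b]) ** 2 for b in range(n_bands))
        for i in ids
    }

def triage(spectra: dict[str, list[float]], k: int) -> list[str]:
    """Return the k most chemically unusual specimens for LC-MS/MS follow-up."""
    scores = novelty_scores(spectra)
    return sorted(scores, key=scores.get, reverse=True)[:k]

field_spectra = {
    "S1": [1.0, 1.1, 0.9], "S2": [1.1, 1.0, 1.0], "S3": [0.9, 1.0, 1.1],
    "S4": [1.0, 0.9, 1.0], "S5": [1.0, 1.0, 0.9], "S6": [5.0, 5.2, 4.8],
}
print(triage(field_spectra, k=1))  # ['S6']
```

Only the top-ranked specimens go on to the slower, more expensive LC-MS/MS and transcriptomic analyses, which is what makes field NIR useful as a first-pass filter.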

The result is not just a compound. It is a biosynthetic pathway, verified in the same individuals, linked to a structurally confirmed candidate compound, with the gene targets already identified and the environmental conditions that drive production documented. That is a commercially actionable drug discovery package, not a hypothesis.

Multi-omics for agritech: stress tolerance and genomic selection

The same logic applies in crop science. A drought-tolerance gene identified in a Malagasy spiny desert specialist is only commercially useful if you understand not just which gene it is, but in which tissues it is expressed and under which stress conditions, which metabolic pathway it operates within, what the phenotypic outcome is, and whether it confers the same functional benefit when introgressed into a crop background.

Specimen-level multi-omics — where genome, stress-induced transcriptome, metabolite profiles, phenotypic measurements, and soil conditions are all recorded for the same individual — provides the data package that makes a gene functionally characterised, not merely identified. That distinction, from discovery to characterisation, is where most of the commercial value in agricultural genomics lies.
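Characterising where and when a candidate gene responds to stress reduces, at its simplest, to comparing expression in the same individual under control and stress conditions, tissue by tissue. The sketch below uses a log2 fold change with a pseudocount; the TPM values, the pseudocount of 1 and the log2 threshold of 1 (roughly 2-fold) are illustrative assumptions, and the helper names are hypothetical.

```python
import math

def log2_fold_change(stress_tpm: float, control_tpm: float, pseudocount: float = 1.0) -> float:
    """log2 ratio of stress vs control expression; the pseudocount avoids log(0)."""
    return math.log2((stress_tpm + pseudocount) / (control_tpm + pseudocount))

def induced_tissues(profile: dict[str, dict[str, float]], threshold: float = 1.0) -> list[str]:
    """Tissues where the pseudocount-adjusted log2 fold change under drought meets the threshold.

    `profile` maps tissue -> {"control": TPM, "drought": TPM} for ONE specimen GUID."""
    return [
        tissue
        for tissue, tpm in profile.items()
        if log2_fold_change(tpm["drought"], tpm["control"]) >= threshold
    ]

# Hypothetical drought-tolerance gene in one specimen.
specimen_profile = {
    "leaf": {"control": 3.0, "drought": 15.0},   # log2(16/4) = 2.0, induced
    "root": {"control": 10.0, "drought": 11.0},  # log2(12/11), about 0.13, not induced
}
print(induced_tissues(specimen_profile))  # ['leaf']
```

Because both conditions come from the same GUID-tagged individual, the fold change reflects that plant's response rather than differences between plants.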

70%: share of approved small-molecule drugs derived from or inspired by natural products (Newman & Cragg, 2020)
8: integrated data layers per specimen in the IsoGentiX platform
10,000+: endemic species targeted over 5 years
1: GUID linking every layer to the same individual specimen