The single most important thing to understand about multi-omics
Individual omics data types — a genome sequence, a metabolite profile, a transcriptome — are useful in isolation. But the questions that matter most in drug discovery and crop science cannot be answered by a single data type. They require knowing how a biological system works across multiple levels simultaneously.
Why does this plant produce compound X in its leaves but not its roots? Which gene family is responsible for its unusual alkaloid chemistry? How does its biosynthetic pathway respond when the plant is under water stress? Those questions require, at minimum, a genome (what genes exist), a transcriptome (which genes are switched on and in what tissues), and a metabolome (what compounds are actually being produced).
Multi-omics is the practice of generating and integrating all of those data types from the same specimen — so that every inference you draw is anchored in the same biological individual.
The eight layers and what each one contributes
| Data layer | What it measures | What it enables |
|---|---|---|
| Reference Genome Assembly | The complete DNA sequence of the organism, assembled to chromosome level (EBP standard: Merqury QV ≥40, BUSCO ≥90%) | Gene discovery; biosynthetic pathway mapping; evolutionary comparison; CRISPR target identification |
| Transcriptomics | Tissue-specific RNA-seq — which genes are actively expressed in leaf, root, flower, and under what conditions | Identifies which genes are producing the chemistry you observe; maps biosynthetic gene clusters to metabolite output |
| NIR Metabolite Fingerprints | Near-infrared spectroscopy collected in the field for every specimen, giving a rapid fingerprint of overall metabolite composition | Chemical triage at scale: identifies chemically interesting specimens before lab analysis; enables population-level metabolomics |
| Targeted LC-MS/MS Metabolomics | High-resolution mass spectrometry for alkaloid, terpenoid, and flavonoid compound identification and quantification | Structural identification of bioactive compounds; concentration profiles; novel scaffold discovery |
| Phenotypic Traits | Morphological and phenological data in Darwin Core schema: leaf morphology, growth habit, reproductive timing | Links genotype to observable phenotype; enables trait mapping and genomic selection in agritech |
| Soil & Substrate Chemistry | XRF elemental analysis of collection site soil (30+ elements); substrate type, pH, moisture | Contextualises metabolite and stress-response profiles; enables edaphic adaptation gene discovery |
| Cryopreserved Germplasm | Living biological material stored at −80°C and in liquid nitrogen at −196°C | Enables future wet-lab validation; gene expression studies; synthetic biology; provides physical chain-of-custody for provenance |
| AI-Ready Metadata & Provenance | FAIR-compliant metadata; blockchain-verified collection provenance; SHA-256 data integrity; steganographic watermarks | Legal defensibility of commercial use; chain-of-custody for Nagoya compliance; data integrity certification |
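
As a rough illustration of the assembly thresholds quoted in the table (Merqury QV ≥40, BUSCO ≥90%), the sketch below converts a QV score into an expected per-base error rate and checks an assembly against both cut-offs. QV is Phred-scaled, so QV 40 corresponds to roughly one error per 10,000 bases. The function names and example values are hypothetical and are not taken from any pipeline described here.

```python
# Thresholds as quoted in the table above; all names here are illustrative.
MIN_QV = 40.0
MIN_BUSCO_PCT = 90.0


def qv_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled QV into an expected per-base error probability."""
    return 10 ** (-qv / 10)


def meets_assembly_thresholds(qv: float, busco_pct: float) -> bool:
    """Return True only if both quality cut-offs are satisfied."""
    return qv >= MIN_QV and busco_pct >= MIN_BUSCO_PCT


# Hypothetical assemblies: (Merqury QV, BUSCO completeness in percent).
good = meets_assembly_thresholds(45.2, 96.1)
bad = meets_assembly_thresholds(38.0, 96.1)
```

An assembly that passes BUSCO but falls short on QV is rejected here, because both conditions are required.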
Why genome sequence alone is not enough
A genome assembly tells you what genes a plant has. It does not tell you which genes are producing the compounds you are interested in. It does not tell you whether those genes are expressed in the tissue type you would harvest commercially. It does not tell you whether expression is constitutive or stress-induced. And it does not confirm that the compound your biosynthetic pathway predicts is actually present in the living organism at detectable concentrations.
This gap between genomic prediction and metabolic reality is a well-documented problem in natural product drug discovery. Many biosynthetic gene clusters that look promising in genome analysis produce only trace quantities of the target metabolites, produce them only under specific environmental conditions, or produce compounds that are related to, but structurally different from, the ones predicted.
Transcriptomics bridges the gap between genome and metabolome by telling you which genes are actually switched on in which tissues. Targeted metabolomics confirms what compounds are present and at what concentrations. Soil chemistry explains why the same species growing in different locations may produce completely different chemical profiles.
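One way to picture the argument above is as an evidence checklist per predicted gene cluster: the genome says the cluster exists, the transcriptome says it is switched on in the relevant tissue, and the metabolome says the compound is actually there. The following is a minimal sketch under that reading. The `ClusterEvidence` schema, its field names, and the example cluster are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClusterEvidence:
    """Evidence for one predicted biosynthetic gene cluster (hypothetical schema)."""
    cluster_id: str
    annotated_in_genome: bool          # genome: the cluster exists
    expressed_in_harvest_tissue: bool  # transcriptome: it is switched on where it matters
    compound_detected: bool            # metabolome: the predicted compound was measured


def evidence_gaps(ev: ClusterEvidence) -> List[str]:
    """Return the data layers that do not yet support the prediction."""
    checks = [
        ("genome", ev.annotated_in_genome),
        ("transcriptome", ev.expressed_in_harvest_tissue),
        ("metabolome", ev.compound_detected),
    ]
    return [layer for layer, ok in checks if not ok]


# Hypothetical example: a cluster that looks promising in the genome only.
candidate = ClusterEvidence("cluster-A", True, False, False)
gaps = evidence_gaps(candidate)
```

A cluster with an empty gap list is supported by all three layers. Anything else is still a genomic prediction awaiting confirmation.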
Specimen-level integration: why it matters
Multi-omics data that is not integrated at specimen level, for example where the genome comes from one individual, the transcriptome from another, and the metabolome from a third, produces correlations that may be artefactual. Natural populations of plants show significant individual variation. If you are drawing inferences about the relationship between a gene and a compound, those inferences are only as valid as the individual-level linkage in your data.
Specimen-level integration means that every data layer for a given individual is tagged with the same Globally Unique Identifier (GUID). When you query the relationship between a transcriptomic profile and a metabolite concentration, you are asking about the same individual organism, not an average across a population. This is the structural requirement for generating causal rather than merely correlational hypotheses — which is precisely what AI-driven drug discovery platforms need.
Large language models and graph neural networks trained on biological data learn relationships between features. If your genome data and metabolome data come from different individuals, the model learns population-level averages — which reduces its capacity to learn the precise mechanistic relationships that drive drug discovery insight. Specimen-level GUID linkage is not a data management nicety; it is a structural requirement for generating training data that produces high-value model outputs.
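The GUID linkage described above can be sketched as a simple join: each data layer is a table keyed by specimen GUID, and only specimens present in every layer yield a fully linked record. This is a minimal illustration using plain dictionaries. The layer names, field names, and GUID values are hypothetical, and a real system would more likely use a database or dataframe join.

```python
from typing import Dict, List, Tuple

# Each layer maps a specimen GUID to that layer's measurements (hypothetical fields).
Layer = Dict[str, dict]


def link_by_guid(layers: Dict[str, Layer]) -> Tuple[Dict[str, dict], List[str]]:
    """Merge layers per specimen; return (complete records, incomplete GUIDs)."""
    all_guids = set().union(*(layer.keys() for layer in layers.values()))
    complete: Dict[str, dict] = {}
    incomplete: List[str] = []
    for guid in sorted(all_guids):
        if all(guid in layer for layer in layers.values()):
            # Every layer has data for this individual, so the record is linked.
            complete[guid] = {name: layer[guid] for name, layer in layers.items()}
        else:
            incomplete.append(guid)
    return complete, incomplete


# Hypothetical data: spec-002 has no metabolome measurement.
layers = {
    "genome": {"spec-001": {"assembly": "asm-001"}, "spec-002": {"assembly": "asm-002"}},
    "transcriptome": {"spec-001": {"leaf_tpm": 84.0}, "spec-002": {"leaf_tpm": 3.1}},
    "metabolome": {"spec-001": {"alkaloid_ug_per_g": 120.0}},
}
complete, incomplete = link_by_guid(layers)
```

Reporting the incomplete GUIDs separately, rather than silently filling gaps with data from other individuals, keeps every linked record anchored in a single organism.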
What multi-omics looks like in practice: finding an anticancer scaffold
Suppose a pharmaceutical researcher wants to identify new alkaloid scaffolds with potential anticancer activity from Madagascar's Apocynaceae, the same plant family as the rosy periwinkle that yielded vincristine.
With single-layer data (genome only, or metabolomics only), the pathway looks like this: sequence genomes, computationally predict which biosynthetic gene clusters might produce alkaloids, synthesise predicted compounds, test against cancer cell lines. Hit rates are low; most predictions are wrong; synthetic chemistry is expensive.
With specimen-level multi-omics, the pathway is fundamentally different. NIR field fingerprints identify which specimens in a population have the most unusual alkaloid chemical profiles. Targeted LC-MS/MS confirms the structural identity and concentration of specific compounds in those specimens. Transcriptomics on the same specimens maps which gene clusters are active and producing the target compounds in those specific individuals. The genome assembly provides the full biosynthetic pathway and enables CRISPR-ready annotations for heterologous expression. Soil chemistry explains whether the interesting chemistry is driven by unusual edaphic conditions — allowing targeted re-collection from similar environments.
The result is not a compound — it is a verified biosynthetic pathway linked to a confirmed bioactive compound, with the gene targets already identified and the environmental conditions that drive production documented. That is a commercially actionable drug discovery package, not a hypothesis.
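The first triage step in the workflow above, picking out the specimens with the most unusual NIR fingerprints, might be sketched as follows. This deliberately uses a very simple stand-in (Euclidean distance from the population mean spectrum) rather than any method the text specifies. Real chemometric workflows typically add preprocessing and multivariate models. The spectra, GUIDs, and function name are invented for illustration.

```python
import math
from typing import Dict, List, Tuple


def rank_by_unusualness(spectra: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Rank specimens by Euclidean distance from the population mean spectrum."""
    n_points = len(next(iter(spectra.values())))
    # Mean absorbance at each wavelength position across all specimens.
    mean = [
        sum(spectrum[i] for spectrum in spectra.values()) / len(spectra)
        for i in range(n_points)
    ]
    scored = [(guid, math.dist(spectrum, mean)) for guid, spectrum in spectra.items()]
    # Most unusual (largest distance) first.
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical three-point "spectra" keyed by specimen GUID.
spectra = {
    "spec-001": [0.10, 0.20, 0.30],
    "spec-002": [0.11, 0.21, 0.29],
    "spec-003": [0.40, 0.05, 0.60],
    "spec-004": [0.09, 0.19, 0.31],
}
ranking = rank_by_unusualness(spectra)
top_candidate = ranking[0][0]
```

The top-ranked specimens would then be the ones sent forward for targeted LC-MS/MS and transcriptomics, so the expensive assays are spent on the individuals most likely to be chemically interesting.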
Multi-omics for agritech: stress tolerance and genomic selection
The same logic applies in crop science. A drought-tolerance gene identified in a Malagasy spiny desert specialist is only commercially useful if you know not just which gene it is, but also in which tissues and under which stress conditions it is expressed, what metabolic pathway it operates within, what the phenotypic outcome is, and whether it retains its function when introgressed into a crop background.
Specimen-level multi-omics — where genome, stress-induced transcriptome, metabolite profiles, phenotypic measurements, and soil conditions are all recorded for the same individual — provides the data package that makes a gene functionally characterised, not merely identified. That distinction, from discovery to characterisation, is where most of the commercial value in agricultural genomics lies.