AlphaGenome: Reading the Genome's Regulatory Code

Contents

Overview — What Is AlphaGenome?
How It Works
Key Functions & Capabilities
Who Uses It?
How to Access & Use
Recent Advances

Overview

What Is AlphaGenome?

AlphaGenome is a large-scale deep learning model from Google DeepMind that predicts how segments of DNA regulate gene expression — and how genetic variants may disrupt that regulation.

When the human genome was first sequenced, around 98% of it — everything that doesn't encode proteins — seemed inert. It isn't. This "non-coding" DNA contains enhancers, silencers, and other regulatory elements that control where, when, and how much each gene is expressed. Deciphering this regulatory code is one of biology's central unsolved problems, and one with major implications for understanding disease.

AlphaGenome is designed to crack that code. Given up to 1 million base pairs of raw DNA sequence, the model outputs thousands of functional genomic tracks — predictions of gene expression levels, chromatin accessibility, histone modifications, transcription factor binding patterns, splice site usage, and more — at single-nucleotide resolution.

What AlphaGenome is not: a clinical diagnostic tool, a FASTQ/BAM file processing pipeline, or a personal genome interpretation service. It is a research-grade sequence-to-function prediction model for characterizing regulatory variation, and it is not validated for clinical use.

1 Mb: DNA input context window

11: biological modalities predicted

25/26: variant effect benchmarks matched or exceeded

Architecture

How It Works

Prior regulatory genomics models faced a fundamental trade-off: short-context models (like SpliceAI or BPNet) achieved base-pair resolution but missed distal regulatory elements beyond ~10 kb. Long-context models (like Enformer or Borzoi) captured broader sequence context — up to 500 kb — but at reduced output resolution (32–128 bp bins), blurring fine-scale features like splice sites and transcription factor footprints.
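To make the resolution side of this trade-off concrete, here is a minimal sketch of what binning does to a sharp signal. The track values and the `bin_track` helper are hypothetical and stand in for any per-base prediction; nothing here is taken from the models named above.

```python
def bin_track(track, bin_size):
    """Average a per-base signal into fixed-width bins."""
    binned = []
    for start in range(0, len(track), bin_size):
        window = track[start:start + bin_size]
        binned.append(sum(window) / len(window))
    return binned

# Hypothetical 256 bp track with a sharp, single-base signal at position 100
# (think of a splice site or a transcription factor footprint).
track = [0.0] * 256
track[100] = 1.0

# At 128 bp resolution the peak is diluted across its whole bin,
# and its exact position inside the bin can no longer be recovered.
binned = bin_track(track, 128)
print(binned)  # [0.0078125, 0.0]
```

A peak of height 1.0 at one base becomes a value of 1/128 spread over the first bin, which is why coarse-bin outputs struggle with features that are only a few bases wide.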

AlphaGenome resolves both constraints in a single architecture.

Model architecture: U-Net-style encoder-transformer-decoder

Input: 1 Mb DNA sequence. Human or mouse genome; split into 8 parallel chunks for TPU processing.

Encoder: convolutional downsampling. 7 stages detect motifs and compress the sequence into multi-scale representations.

Transformer: inter-device attention. Captures long-range interactions across the full 1 Mb context via cross-TPU communication.

Decoder: upsampling to output heads. Task-specific heads generate predictions at 1 bp, 128 bp, and 2,048 bp resolution.

The model produces two types of learned representations: one-dimensional embeddings at 1 bp and 128 bp resolutions, and two-dimensional pairwise embeddings at 2,048 bp resolution, which represent interactions between pairs of genomic segments and underpin the chromatin contact map predictions. These feed into 11 sets of task-specific output heads.
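The numbers quoted for the architecture fit together in a simple way, which the following back-of-the-envelope sketch works through. It rests on two assumptions that the text does not state outright: that "1 Mb" means 2^20 base pairs, and that each of the 7 encoder stages halves the sequence length.

```python
CONTEXT_BP = 2 ** 20     # assumption: "1 Mb" = 1,048,576 bp
NUM_CHUNKS = 8           # parallel chunks, per the architecture summary
ENCODER_STAGES = 7       # assumption: each stage downsamples by a factor of 2
PAIRWISE_BIN_BP = 2048   # resolution of the pairwise embeddings

# Base pairs handled by each chunk before attention mixes information.
chunk_bp = CONTEXT_BP // NUM_CHUNKS

# Seven halvings give a 128 bp bin, matching the coarse 1D resolution.
coarse_bin_bp = 2 ** ENCODER_STAGES
positions_128 = CONTEXT_BP // coarse_bin_bp

# The pairwise representation is a square grid over 2,048 bp bins.
positions_2048 = CONTEXT_BP // PAIRWISE_BIN_BP
pairwise_cells = positions_2048 ** 2

print(chunk_bp, coarse_bin_bp, positions_128, positions_2048, pairwise_cells)
```

Under these assumptions the transformer attends over 8,192 positions rather than a million, and the pairwise grid is 512 by 512, which gives a sense of why the coarser resolutions are where long-range modeling happens.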

Training used a two-stage process. First, an ensemble of "teacher" models was trained using cross-validation over genome folds. Then a single "student" model was distilled from the ensemble, which improved robustness and variant effect prediction accuracy while keeping scoring to a single device call per variant.
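The general idea of distilling an ensemble into one model can be shown with a deliberately tiny example. This is a toy sketch, not AlphaGenome's training recipe: the one-parameter "teachers", the mean-squared-error loss, and the choice to regress onto the ensemble mean are all assumptions made for illustration.

```python
import random

def make_teacher(weight):
    # Hypothetical teacher: a one-parameter linear predictor.
    return lambda x: weight * x

# Three teachers that disagree slightly, as ensemble members would.
teachers = [make_teacher(w) for w in (1.8, 2.0, 2.3)]

def ensemble_mean(x):
    return sum(t(x) for t in teachers) / len(teachers)

# Distillation data: inputs labeled by the ensemble, not by ground truth.
rng = random.Random(0)
inputs = [rng.uniform(-1.0, 1.0) for _ in range(200)]
targets = [ensemble_mean(x) for x in inputs]

# Student: a single weight fitted by gradient descent on squared error.
student_w = 0.0
learning_rate = 0.5
for _ in range(200):
    grad = sum(2 * (student_w * x - y) * x
               for x, y in zip(inputs, targets)) / len(inputs)
    student_w -= learning_rate * grad

print(round(student_w, 4))
```

The student ends up reproducing the averaged behavior of the teachers, so one forward pass stands in for three. That is the practical payoff the text describes: ensemble-like predictions at the cost of a single model call.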

AlphaGenome was trained on human and mouse genomes. The final model predicts 5,930 human or 1,128 mouse genome tracks across diverse cell types and tissues.

Capabilities

Key Functions & Capabilities

AlphaGenome's core function is predicting how a DNA sequence drives regulatory biology. Its 11 predicted modalities span the major layers of gene regulation:

Gene expression (RNA-seq, CAGE, PRO-cap)

Chromatin accessibility (DNase, ATAC-seq)

Histone modifications

Transcription factor binding

Chromatin contact maps (3D genome)

Splice site usage

Splice junction coordinates & strength

Transcription initiation

Polyadenylation signals

TF footprints

Variant effect scoring (all modalities)

Beyond genome track prediction, AlphaGenome is specifically designed for variant effect prediction (VEP): comparing model outputs for a reference sequence versus a mutated sequence to predict the functional consequence of a genetic variant. A single variant can be scored across all modalities in under a second.
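The reference-versus-alternate comparison can be sketched in a few lines. Everything below is illustrative: `toy_predict` is a hypothetical stand-in for the model, the "TATA" motif is made up for the example, and the log2 fold change with a pseudocount is one common way to summarize a difference, not necessarily the scorer AlphaGenome uses.

```python
import math

def toy_predict(seq):
    # Hypothetical stand-in for a sequence-to-track model:
    # signal is 1.0 wherever the made-up motif "TATA" starts, else 0.0.
    return [1.0 if seq[i:i + 4] == "TATA" else 0.0 for i in range(len(seq))]

def variant_score(ref_seq, alt_seq, predict=toy_predict, pseudocount=1.0):
    """Log2 fold change of total predicted signal, alternate vs reference."""
    ref_total = sum(predict(ref_seq))
    alt_total = sum(predict(alt_seq))
    return math.log2((alt_total + pseudocount) / (ref_total + pseudocount))

ref = "GGCTATAAGG"   # contains the motif once
alt = "GGCTAGAAGG"   # T>G substitution at index 5 destroys it

score = variant_score(ref, alt)
print(score)  # -1.0
```

A negative score means the variant is predicted to reduce the signal. With a real model the same comparison would be repeated per track and per modality, which is what "scored across all modalities" amounts to.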

The model includes a genome interpretation suite with contribution scores from in silico mutagenesis (ISM) experiments, allowing researchers to identify which nucleotides in a sequence are most critical for a predicted regulatory output.
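In silico mutagenesis is simple to express in code: mutate each position to every other base, re-score, and record how much the output moves. The sketch below assumes a hypothetical scalar scorer (`toy_score`, which counts a made-up "GATA" motif) in place of a real model output, and averages the three substitutions per position, which is one of several reasonable aggregation choices.

```python
def toy_score(seq):
    # Hypothetical scalar output: number of "GATA" motif occurrences.
    return sum(seq[i:i + 4] == "GATA" for i in range(len(seq) - 3))

def ism_contributions(seq, score_fn=toy_score):
    """Per-position importance: mean drop in score over the 3 substitutions."""
    baseline = score_fn(seq)
    contributions = []
    for i, ref_base in enumerate(seq):
        deltas = []
        for alt_base in "ACGT":
            if alt_base == ref_base:
                continue
            mutant = seq[:i] + alt_base + seq[i + 1:]
            deltas.append(score_fn(mutant) - baseline)
        # Positive value: mutating this base lowers the score on average.
        contributions.append(-sum(deltas) / len(deltas))
    return contributions

seq = "CCGATACC"
scores = ism_contributions(seq)
print(scores)  # [0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0]
```

The four motif bases light up and the flanks do not. Note the cost: a sequence of length L needs 3L extra model calls, which is why ISM is usually run over a short window around a variant rather than a full input.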

In a key validation experiment, AlphaGenome successfully recapitulated the known disease mechanism in T-cell acute lymphoblastic leukemia (T-ALL): a non-coding mutation activates the TAL1 oncogene by creating a MYB transcription factor binding motif. The model predicted this mechanism without being told the gene or disease context — from DNA sequence alone.

Known limitations: Although the input window is 1 Mb, accuracy declines for regulatory interactions spanning more than ~100 kb. The model does not capture disease traits governed by higher-order biological factors (protein interactions, signaling networks, environment). It is trained on human and mouse genomes only; performance on other species is not characterized.

Audience

Who Uses AlphaGenome?

Since its June 2025 launch, nearly 3,000 researchers from 160 countries have used AlphaGenome, generating approximately 1 million API calls per day by January 2026. Its primary audience is academic and translational researchers working on regulatory genomics problems.

Cancer Genomics

Identifying non-coding mutations in cancer genomes that drive tumor proliferation by activating or silencing regulatory elements — without requiring protein-coding impact.

Rare Disease & Genetics

Predicting variant effects on RNA splicing for diseases like spinal muscular atrophy (SMA) and cystic fibrosis, where splicing mutations in non-coding regions are pathogenic.

Functional Genomics

Mapping and characterizing the regulatory architecture of the genome — finding enhancers, promoters, silencers — across cell types and developmental contexts.

Drug Target Discovery

Understanding how non-coding variants associated with disease (from GWAS) affect regulatory elements and gene expression, prioritizing mechanistic hypotheses for therapeutic intervention.

Neurodegenerative Disease

Investigating regulatory variation contributing to Alzheimer's, Parkinson's, and related disorders, where non-coding loci constitute a large fraction of GWAS signals.

Computational Biology

Fine-tuning and adapting the model for domain-specific tasks — AlphaGenome's learned sequence representations serve as a powerful general-purpose genomic foundation.

Access

How to Access & Use AlphaGenome

AlphaGenome is available for non-commercial research use at no cost. As of January 2026, source code and model weights have been made publicly available.

1. Review terms of use. Access is free for non-commercial research. Review DeepMind's terms at deepmind.google.com/science/alphagenome before proceeding.
2. API access (server-side inference). Use the Python SDK to query DeepMind's hosted API. The SDK handles sequence formatting, request batching, and result parsing. Suitable for most research workflows without local GPU requirements.
3. Source code & weights (local deployment). Since January 2026, model weights and source code have been available at github.com/google-deepmind/alphagenome_research. This enables fine-tuning on domain-specific datasets and offline inference. A TPU or high-memory GPU is recommended for the full 1 Mb context.
4. Genome interpretation suite. An accompanying analysis toolkit provides streamlined variant scoring with quantile calibration, ISM-based contribution score computation, and genome track visualization. Useful for interpreting model outputs in a biological context.
5. Hugging Face model hub. The google/alphagenome-all-folds model is also hosted on Hugging Face, providing an alternative access path compatible with the transformers ecosystem.
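The quantile calibration mentioned for the interpretation suite can be pictured as ranking a raw variant score against a background distribution of scores. The sketch below is an assumption about the general technique, not a description of the toolkit's implementation; the background values and the `quantile_score` helper are hypothetical.

```python
from bisect import bisect_right

def quantile_score(raw_score, background):
    """Fraction of background scores that are <= raw_score."""
    ranked = sorted(background)
    return bisect_right(ranked, raw_score) / len(ranked)

# Hypothetical raw scores for a set of background variants on one track.
background = [-0.4, -0.1, 0.0, 0.0, 0.1, 0.2, 0.3, 0.9]

q = quantile_score(0.25, background)
print(q)  # 0.75
```

Calibrating this way puts scores from different tracks on a common 0 to 1 scale, so a value of 0.75 reads the same whether the underlying track measures expression or accessibility.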

Input format: AlphaGenome takes raw DNA sequence (A/C/G/T nucleotides) up to 1 million base pairs, along with a species identifier (human or mouse). It does not process FASTQ, BAM, VCF, or other sequencing file formats directly — variant scoring requires extracting the reference and alternate sequences for the genomic region of interest and passing them as raw strings.
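Preparing the reference and alternate strings is mostly bookkeeping, as the following sketch shows. The helper names, the 0-based coordinate convention, and the strict A/C/G/T check are assumptions for illustration and are not part of the AlphaGenome SDK.

```python
MAX_CONTEXT_BP = 1_000_000  # "up to 1 million base pairs", per the text

def validate_sequence(seq):
    """Reject sequences that are too long or contain non-ACGT characters."""
    if len(seq) > MAX_CONTEXT_BP:
        raise ValueError(f"sequence is {len(seq)} bp; limit is {MAX_CONTEXT_BP}")
    unexpected = set(seq) - set("ACGT")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return seq

def apply_variant(window, window_start, position, ref, alt):
    """Return (ref_seq, alt_seq) for a variant that lies inside `window`.

    `window_start` and `position` are 0-based genomic coordinates
    (an assumption of this sketch).
    """
    offset = position - window_start
    if offset < 0 or offset + len(ref) > len(window):
        raise ValueError("variant falls outside the sequence window")
    if window[offset:offset + len(ref)] != ref:
        raise ValueError("reference allele does not match the window")
    alt_seq = window[:offset] + alt + window[offset + len(ref):]
    return validate_sequence(window), validate_sequence(alt_seq)

# Hypothetical 10 bp window starting at genomic coordinate 1000.
window = "ACGTACGTAC"
ref_seq, alt_seq = apply_variant(window, window_start=1000,
                                 position=1004, ref="A", alt="G")
print(ref_seq, alt_seq)  # ACGTACGTAC ACGTGCGTAC
```

Checking the reference allele against the window before substituting catches the most common mistake in this step, which is an off-by-one between 0-based and 1-based coordinates.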

Timeline

Recent Advances

AlphaGenome moved from preprint to open-source availability in roughly seven months, from June 2025 to January 2026.

Jun 2025: Public preview and preprint. DeepMind released AlphaGenome in preview for non-commercial research alongside a bioRxiv preprint (DOI: 10.1101/2025.06.25.661532). Free API access launched for researchers globally.
Jun 2025–Jan 2026: Rapid adoption. Within seven months of launch, nearly 3,000 researchers from 160 countries began using the model, with API call volume reaching approximately 1 million requests per day.
Jan 2026: Nature publication. The peer-reviewed paper, "Advancing regulatory variant effect prediction with AlphaGenome", was published in Nature (DOI: 10.1038/s41586-025-10014-0), reporting matched or exceeded performance on 25 of 26 variant effect prediction benchmarks.
Jan 2026: Open-source release. DeepMind released model source code, weights, and variant scoring implementations publicly at github.com/google-deepmind/alphagenome_research, expanding access beyond the hosted API.
Roadmap: Future directions. The DeepMind team has indicated planned work on expanded tissue-specific prediction capacity, training on additional species beyond human and mouse, and community fine-tuning for domain-specific tasks.
