
How It Works Under the Hood

The Science Behind Your Atlagene Report

We don't make claims we can't source. Every variant we report on is backed by peer-reviewed research, classified against established databases, and reviewed by geneticists before reaching your dashboard.

The Pipeline

Upload to Insight in 6 Steps

  1. Parse

    Auto-detect file format (23andMe, Ancestry, MyHeritage, VCF, WGS) and parse genotype calls. We verify positions against GRCh37/GRCh38 references.

  2. Normalize

    Variants are LIST-partitioned by chromosome and indexed by rsID. WGS files get reference-allele backfill so coverage scores are accurate (homozygous-ref positions count as analyzed, not missing).

  3. Annotate

    Each variant is cross-referenced against ClinVar (2.65M annotations), GWAS Catalog (312K associations), PharmGKB (2.8K drug-gene pairs), gnomAD (population frequencies), and REVEL (77.5M pathogenicity scores).

  4. Classify

    Risk scoring uses evidence-weighted polygenic models per category. Variants of uncertain significance (VUS) are scored by an XGBoost classifier (current AUC 0.80; retraining with CADD and SpliceAI features is expected to reach ~0.84). Pharmacogenomic calls follow CPIC guidelines.

  5. Review

    The variant registry is curated by geneticists, and every entry has a lifecycle status (proposed → reviewed → approved). Auto-discovered variants from GWAS/ClinVar updates go to the review queue, not directly to users.

  6. Deliver

    Results appear on your dashboard with disclaimers. Physician-flag variants (high-penetrance pathogenic) trigger an optional review. Helix AI explains findings without diagnosing.
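
To illustrate why the reference-allele backfill in step 2 matters: a variant file derived from WGS typically lists only the positions that differ from the reference, so a site that is absent from the file may be homozygous-reference rather than uncalled. The sketch below is a minimal illustration under stated assumptions, not the production implementation. The names `PanelSite`, `backfill_and_coverage`, and `callable_sites` are hypothetical, the rsIDs are placeholders, and it assumes we already know which sites were sequenced well enough to be called.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PanelSite:
    rsid: str
    ref: str  # reference allele at this position


def backfill_and_coverage(panel, calls, callable_sites):
    """Return (genotypes, coverage) for a panel of sites.

    panel          -- list of PanelSite objects we want to analyze
    calls          -- dict rsid -> genotype string for sites present in the file
    callable_sites -- set of rsids sequenced with adequate depth (assumption:
                      this information is available alongside the calls)
    """
    genotypes = {}
    for site in panel:
        if site.rsid in calls:
            genotypes[site.rsid] = calls[site.rsid]
        elif site.rsid in callable_sites:
            # Absent from the variant list but callable: homozygous reference.
            genotypes[site.rsid] = site.ref * 2
        # Otherwise the site is genuinely missing and is left out.
    coverage = len(genotypes) / len(panel) if panel else 0.0
    return genotypes, coverage


# Placeholder rsIDs, for illustration only.
panel = [PanelSite("rs1", "A"), PanelSite("rs2", "C"),
         PanelSite("rs3", "G"), PanelSite("rs4", "T")]
calls = {"rs1": "AG"}                    # the only non-reference call in the file
callable_sites = {"rs1", "rs2", "rs3"}   # rs4 was not sequenced adequately

genotypes, coverage = backfill_and_coverage(panel, calls, callable_sites)
print(genotypes)
print(coverage)
```

In this toy example three of four panel sites count as analyzed. Without the backfill, only the single non-reference call would count, and coverage would be understated.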

Evidence Sources

Where Our Calls Come From

Every variant we report on cites at least one of these. No proprietary "secret sauce."

ClinVar

2.65M annotations

NCBI's public archive of variant-condition relationships, with clinical significance ratings (pathogenic, likely pathogenic, uncertain, likely benign, benign).

GWAS Catalog

312K associations

EBI's curated database of trait-associated SNPs from genome-wide association studies, weighted by effect size and study sample size.
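
As an illustration of how effect sizes enter a polygenic score, here is a minimal sketch of the standard additive model: each risk-allele count (0, 1, or 2) is multiplied by the per-allele effect size (a log odds ratio) and the products are summed. The SNP identifiers and odds ratios are made up, and the sketch leaves out the sample-size weighting mentioned above. It should not be read as Atlagene's scoring model.

```python
import math


def polygenic_score(effects, genotypes):
    """Additive polygenic score: sum of effect size x risk-allele dosage.

    effects   -- dict snp_id -> (risk_allele, log_odds_ratio)
    genotypes -- dict snp_id -> two-letter genotype string, e.g. "AG"
    SNPs missing from `genotypes` are skipped rather than imputed.
    Returns (score, number_of_snps_used).
    """
    score = 0.0
    used = 0
    for snp_id, (risk_allele, beta) in effects.items():
        genotype = genotypes.get(snp_id)
        if genotype is None:
            continue
        dosage = genotype.count(risk_allele)  # 0, 1, or 2 copies
        score += beta * dosage
        used += 1
    return score, used


# Hypothetical SNPs and odds ratios, for illustration only.
effects = {
    "snp_a": ("A", math.log(1.20)),
    "snp_b": ("T", math.log(1.10)),
    "snp_c": ("G", math.log(0.90)),
}
genotypes = {"snp_a": "AA", "snp_b": "CT"}  # snp_c was not genotyped

score, used = polygenic_score(effects, genotypes)
print(f"score={score:.4f} from {used} of {len(effects)} SNPs")
```

The raw score only becomes meaningful when compared against a reference distribution, which is one reason polygenic results are reported as probabilities rather than certainties.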

PharmGKB

2.8K drug-gene pairs

Pharmacogenomics knowledge base. CPIC level A/B guidelines drive our 200+ drug interactions.

REVEL

77.5M scores

Ensemble missense pathogenicity predictor. Used for variant effect scoring; CADD and SpliceAI integration is queued for VUS classifier retraining.

gnomAD

Population frequencies

Allele frequency data across major populations — used to calibrate risk scores and reduce false-positive findings on common variants.
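
One way such a frequency check might work: a variant that is common in the general population is unlikely to cause a rare, high-penetrance condition, so a pathogenic label on a common variant deserves suspicion. The sketch below is an illustration under stated assumptions. The 5% cutoff mirrors the ACMG/AMP stand-alone benign criterion (BA1), but the threshold actually used here is not stated, and the record fields and population labels are hypothetical.

```python
COMMON_AF_THRESHOLD = 0.05  # assumed cutoff, borrowed from ACMG/AMP BA1


def flag_for_report(variant, threshold=COMMON_AF_THRESHOLD):
    """Decide whether a pathogenic-labelled variant should be surfaced.

    variant -- dict with 'significance' and 'population_af', a mapping of
               population label -> allele frequency (hypothetical schema).
    The highest frequency across populations is used, so a variant that is
    common in any one population is held back.
    """
    if variant["significance"] not in {"pathogenic", "likely pathogenic"}:
        return False
    max_af = max(variant["population_af"].values(), default=0.0)
    return max_af < threshold


# Made-up records, for illustration only.
rare = {"significance": "pathogenic",
        "population_af": {"pop_a": 0.0001, "pop_b": 0.0003}}
common = {"significance": "pathogenic",
          "population_af": {"pop_a": 0.12, "pop_b": 0.01}}

print(flag_for_report(rare))
print(flag_for_report(common))
```

Taking the maximum across populations is a conservative choice: a variant that is rare overall but common in one ancestry group is still treated as common.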

Continuous updates

Weekly + bi-weekly

ClinVar weekly, GWAS Catalog weekly, PharmGKB bi-weekly. Reclassifications trigger user-facing alerts when applicable.
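
Detecting a reclassification amounts to comparing the clinical significance recorded for each variant in two successive snapshots. A minimal sketch follows, assuming each snapshot is a simple mapping from a variant identifier to its significance label. The identifiers and the snapshot format are hypothetical.

```python
def find_reclassifications(previous, current):
    """Return sorted (variant_id, old_label, new_label) tuples for changed labels.

    Variants that appear in only one of the two snapshots are ignored:
    they are additions or withdrawals, not reclassifications.
    """
    changes = []
    for variant_id, old_label in previous.items():
        new_label = current.get(variant_id)
        if new_label is not None and new_label != old_label:
            changes.append((variant_id, old_label, new_label))
    return sorted(changes)


# Made-up snapshots, for illustration only.
last_week = {
    "var_1": "uncertain significance",
    "var_2": "likely benign",
    "var_3": "pathogenic",
}
this_week = {
    "var_1": "likely pathogenic",   # changed
    "var_2": "likely benign",       # unchanged
    "var_4": "benign",              # new entry, not a reclassification
}

for variant_id, old, new in find_reclassifications(last_week, this_week):
    print(f"{variant_id}: {old} -> {new}")
```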

Variant Discovery

Living Registry, Not a Static List

Our variant registry is database-driven. When ClinVar releases a weekly update, our automated discovery engine scans for newly significant variants, scores the evidence on a 5-factor rubric (clinical significance, study count, population coverage, effect size, gene-disease association), and drafts a phenotype description using Claude.
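
A rubric like this can be thought of as a weighted sum of normalized factor scores. The sketch below uses the five factor names from the paragraph above, but the equal weights and the 0-to-1 scale are assumptions for illustration, not the production values.

```python
FACTORS = (
    "clinical_significance",
    "study_count",
    "population_coverage",
    "effect_size",
    "gene_disease_association",
)

# Assumed: equal weights that sum to 1.
WEIGHTS = {factor: 1 / len(FACTORS) for factor in FACTORS}


def evidence_score(factor_scores, weights=WEIGHTS):
    """Combine per-factor scores (each in [0, 1]) into one weighted score."""
    missing = [f for f in FACTORS if f not in factor_scores]
    if missing:
        raise ValueError(f"missing factors: {missing}")
    for factor in FACTORS:
        if not 0.0 <= factor_scores[factor] <= 1.0:
            raise ValueError(f"{factor} must be between 0 and 1")
    return sum(weights[f] * factor_scores[f] for f in FACTORS)


# Made-up factor scores for a single candidate variant.
candidate = {
    "clinical_significance": 1.0,
    "study_count": 0.6,
    "population_coverage": 0.4,
    "effect_size": 0.5,
    "gene_disease_association": 1.0,
}
print(round(evidence_score(candidate), 2))
```

Rejecting incomplete input, rather than treating a missing factor as zero, keeps a data gap from being mistaken for weak evidence.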

Drafted variants land in variant_suggestions for geneticist review. Nothing reaches a user's dashboard until a credentialed reviewer approves it. Reclassifications trigger before/after audit logs and (where applicable) user-facing alerts.
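
The gating described here can be modelled as a small state machine in which only approved entries are visible to users. The sketch below follows the proposed → reviewed → approved lifecycle from step 5. The class, field, and reviewer names are hypothetical, and a real registry would presumably also handle rejection and persistence.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Each status maps to the only status it may advance to.
NEXT_STATUS = {"proposed": "reviewed", "reviewed": "approved"}


@dataclass
class RegistryEntry:
    variant_id: str
    status: str = "proposed"
    audit_log: list = field(default_factory=list)

    def advance(self, reviewer):
        """Move to the next lifecycle status and record a before/after entry."""
        if self.status not in NEXT_STATUS:
            raise ValueError(f"{self.variant_id} is already {self.status}")
        before, after = self.status, NEXT_STATUS[self.status]
        self.status = after
        self.audit_log.append({
            "before": before,
            "after": after,
            "reviewer": reviewer,
            "at": datetime.now(timezone.utc).isoformat(),
        })


def visible_to_users(entries):
    """Only approved entries may reach a dashboard."""
    return [e.variant_id for e in entries if e.status == "approved"]


# Placeholder identifiers, for illustration only.
drafted = RegistryEntry("var_1")
approved = RegistryEntry("var_2")
approved.advance("reviewer_a")   # proposed -> reviewed
approved.advance("reviewer_b")   # reviewed -> approved

print(visible_to_users([drafted, approved]))
```

Because no transition skips a status, a drafted variant cannot become visible without passing through review first.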

See the public registry at /variants.

What Atlagene Does NOT Do

Honesty about scope is a clinical safety issue.

  • We do not diagnose. Period. Findings get disclaimers; physician review is the paid product.
  • We do not prescribe medications. Pharmacogenomics output is informational; your prescriber decides.
  • We do not treat your genome as deterministic. Polygenic risk is probabilistic; lifestyle and environment matter.
  • We do not sell or share your genetic data. Ever. (See Privacy Policy.)
  • We are not an ancestry research service like 23andMe; we focus on health analysis. Ancestry composition is an included extra, not the headline product.

See It Yourself

Browse the public variant registry to see exactly what we measure and the evidence behind each call.