Genomics & scientific data

Beat the standard genomics format — 2.48× smaller than BCF, byte-for-byte exact

On a population 1000 Genomes chr22 slice, AT-1 compresses VCF 209× vs raw and 2.46× smaller than native BCF (2.96× vs bgzipped VCF), with exact byte-for-byte round-trip — a hard requirement for regulated genomic data — and region queries via positional zone maps.

Who it's for: For sequencing centres, precision-medicine, and agri/pathogen-genomics teams holding VCF at scale, under regulation.

209×
VCF vs raw
2.46×
vs native BCF
2.96×
vs .vcf.gz
byte-exact + SHA-256

How it fits: Drop AT-1 in as the cold-storage tier for VCF archives and inter-institution transfer; the embedded SHA-256 integrity trailer attests the reconstruction equals the original.

Self-verified with bcftools 1.17 / htslib 1.17 on real 1000 Genomes chr22 data; see the genomics doc for the exact commands. Honest boundary:the headline 2.46× holds for sparse population/cohort genotypes; dense clinical FORMAT fields (GT:AD:DP:GQ:PL) narrow the win to ~1.30× vs BCF via a graceful lossless fallback — still smaller, with byte-exact attestation BCF doesn't carry.

Put a number on your data.

Enter your storage bill; see the saving and a one-page proposal in 30 seconds.