
VCFX Tools Overview

VCFX is a collection of C/C++ tools for processing and analyzing VCF (Variant Call Format) files, with optional WebAssembly compatibility. Each tool is an independent command-line executable that can parse input from stdin and write to stdout, enabling flexible piping and integration into bioinformatics pipelines.

The suite also includes a convenience wrapper, vcfx, so you can run any tool as vcfx <subcommand>. For example, vcfx variant_counter is equivalent to running VCFX_variant_counter. Use vcfx --list (or its alias, vcfx list) to see the available subcommands, and vcfx help <tool> to view a tool's Markdown documentation. The individual VCFX_* binaries remain available if you prefer to call them directly. Every tool supports the common flags --help and --version, which print usage and version information respectively.
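The naming rule between the wrapper and the standalone binaries can be sketched as a small shell helper. The function names vcfx_binary_name and run_vcfx are hypothetical and not part of VCFX; the helper only assumes that the wrapper or the binaries may be on PATH.

```shell
# Map a wrapper subcommand to the name of its standalone binary,
# e.g. variant_counter -> VCFX_variant_counter.
vcfx_binary_name() {
  printf 'VCFX_%s\n' "$1"
}

# Hypothetical helper: prefer the vcfx wrapper, fall back to the standalone
# binary, and return 127 if neither is found on PATH.
run_vcfx() {
  subcommand=$1
  shift
  if command -v vcfx >/dev/null 2>&1; then
    vcfx "$subcommand" "$@"
  elif command -v "$(vcfx_binary_name "$subcommand")" >/dev/null 2>&1; then
    "$(vcfx_binary_name "$subcommand")" "$@"
  else
    echo "VCFX tool not found: $subcommand" >&2
    return 127
  fi
}

# Example calls (these need VCFX installed):
#   run_vcfx variant_counter < input.vcf
#   run_vcfx variant_counter --help
```

A helper like this lets a script work whether only the wrapper or only the individual binaries are installed.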

Tool Categories

Data Analysis

These tools help extract statistical information and insights from variant data:

Data Filtering

Tools for selecting variants based on specific criteria:

Data Transformation

Tools for converting or reformatting VCF data:

Quality Control

Tools for validating and checking data quality:

File Management

Tools for handling VCF files:

Annotation and Reporting

Tools for annotating and extracting information from VCF files:

Data Processing

Tools for processing variants and samples:

Common Usage Patterns

VCFX tools are designed to be combined in pipelines. Here are some common usage patterns:
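The patterns below all read a file named input.vcf. For experimenting, a tiny two-sample file can be written with printf, as sketched here. The records are made-up illustration data, and the sample names SAMPLE1 and SAMPLE2 are chosen only to match the concordance example further down.

```shell
# Write a minimal, tab-separated VCF with three phased records to input.vcf.
# All values are hypothetical test data.
{
  printf '##fileformat=VCFv4.2\n'
  printf '##contig=<ID=chr1>\n'
  printf '##INFO=<ID=AF,Number=A,Type=Float,Description="Allele frequency">\n'
  printf '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE1\tSAMPLE2\n'
  printf 'chr1\t10100\trs1\tA\tG\t50\tPASS\tAF=0.25\tGT\t0|1\t0|0\n'
  printf 'chr1\t10500\trs2\tC\tT\t15\tPASS\tAF=0.5\tGT\t0|1\t0|1\n'
  printf 'chr1\t15000\trs3\tAT\tA\t60\tPASS\tAF=0.75\tGT\t1|1\t0|1\n'
} > input.vcf

# Count the data records (lines that do not start with '#').
grep -vc '^#' input.vcf
```

The file mixes a SNP with low QUAL, a SNP with high QUAL, and a deletion, so quality and classification filters each have something to remove.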

Basic Filtering and Analysis

# Extract phased variants, filter by quality, and calculate allele frequencies
cat input.vcf | \
  VCFX_phase_checker | \
  VCFX_phred_filter --phred-filter 30 | \
  VCFX_allele_freq_calc > result.tsv

Variant Classification and Filtering

# Classify variants and keep only high-quality SNPs.
# The grep pattern also matches header lines (^#) so the output remains a valid VCF.
cat input.vcf | \
  VCFX_variant_classifier --append-info | \
  grep -E '^#|VCF_CLASS=SNP' | \
  VCFX_phred_filter --phred-filter 30 > high_quality_snps.vcf
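A text filter placed in the middle of a VCF pipeline has to pass header lines (those starting with #) through, or downstream tools receive a headerless file. As a sketch using only standard tools, the hypothetical helper keep_snps below keeps headers and matches VCF_CLASS=SNP only inside the INFO column (field 8). It is demonstrated on a hand-written snippet, not on real VCFX output.

```shell
# Hypothetical helper: keep header lines plus records whose INFO field
# contains the exact key=value pair VCF_CLASS=SNP.
keep_snps() {
  awk -F'\t' '/^#/ || $8 ~ /(^|;)VCF_CLASS=SNP(;|$)/'
}

# Demonstration input: two header lines, one SNP record, one indel record.
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\nchr1\t100\t.\tA\tG\t50\tPASS\tVCF_CLASS=SNP\nchr1\t200\t.\tAT\tA\t50\tPASS\tVCF_CLASS=INDEL\n' | keep_snps
```

Restricting the match to the INFO column avoids accidental hits in other fields, such as an ID that happens to contain the same text.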

Sample Comparison

# Check concordance between two samples in a single VCF
cat input.vcf | VCFX_concordance_checker --samples "SAMPLE1 SAMPLE2" > concordance.tsv

Linkage Disequilibrium Analysis

# Calculate LD in a specific region after filtering for common variants
cat input.vcf | \
  VCFX_af_subsetter --af-filter '0.05-1.0' | \
  VCFX_ld_calculator --region chr1:10000-20000 > ld_matrix.txt

Normalization and Splitting

# Normalize indels and split multiallelic variants
cat input.vcf | \
  VCFX_indel_normalizer | \
  VCFX_multiallelic_splitter > normalized_biallelic.vcf

Population Analysis

# Extract population-specific VCFs and calculate allele frequencies
cat input.vcf | VCFX_population_filter --population EUR --pop-map pop_map.txt > eur.vcf
cat eur.vcf | VCFX_allele_freq_calc > eur_afs.tsv
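The population example needs a pop_map.txt file. The sketch below assumes a tab-separated layout with one sample name and one population tag per line; that layout is an assumption here and should be checked against the VCFX_population_filter documentation. The sample names and the AFR tag are placeholders.

```shell
# Assumed format: one "sample<TAB>population" pair per line (hypothetical data).
printf 'SAMPLE1\tEUR\nSAMPLE2\tAFR\n' > pop_map.txt

# Sanity check: list the samples assigned to EUR before running the filter.
awk -F'\t' '$2 == "EUR" { print $1 }' pop_map.txt
```

Listing the matching samples first makes it easier to spot a misspelled population tag, which could otherwise yield an output with no sample columns.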

Quality Control Pipeline

# Validate, classify, detect missing data, and filter by quality
cat input.vcf | \
  VCFX_validator | \
  VCFX_variant_classifier --append-info | \
  VCFX_missing_detector --max-missing 0.1 | \
  VCFX_phred_filter --phred-filter 20 > qc_passed.vcf
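In a multi-stage pipeline like this, the shell by default reports only the exit status of the last command, so a failure in an earlier stage can go unnoticed. The sketch below shows the difference that bash's pipefail option makes, using false and cat as stand-ins for a failing and a succeeding stage. Whether a given VCFX tool exits non-zero on bad input is an assumption to verify per tool.

```shell
# Default behaviour: the pipeline's status is that of the last command (cat),
# so the failure of the first stage is hidden.
default_status=0
sh -c 'false | cat' || default_status=$?

# With pipefail (a bash option), any failing stage makes the pipeline fail.
strict_status=0
bash -o pipefail -c 'false | cat' || strict_status=$?

echo "default=$default_status pipefail=$strict_status"
# prints: default=0 pipefail=1
```

In a bash script, placing set -o pipefail before the pipeline has the same effect for every pipeline that follows.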