VCFX Tools Overview
VCFX is a collection of C/C++ tools for processing and analyzing VCF (Variant Call Format) files, with optional WebAssembly compatibility. Each tool is an independent command-line executable that can parse input from stdin and write to stdout, enabling flexible piping and integration into bioinformatics pipelines.
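Because the tools read from stdin, any command that writes VCF text to stdout can feed them, including a decompressor. The sketch below illustrates this on a toy gzipped file. The records are invented, and grep acts only as a stand-in consumer so the example runs without VCFX installed. The real tool is called only if it is found on PATH.

```shell
# Toy gzipped VCF with invented records, only for illustration.
toy_gz=$(mktemp)
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'chr1\t100\t.\tA\tG\t50\tPASS\t.\n'
  printf 'chr1\t200\t.\tC\tT\t50\tPASS\t.\n'
} | gzip -c > "$toy_gz"

# Stand-in for a VCFX tool: count the data lines arriving on stdin.
record_count=$(gzip -dc "$toy_gz" | grep -vc '^#')
echo "records: $record_count"

# With VCFX installed, the same stream can feed a real tool instead.
if command -v VCFX_variant_counter >/dev/null 2>&1; then
    gzip -dc "$toy_gz" | VCFX_variant_counter
fi
rm -f "$toy_gz"
```

Streaming this way avoids writing an uncompressed copy of the input to disk.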
The suite also includes a convenience wrapper vcfx so you can run commands as vcfx <subcommand>. For example, vcfx variant_counter is equivalent to running VCFX_variant_counter. Use vcfx --list or the alias vcfx list to see available subcommands. To view Markdown documentation for a tool, run vcfx help <tool>. All individual VCFX_* binaries remain available if you prefer calling them directly.
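The naming relationship between a subcommand and its binary can be sketched as a small shell function. This is a hypothetical illustration of the dispatch idea, not the actual vcfx wrapper. The function names vcfx_tool_name and vcfx_dispatch are invented for this example, and the real wrapper may resolve tools differently.

```shell
# Hypothetical helper: map a subcommand to its standalone binary name.
vcfx_tool_name() {
    printf 'VCFX_%s\n' "$1"
}

# Hypothetical dispatcher: forward the remaining arguments to that binary.
vcfx_dispatch() {
    if [ "$#" -lt 1 ]; then
        echo "usage: vcfx_dispatch <subcommand> [args...]" >&2
        return 2
    fi
    _vcfx_tool=$(vcfx_tool_name "$1")
    shift
    # Fail with 127 if the binary is not installed.
    if ! command -v "$_vcfx_tool" >/dev/null 2>&1; then
        echo "vcfx: tool not found on PATH: $_vcfx_tool" >&2
        return 127
    fi
    "$_vcfx_tool" "$@"
}

vcfx_tool_name variant_counter   # prints VCFX_variant_counter
```

With VCFX installed, `vcfx_dispatch variant_counter < input.vcf` would then behave like calling VCFX_variant_counter directly.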
Every tool now supports the common flags --help and --version for quick usage or version information.
Tool Categories
Data Analysis
These tools help extract statistical information and insights from variant data:
- VCFX_allele_freq_calc - Calculate allele frequencies
- VCFX_variant_classifier - Classify variants into SNP, INDEL, MNV, or STRUCTURAL
- VCFX_inbreeding_calculator - Calculate inbreeding coefficients
- VCFX_dosage_calculator - Calculate allele dosage from genotypes
- VCFX_hwe_tester - Test for Hardy-Weinberg equilibrium
- VCFX_distance_calculator - Calculate genetic distances between samples
- VCFX_allele_counter - Count alleles in VCF files
- VCFX_allele_balance_calc - Calculate allele balance metrics
- VCFX_variant_counter - Count variants in VCF files
- VCFX_ancestry_inferrer - Infer ancestry from genetic data
- VCFX_ancestry_assigner - Assign ancestry to samples
- VCFX_ld_calculator - Calculate pairwise linkage disequilibrium (r²) between variants
Data Filtering
Tools for selecting variants based on specific criteria:
- VCFX_phase_checker - Filter variants to keep only fully phased genotypes
- VCFX_phred_filter - Filter variants based on Phred-scaled quality scores
- VCFX_record_filter - Filter variants based on various VCF fields
- VCFX_gl_filter - Filter variants based on genotype likelihoods
- VCFX_allele_balance_filter - Filter variants based on allele balance
- VCFX_population_filter - Filter variants based on population statistics
- VCFX_probability_filter - Filter variants based on probability scores
- VCFX_nonref_filter - Filter to keep only non-reference variants
- VCFX_impact_filter - Filter variants based on predicted impact
- VCFX_phase_quality_filter - Filter variants based on phasing quality scores
- VCFX_region_subsampler - Filter variants based on genomic regions
Data Transformation
Tools for converting or reformatting VCF data:
- VCFX_multiallelic_splitter - Split multiallelic variants into biallelic records
- VCFX_sample_extractor - Extract specific samples from a VCF file
- VCFX_position_subsetter - Extract variants at specific positions
- VCFX_format_converter - Convert VCF files to other formats
- VCFX_genotype_query - Query specific genotype patterns
- VCFX_indel_normalizer - Normalize indel representations
- VCFX_sv_handler - Handle structural variants in VCF files
- VCFX_fasta_converter - Convert VCF files to FASTA format
- VCFX_sorter - Sort VCF files by position
- VCFX_af_subsetter - Extract variants based on allele frequency
- VCFX_reformatter - Reformat VCF files for better readability
Quality Control
Tools for validating and checking data quality:
- VCFX_concordance_checker - Check concordance between samples in a VCF file
- VCFX_missing_detector - Detect and report missing data
- VCFX_outlier_detector - Detect outlier samples or variants
- VCFX_alignment_checker - Check alignment of variants
- VCFX_cross_sample_concordance - Check concordance between samples
- VCFX_validator - Validate VCF format compliance
File Management
Tools for handling VCF files:
- VCFX_indexer - Create an index file for random access
- VCFX_file_splitter - Split VCF files into smaller chunks
- VCFX_compressor - Compress VCF files efficiently
- VCFX_diff_tool - Find differences between VCF files
- VCFX_subsampler - Subsample variants from a VCF file
- VCFX_duplicate_remover - Remove duplicate variants
- VCFX_merger - Merge multiple VCF files by position
Annotation and Reporting
Tools for annotating and extracting information from VCF files:
- VCFX_custom_annotator - Add custom annotations to VCF files
- VCFX_info_summarizer - Summarize INFO fields in VCF files
- VCFX_header_parser - Parse and extract information from VCF headers
- VCFX_annotation_extractor - Extract annotations from VCF files
- VCFX_ref_comparator - Compare variants against a reference genome
- VCFX_field_extractor - Extract specific fields from VCF files
- VCFX_info_aggregator - Aggregate INFO fields across variants
- VCFX_info_parser - Parse INFO fields in VCF files
- VCFX_metadata_summarizer - Summarize key metadata from VCF files
Data Processing
Tools for processing variants and samples:
- VCFX_missing_data_handler - Handle missing data in VCF files
- VCFX_quality_adjuster - Adjust quality scores in VCF files
- VCFX_haplotype_phaser - Phase haplotypes in VCF files
- VCFX_haplotype_extractor - Extract haplotype information
Common Usage Patterns
VCFX tools are designed to be combined in pipelines. Here are some common usage patterns:
Basic Filtering and Analysis
# Extract phased variants, filter by quality, and calculate allele frequencies
cat input.vcf | \
VCFX_phase_checker | \
VCFX_phred_filter --phred-filter 30 | \
VCFX_allele_freq_calc > result.tsv
Variant Classification and Filtering
# Classify variants, then keep header lines plus high-quality SNP records
cat input.vcf | \
VCFX_variant_classifier --append-info | \
grep -E '^#|VCF_CLASS=SNP' | \
VCFX_phred_filter --phred-filter 30 > high_quality_snps.vcf
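A grep step in a VCF pipeline sees header lines too, so the pattern has to match them explicitly if the output should remain a valid VCF. The following is a small self-contained sketch on a toy file. The records are invented for illustration, and it uses only grep, not a VCFX tool. The VCF_CLASS tag is taken from the pipeline above.

```shell
# Toy classified VCF with invented records, only for illustration.
toy_vcf=$(mktemp)
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'chr1\t100\t.\tA\tG\t50\tPASS\tVCF_CLASS=SNP\n'
  printf 'chr1\t200\t.\tAT\tA\t50\tPASS\tVCF_CLASS=INDEL\n'
} > "$toy_vcf"

# Keep header lines (starting with '#') plus records tagged as SNPs.
snps_with_header=$(grep -E '^#|VCF_CLASS=SNP' "$toy_vcf")
printf '%s\n' "$snps_with_header"
rm -f "$toy_vcf"
```

A pattern that matches only the tag would drop both header lines along with the INDEL record.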
Sample Comparison
# Check concordance between two samples in a single VCF
cat input.vcf | VCFX_concordance_checker --samples "SAMPLE1 SAMPLE2" > concordance.tsv
Linkage Disequilibrium Analysis
# Calculate LD in a specific region after filtering for common variants
cat input.vcf | \
VCFX_af_subsetter --af-filter '0.05-1.0' | \
VCFX_ld_calculator --region chr1:10000-20000 > ld_matrix.txt
Normalization and Splitting
# Normalize indels and split multiallelic variants
cat input.vcf | \
VCFX_indel_normalizer | \
VCFX_multiallelic_splitter > normalized_biallelic.vcf
Population Analysis
# Extract population-specific VCFs and calculate allele frequencies
cat input.vcf | VCFX_population_filter --population EUR --pop-map pop_map.txt > eur.vcf
cat eur.vcf | VCFX_allele_freq_calc > eur_afs.tsv
Quality Control Pipeline
# Validate, classify, detect missing data, and filter by quality
cat input.vcf | \
VCFX_validator | \
VCFX_variant_classifier --append-info | \
VCFX_missing_detector --max-missing 0.1 | \
VCFX_phred_filter --phred-filter 20 > qc_passed.vcf