Skip to content

VCFX_impact_filter

Overview

VCFX_impact_filter filters VCF variants based on their predicted functional impact level (HIGH, MODERATE, LOW, or MODIFIER) found in the INFO field of VCF records.

Usage

# Using file input (recommended for large files - 10-20x faster)
VCFX_impact_filter --filter-impact <LEVEL> -I input.vcf > filtered.vcf

# Using stdin
VCFX_impact_filter --filter-impact <LEVEL> < input.vcf > filtered.vcf

Options

Option Description
-i, --filter-impact <LEVEL> Required. Impact level threshold. Must be one of: HIGH, MODERATE, LOW, MODIFIER
-I, --input FILE Input VCF file. Uses memory-mapped I/O for 10-20x faster processing
-q, --quiet Suppress warning messages
-h, --help Display help message and exit (handled by vcfx::handle_common_flags)
-v, --version Show program version and exit (handled by vcfx::handle_common_flags)

Description

VCFX_impact_filter analyzes variant annotations in a VCF file and filters them based on a specified impact level threshold. The tool:

  1. Reads a VCF file line-by-line
  2. For each variant, extracts the impact level from the INFO field (looks for IMPACT=...)
  3. Classifies the impact into one of four levels: HIGH, MODERATE, LOW, or MODIFIER
  4. Keeps only variants with impact level greater than or equal to the specified threshold
  5. Adds an EXTRACTED_IMPACT field to the INFO column of retained variants
  6. Outputs the filtered VCF with the same format as the input

The impact level hierarchy used for filtering is: HIGH > MODERATE > LOW > MODIFIER > UNKNOWN

Output Format

The output is a valid VCF file containing only variants that meet or exceed the specified impact threshold. Each retained variant will have an additional INFO field:

EXTRACTED_IMPACT=<value>

Where <value> is the original impact value extracted from the INFO field.

Examples

Filter for HIGH impact variants only

./VCFX_impact_filter --filter-impact HIGH < input.vcf > high_impact_variants.vcf

Filter for MODERATE or higher impact variants

./VCFX_impact_filter --filter-impact MODERATE < input.vcf > functional_variants.vcf

Combining with other tools

# Filter by impact then convert to another format
./VCFX_impact_filter --filter-impact HIGH < input.vcf | \
  ./VCFX_format_converter --format=bed > high_impact.bed

Handling Special Cases

  • Case insensitivity: Impact values are case insensitive (e.g., "high" and "HIGH" are treated the same)
  • Extended impact values: Values like "HIGH_MISSENSE" are recognized by looking for the presence of standard impact keywords
  • Missing IMPACT field: Variants without an IMPACT field in the INFO column are treated as "UNKNOWN" and filtered out by default
  • Empty INFO field: Properly handled by adding the EXTRACTED_IMPACT field as the only INFO attribute
  • Multiple impact annotations: If multiple IMPACT fields are present, only the first one is considered
  • Invalid impact values: Any impact value not recognized as one of the four standard levels is classified as "UNKNOWN"

Performance

The tool is optimized for efficiency: - Memory-mapped I/O: When using -I/--input, files are memory-mapped for 10-20x faster processing - SIMD acceleration: Uses AVX2/SSE2/NEON instructions for fast newline scanning - Zero-copy parsing: Uses string_view for minimal memory allocation - 1MB output buffering: Reduces system call overhead - Processes very large VCF files with linear time complexity

Limitations

  • Only extracts and analyzes the first IMPACT field found in the INFO column
  • Cannot differentiate between more detailed impact subclassifications (relies on basic HIGH/MODERATE/LOW/MODIFIER keywords)
  • Assumes that functional impact annotations follow standard convention with one of the four recognized impact levels
  • Does not account for the specific variant type (SNP, indel, etc.) when filtering
  • No built-in options to combine impact filtering with other criteria (e.g., allele frequency)