VCFX_impact_filter¶
Overview¶
VCFX_impact_filter filters VCF variants based on their predicted functional impact level (HIGH, MODERATE, LOW, or MODIFIER) found in the INFO field of VCF records.
Usage¶
VCFX_impact_filter --filter-impact <LEVEL> < input.vcf > filtered.vcf
Options¶
Option | Description |
---|---|
-i , --filter-impact <LEVEL> |
Required. Impact level threshold. Must be one of: HIGH, MODERATE, LOW, MODIFIER |
-h , --help |
Display help message and exit (handled by vcfx::handle_common_flags ) |
-v , --version |
Show program version and exit (handled by vcfx::handle_common_flags ) |
Description¶
VCFX_impact_filter analyzes variant annotations in a VCF file and filters them based on a specified impact level threshold. The tool:
- Reads a VCF file line-by-line
- For each variant, extracts the impact level from the INFO field (looks for
IMPACT=...
) - Classifies the impact into one of four levels: HIGH, MODERATE, LOW, or MODIFIER
- Keeps only variants with impact level greater than or equal to the specified threshold
- Adds an
EXTRACTED_IMPACT
field to the INFO column of retained variants - Outputs the filtered VCF with the same format as the input
The impact level hierarchy used for filtering is: HIGH > MODERATE > LOW > MODIFIER > UNKNOWN
Output Format¶
The output is a valid VCF file containing only variants that meet or exceed the specified impact threshold. Each retained variant will have an additional INFO field:
EXTRACTED_IMPACT=<value>
Where <value>
is the original impact value extracted from the INFO field.
Examples¶
Filter for HIGH impact variants only¶
./VCFX_impact_filter --filter-impact HIGH < input.vcf > high_impact_variants.vcf
Filter for MODERATE or higher impact variants¶
./VCFX_impact_filter --filter-impact MODERATE < input.vcf > functional_variants.vcf
Combining with other tools¶
# Filter by impact then convert to another format
./VCFX_impact_filter --filter-impact HIGH < input.vcf | \
./VCFX_format_converter --format=bed > high_impact.bed
Handling Special Cases¶
- Case insensitivity: Impact values are case insensitive (e.g., "high" and "HIGH" are treated the same)
- Extended impact values: Values like "HIGH_MISSENSE" are recognized by looking for the presence of standard impact keywords
- Missing IMPACT field: Variants without an IMPACT field in the INFO column are treated as "UNKNOWN" and filtered out by default
- Empty INFO field: Properly handled by adding the EXTRACTED_IMPACT field as the only INFO attribute
- Multiple impact annotations: If multiple IMPACT fields are present, only the first one is considered
- Invalid impact values: Any impact value not recognized as one of the four standard levels is classified as "UNKNOWN"
Performance¶
The tool is optimized for efficiency: - Processes VCF files line-by-line with minimal memory overhead - Uses regular expressions for reliable pattern matching - Processes very large VCF files with linear time complexity
Limitations¶
- Only extracts and analyzes the first IMPACT field found in the INFO column
- Cannot differentiate between more detailed impact subclassifications (relies on basic HIGH/MODERATE/LOW/MODIFIER keywords)
- Assumes that functional impact annotations follow standard convention with one of the four recognized impact levels
- Does not account for the specific variant type (SNP, indel, etc.) when filtering
- No built-in options to combine impact filtering with other criteria (e.g., allele frequency)