VCFX_phred_filter¶
Overview¶
VCFX_phred_filter filters a VCF file based on the PHRED quality scores (QUAL column), removing variants that fall below a specified quality threshold to focus analysis on higher confidence variant calls.
Usage¶
VCFX_phred_filter [OPTIONS] < input.vcf > filtered.vcf
Options¶
Option | Description |
---|---|
-p , --phred-filter |
Set PHRED quality score threshold (default: 30.0) |
-k , --keep-missing-qual |
Keep variants with missing quality values (represented as ".") |
-h , --help |
Display help message and exit (handled by vcfx::handle_common_flags ) |
-v , --version |
Show program version and exit (handled by vcfx::handle_common_flags ) |
Description¶
VCFX_phred_filter examines the QUAL column of each variant record in a VCF file and filters out variants with quality scores below a specified threshold. The tool:
- Processes a VCF file line by line
- Extracts the PHRED quality score from the QUAL column (position 6) of each variant
- Compares the quality score against the specified threshold
- Retains variants where the quality meets or exceeds the threshold
- Passes through all header lines unchanged
This tool is particularly useful for: - Removing low-confidence variant calls for downstream analysis - Quality control of variant data - Reducing false positives in variant datasets - Applying consistent quality standards across multiple VCF files
Output Format¶
The output is a standard VCF file with the same format as the input, but containing only variants with PHRED quality scores at or above the specified threshold. All header lines are preserved.
Examples¶
Basic Usage with Default Threshold¶
# Filter variants using the default quality threshold (30)
VCFX_phred_filter < input.vcf > filtered.vcf
Setting a Custom Threshold¶
# Keep only variants with quality scores of 20 or higher
VCFX_phred_filter -p 20 < input.vcf > q20_filtered.vcf
Keeping Variants with Missing Quality¶
# Keep variants with quality ≥ 30 or missing quality values
VCFX_phred_filter -p 30 -k < input.vcf > high_quality_with_missing.vcf
In a Pipeline¶
# Combine with other filters in a pipeline
cat input.vcf | \
VCFX_phred_filter -p 40 | \
grep "PASS" > high_quality_pass.vcf
Using Long Option Format¶
# Using long options for clarity
VCFX_phred_filter --phred-filter 25 --keep-missing-qual < input.vcf > filtered.vcf
Quality Score Handling¶
The tool processes quality scores as follows:
- Standard numeric values: Directly compared to the threshold
- Missing quality values (represented as "."):
- By default, treated as 0.0 (filtered out)
- With the
-k
option, treated as extremely high (effectively infinity) to ensure they pass - Invalid quality values: Logged with a warning and treated as 0.0 (filtered out)
Handling Special Cases¶
- Missing quality values: Can be retained with the
-k
option - Invalid quality formats: Treated as 0.0 with a warning
- Empty lines: Preserved with a single newline
- Header lines: Preserved unchanged
- Malformed VCF lines: Lines with fewer than 6 columns are skipped with a warning
- Data before header: Skipped with a warning
Performance¶
The tool is designed for efficiency:
- Processes VCF files line by line, with minimal memory requirements
- Simple numeric comparison for fast filtering decisions
- No requirement to load the entire file into memory
- Fast string parsing for quality values
Limitations¶
- Focuses only on the QUAL column, not on per-sample or per-genotype quality metrics
- No support for filtering based on other quality-related fields (e.g., GQ in the genotype fields)
- Cannot apply different thresholds to different variant types
- No option to exclude variants based on upper quality bounds
- Doesn't provide statistics on filtered variants
- Cannot filter based on multiple quality metrics simultaneously
- Lacks region-specific filtering capabilities