VCFX_metadata_summarizer
Overview
VCFX_metadata_summarizer is a tool that analyzes and summarizes key metadata from a VCF file. It gives an overview of the file's structure by counting its contigs, INFO fields, FILTER fields, FORMAT fields, samples, and variants.
Usage
VCFX_metadata_summarizer [options] < input.vcf
Options
Option | Description |
---|---|
-h, --help | Display help message and exit (handled by vcfx::handle_common_flags) |
-v, --version | Show program version and exit (handled by vcfx::handle_common_flags) |
Description
VCFX_metadata_summarizer reads a VCF file from standard input and generates a summary of its key structural elements:
- Counts unique contigs defined in the header
- Counts unique INFO fields
- Counts unique FILTER fields
- Counts unique FORMAT fields
- Counts the number of samples
- Counts the total number of variants
The tool reads the VCF line by line, collecting metadata IDs and sample columns from the header and counting the variant records that follow, so the whole summary is built in a single pass.
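The counting described above can be cross-checked with standard Unix tools. The following is a minimal POSIX shell/awk sketch, not the VCFX implementation: vcf_counts is a hypothetical helper name, and the sketch assumes tab-separated columns and header definitions whose first key is ID (for example ##INFO=<ID=DP,...>).

```shell
# Minimal sketch (hypothetical helper, not the VCFX source): approximate the
# six counts in one pass with POSIX awk. Assumes tab-separated columns and
# header definitions whose first key is ID, e.g. ##INFO=<ID=DP,...>.
vcf_counts() {
  awk '
    BEGIN { FS = "\t" }
    # Structured header line: remember each distinct (type, ID) pair
    /^##(contig|INFO|FILTER|FORMAT)=<ID=/ {
      type = substr($0, 3, index($0, "=") - 3)   # contig, INFO, FILTER or FORMAT
      id = substr($0, index($0, "<ID=") + 4)
      sub(/[,>].*/, "", id)                      # keep text up to the first , or >
      if (!((type, id) in seen)) { seen[type, id] = 1; n[type]++ }
      next
    }
    /^##/ { next }                               # other meta-information lines
    /^#CHROM/ { samples = (NF > 9) ? NF - 9 : 0; next }  # 9 fixed columns
    NF > 0 { variants++ }                        # every non-empty data line
    END {
      printf "contigs=%d info=%d filter=%d format=%d samples=%d variants=%d\n",
        n["contig"], n["INFO"], n["FILTER"], n["FORMAT"], samples, variants
    }' "$@"
}
```

Run it as vcf_counts input.vcf or vcf_counts < input.vcf. On a well-formed file the numbers should line up with the tool's summary, although edge cases such as malformed header lines may be treated differently by VCFX.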
Input Requirements
- Input must be a valid VCF file
- Input is read from standard input, either redirected from a file or piped from another command
- Supports both VCFv4.0 and VCFv4.2 formats
- Handles both single-sample and multi-sample VCFs
Output Format
The output is a formatted text summary with the following structure:
VCF Metadata Summary:
---------------------
Number of unique contigs: <count>
Number of unique INFO fields: <count>
Number of unique FILTER fields: <count>
Number of unique FORMAT fields: <count>
Number of samples: <count>
Number of variants: <count>
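Because every count sits on its own "Label: count" line, the summary is easy to post-process. As one illustration, the hypothetical summary_to_tsv helper below (not part of VCFX) converts those lines into tab-separated label/count pairs for logging or comparing runs; it assumes the labels keep the exact wording shown above.

```shell
# Hypothetical post-processing helper (not part of VCFX): read the summary on
# stdin and print "label<TAB>count" for each "Number of ...: N" line,
# skipping the title and the dashed rule.
summary_to_tsv() {
  awk 'BEGIN { OFS = "\t" }
       /^Number of [^:]*: [0-9]+$/ {
         label = substr($0, 1, index($0, ":") - 1)
         print label, $NF
       }'
}
```

A typical call would be VCFX_metadata_summarizer < input.vcf | summary_to_tsv.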
Examples
Basic Usage
Summarize metadata from a VCF file:
VCFX_metadata_summarizer < input.vcf
Integration with Other Tools
Combine with other VCFX tools:
cat input.vcf | VCFX_validator | VCFX_metadata_summarizer
Example Output
For a minimal VCF file with one contig, one INFO field, and a single variant:
VCF Metadata Summary:
---------------------
Number of unique contigs: 1
Number of unique INFO fields: 1
Number of unique FILTER fields: 0
Number of unique FORMAT fields: 0
Number of samples: 0
Number of variants: 1
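A hypothetical minimal input matching that description (one ##contig line, one ##INFO line, no FILTER or FORMAT definitions, no sample columns, one data line) can be generated as below. The file name minimal.vcf and its contents are illustrative assumptions, not taken from the VCFX test suite.

```shell
# Illustrative input only: 8 fixed columns on #CHROM (no FORMAT, no samples)
# and a single variant record with FILTER left as ".".
{
  printf '##fileformat=VCFv4.2\n'
  printf '##contig=<ID=1>\n'
  printf '##INFO=<ID=DP,Number=1,Type=Integer,Description="Total depth">\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf '1\t100\t.\tA\tG\t50\t.\tDP=20\n'
} > minimal.vcf
```

Running VCFX_metadata_summarizer < minimal.vcf on a file like this would be expected to report the counts shown above.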
Header Parsing
The tool parses the following types of header lines:
- ##contig=<ID=...>: contig definitions
- ##INFO=<ID=...>: INFO field definitions
- ##FILTER=<ID=...>: FILTER field definitions
- ##FORMAT=<ID=...>: FORMAT field definitions
- #CHROM ...: the column header line, used to count samples
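To see which IDs lie behind each count, the same header lines can be listed with grep, sed, and awk. The helpers list_header_ids and list_sample_names are hypothetical names for this sketch; it assumes ID is the first key inside <...> and that the #CHROM line is tab-separated, with samples starting in column 10.

```shell
# Hypothetical helpers (not part of VCFX) for inspecting header content.
# list_header_ids TYPE [FILE]: distinct IDs of ##TYPE=<ID=...> lines,
# where TYPE is contig, INFO, FILTER or FORMAT; reads stdin without FILE.
list_header_ids() {
  _type=$1
  shift
  grep "^##${_type}=<ID=" "$@" |
    sed "s/^##${_type}=<ID=\([^,>]*\).*/\1/" |
    sort -u
}

# list_sample_names [FILE]: sample columns of the #CHROM line (column 10 on,
# after CHROM POS ID REF ALT QUAL FILTER INFO FORMAT).
list_sample_names() {
  awk 'BEGIN { FS = "\t" }
       /^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }' "$@"
}
```

For example, list_header_ids INFO input.vcf | wc -l gives a number comparable to the INFO count, assuming the tool deduplicates IDs the same way.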
Error Handling
The tool handles the following input scenarios:
- Empty files
- Files with no header
- Files with no variants
- Files with missing metadata fields
- Files with inconsistent header structures
Performance Considerations
- Processes input as a stream, one line at a time, so the entire file never needs to be loaded into memory
- Memory usage scales with the number of unique metadata fields, not with the number of variants
- Handles both small and large VCF files, since each line is processed only once
Limitations
- Only counts presence of fields, not their values
- Does not validate field definitions
- Does not check for field consistency across variants
- Does not analyze variant content beyond counting
Common Use Cases
- Quick assessment of VCF file structure
- Quality control of VCF file completeness
- Verification of expected metadata presence
- Sample count verification
- Variant count verification
Best Practices
- Run on VCF files before downstream processing
- Use in combination with VCFX_validator
- Check for expected field counts
- Verify sample counts match expectations
- Use as part of quality control pipelines
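The count checks above can be scripted as a quality-control gate. Below is a hedged sketch built on the documented output format; check_count is a hypothetical shell function, not a VCFX option, and it assumes the labels match the Output Format section exactly.

```shell
# Sketch of a QC gate (hypothetical helper, not part of VCFX): read the
# summary on stdin and fail unless the line for LABEL reports EXPECTED.
# usage: check_count LABEL EXPECTED
check_count() {
  _label=$1
  _expected=$2
  _actual=$(awk -v label="$_label" '
    index($0, label ": ") == 1 { print substr($0, length(label) + 3) }')
  if [ "$_actual" != "$_expected" ]; then
    printf 'QC check failed: %s = %s (expected %s)\n' \
      "$_label" "${_actual:-missing}" "$_expected" >&2
    return 1
  fi
}
```

Since each call consumes standard input, save the summary first when checking several counts, for example summary=$(VCFX_metadata_summarizer < input.vcf) followed by printf '%s\n' "$summary" | check_count 'Number of samples' 2 for each expected value.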