VCFX_merger

Overview

VCFX_merger merges multiple VCF files by variant position. The output is sorted by chromosome and position and carries the VCF header from the first input file.

Usage

VCFX_merger --merge file1.vcf,file2.vcf,... [options] > merged.vcf

Options

Option	Description
`-m, --merge`	Comma-separated list of VCF files to merge
`-h, --help`	Display help message and exit (handled by `vcfx::handle_common_flags`)
`-v, --version`	Show program version and exit (handled by `vcfx::handle_common_flags`)

Description

VCFX_merger reads multiple VCF files and combines them into a single output file while:

Preserving the VCF header information from the first input file
Sorting all variants by chromosome and position
Maintaining the original VCF format and field structure
Accepting any number of input files in a single invocation

The tool reads the input files one after another, collects all variant records, and sorts them by chromosome and position before writing the output. It is particularly useful when you need to combine VCF files from different sources or samples into a single, properly sorted VCF file.
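The behaviour described above can be illustrated with a minimal Python sketch. It is not the tool's actual implementation: `merge_vcfs` is a hypothetical name, and the chromosome ordering (plain string comparison) is an assumption, since this page does not say how chromosomes are ordered.

```python
def merge_vcfs(paths):
    """Return (header_lines, sorted_record_lines) for the given VCF paths."""
    header = []
    records = []
    for index, path in enumerate(paths):
        with open(path) as handle:
            for line in handle:
                line = line.rstrip("\n")
                if not line:
                    continue
                if line.startswith("#"):
                    # Only the first file contributes header lines.
                    if index == 0:
                        header.append(line)
                    continue
                chrom, pos = line.split("\t", 2)[:2]
                records.append((chrom, int(pos), line))
    # Assumption: chromosomes compare as strings, positions as integers.
    # list.sort() is stable, so records with equal keys keep input order.
    records.sort(key=lambda rec: (rec[0], rec[1]))
    return header, [line for _, _, line in records]
```

With string comparison, `chr10` sorts before `chr2`; a real merge may order chromosomes differently.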

Input Requirements

All input files must be in valid VCF format
Files should have consistent header structures
The first file's header information will be used in the output
Files can contain any number of variants
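One way to check the "consistent header structures" requirement before merging is to compare each file's `#CHROM` column line with the first file's. The sketch below is an illustration only; `column_header` and `inconsistent_files` are hypothetical helper names, and treating the column line as the measure of consistency is an assumption.

```python
def column_header(path):
    """Return the '#CHROM ...' line split into column names, or None if absent."""
    with open(path) as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                return line.rstrip("\n").split("\t")
            if not line.startswith("#"):
                break  # reached the records without seeing a column line
    return None


def inconsistent_files(paths):
    """List the files whose column line differs from the first file's."""
    reference = column_header(paths[0])
    return [path for path in paths[1:] if column_header(path) != reference]
```

An empty list from `inconsistent_files` means every file has the same columns, including the same sample names in the same order.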

Output Format

The output is a standard VCF file with:

Header information from the first input file
All variants sorted by chromosome and position
Original VCF format preserved
Tab-delimited fields maintained
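The output properties listed above can be spot-checked with a short script. This is a sketch under assumptions: `check_merged_output` is a hypothetical helper, and it assumes chromosomes are ordered by plain string comparison, which this page does not confirm.

```python
def check_merged_output(lines):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    seen_record = False
    previous_key = None
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            if seen_record:
                problems.append(f"line {number}: header line after a variant record")
            continue
        seen_record = True
        fields = line.split("\t")
        # A VCF record has at least the 8 fixed columns CHROM through INFO.
        if len(fields) < 8 or not fields[1].isdigit():
            problems.append(f"line {number}: not a well-formed tab-delimited record")
            continue
        key = (fields[0], int(fields[1]))
        if previous_key is not None and key < previous_key:
            problems.append(f"line {number}: out of order after {previous_key}")
        previous_key = key
    return problems
```

It can be run on a merged file with `check_merged_output(open("merged.vcf"))`.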

Examples

Basic Usage

Merge two VCF files:

VCFX_merger --merge sample1.vcf,sample2.vcf > merged.vcf

Multiple Files

Merge three or more VCF files:

VCFX_merger --merge file1.vcf,file2.vcf,file3.vcf > combined.vcf
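When the list of inputs is long, the comma-separated `--merge` argument can be assembled programmatically. The sketch below only builds the command line and does not run it; `build_merge_command` is a hypothetical helper, and rejecting file names that contain a comma is an assumption based on the comma being the list separator.

```python
import shlex


def build_merge_command(files, output):
    """Return a shell command line that merges `files` into `output`."""
    if not files:
        raise ValueError("at least one input file is required")
    for name in files:
        if "," in name:
            # A comma inside a file name would be read as a list separator.
            raise ValueError(f"file name contains a comma: {name!r}")
    merge_arg = ",".join(files)
    # shlex.quote protects names containing spaces or shell metacharacters.
    return f"VCFX_merger --merge {shlex.quote(merge_arg)} > {shlex.quote(output)}"
```

For example, `build_merge_command(["file1.vcf", "file2.vcf", "file3.vcf"], "combined.vcf")` reproduces the command shown above.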

Integration with Other Tools

Merge files and then process the result:

VCFX_merger --merge sample1.vcf,sample2.vcf | VCFX_sorter | VCFX_validator > final.vcf

Error Handling

The tool handles various error conditions:

Missing input files: Reports an error if any specified input file cannot be opened
Invalid VCF format: Input is not validated; the original format is preserved as-is
Empty files: Handles empty input files gracefully
Missing --merge argument: Displays help message
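Because a missing or unreadable input causes an error, it can help to screen the file list before invoking the tool. The following is a sketch, not part of VCFX_merger; `preflight` and its report keys are hypothetical names.

```python
import os


def preflight(paths):
    """Classify input paths as usable, unreadable, or empty before merging."""
    report = {"ok": [], "unreadable": [], "empty": []}
    for path in paths:
        if not os.path.isfile(path) or not os.access(path, os.R_OK):
            report["unreadable"].append(path)
        elif os.path.getsize(path) == 0:
            # Empty files are handled by the tool, but are worth flagging.
            report["empty"].append(path)
        else:
            report["ok"].append(path)
    return report
```

A caller might stop if `report["unreadable"]` is non-empty and pass only `report["ok"]` to the merge.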

Performance Considerations

Memory usage scales with the total number of variants across all input files
Processing time depends on the total number of variants and the number of input files
Files are read sequentially, one at a time
All variants are held in memory until they have been collected and sorted, so the combined size of the inputs determines peak memory use
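Since all variants are sorted in memory, a rough size estimate can be made before merging by counting the variant lines in the inputs. This sketch is an approximation only: `count_records` is a hypothetical helper, and the byte total is the raw text size, not the tool's actual memory footprint.

```python
def count_records(paths):
    """Return (record_count, record_bytes) summed over all input files."""
    total_records = 0
    total_bytes = 0
    for path in paths:
        with open(path, "rb") as handle:
            for line in handle:
                # Skip header lines and blank lines; only records are sorted.
                if line.startswith(b"#") or not line.strip():
                    continue
                total_records += 1
                total_bytes += len(line)
    return total_records, total_bytes
```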

Limitations

Only supports standard VCF format files
Does not perform VCF validation (use VCFX_validator for validation)
Keeps only the header from the first input file; header lines from the other files are not carried over
Requires all input files to have consistent field structures

Common Use Cases

Combining multiple sample VCFs into a single file
Merging region-specific VCF files
Combining results from different variant callers
Creating a unified VCF file from multiple analysis runs

Best Practices

Validate input files before merging
Use consistent VCF versions across input files
Consider file sizes and available memory when merging many files
Verify the output with VCFX_validator after merging
Use VCFX_sorter if additional sorting is needed
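The advice to use consistent VCF versions can be checked by reading the `##fileformat` line, which the VCF specification requires as the first line of a file. The helpers below are a sketch with hypothetical names (`vcf_version`, `versions_by_file`), not part of the VCFX toolkit.

```python
def vcf_version(path):
    """Return the value of the leading ##fileformat line, or None if absent."""
    with open(path) as handle:
        first = handle.readline().rstrip("\r\n")
    prefix = "##fileformat="
    return first[len(prefix):] if first.startswith(prefix) else None


def versions_by_file(paths):
    """Map each input path to its declared VCF version."""
    return {path: vcf_version(path) for path in paths}
```

If `set(versions_by_file(paths).values())` has more than one element, the inputs declare different versions.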