VCFX_reformatter

Overview

VCFX_reformatter is a tool for reformatting the INFO and FORMAT fields of VCF files. It can compress (that is, remove) specific keys and reorder keys in both the INFO and FORMAT columns, which helps reduce file size and keeps fields in a consistent order.

Usage

VCFX_reformatter [options] < input.vcf > output.vcf

Options

Option                        Description
-h, --help                    Display help message and exit (handled by vcfx::handle_common_flags)
-v, --version                 Show program version and exit (handled by vcfx::handle_common_flags)
-c, --compress-info <keys>    Remove specified INFO keys (comma-separated)
-f, --compress-format <keys>  Remove specified FORMAT keys (comma-separated)
-i, --reorder-info <keys>     Reorder INFO keys (comma-separated)
-o, --reorder-format <keys>   Reorder FORMAT keys (comma-separated)

Description

VCFX_reformatter modifies VCF files in several ways:

  1. INFO Field Compression:
       • Removes specified keys from the semicolon-separated INFO field
       • Preserves remaining fields in their original order
       • Handles both key-value pairs and flag fields

  2. FORMAT Field Compression:
       • Removes specified keys from the colon-separated FORMAT field
       • Updates all sample columns to match the new FORMAT structure
       • Maintains data consistency across all samples

  3. INFO Field Reordering:
       • Places specified keys at the beginning of the INFO field
       • Appends remaining keys in their original order
       • Preserves all key-value pairs and flags

  4. FORMAT Field Reordering:
       • Reorders the FORMAT column keys
       • Updates all sample columns to match the new order
       • Maintains data alignment across all samples
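
The INFO operations described above can be illustrated with a short Python sketch. This is not the tool's actual implementation; the function names `compress_info` and `reorder_info` are hypothetical, and writing "." when every key has been removed is an assumption rather than documented behavior.

```python
def compress_info(info: str, remove_keys: set[str]) -> str:
    """Drop the named keys from a semicolon-separated INFO string."""
    if info in ("", "."):
        return info
    kept = []
    for token in info.split(";"):
        if not token:
            continue
        # A flag has no '=', so the whole token is its key.
        key = token.split("=", 1)[0]
        if key not in remove_keys:
            kept.append(token)
    # Assumption: an INFO field left with no keys is written as '.'.
    return ";".join(kept) if kept else "."


def reorder_info(info: str, order: list[str]) -> str:
    """Move the keys listed in `order` to the front; keep the rest in place."""
    if info in ("", "."):
        return info
    tokens = [t for t in info.split(";") if t]
    by_key = {t.split("=", 1)[0]: t for t in tokens}
    # dict.fromkeys removes duplicate keys from `order` while keeping its sequence.
    front = [by_key[k] for k in dict.fromkeys(order) if k in by_key]
    placed = set(order)
    rest = [t for t in tokens if t.split("=", 1)[0] not in placed]
    return ";".join(front + rest)


info = "DP=10;AF=0.5;DB;MQ=60"
print(compress_info(info, {"AF"}))        # DP=10;DB;MQ=60
print(reorder_info(info, ["MQ", "DB"]))   # MQ=60;DB;DP=10;AF=0.5
```

Keys named in `order` that are absent from a record are simply ignored in this sketch, so the same key list can be applied to every line.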

Input Requirements

  • Input must be a valid VCF file
  • File can be piped through stdin
  • Supports both VCFv4.0 and VCFv4.2 formats
  • Handles both single-sample and multi-sample VCFs

Output Format

The output is a VCF file with:

  • All header lines preserved
  • Modified INFO and FORMAT fields according to specifications
  • Updated sample columns to match new FORMAT structure
  • Original VCF format maintained

Examples

Basic Usage

Remove specific INFO fields and reorder others:

VCFX_reformatter --compress-info AF,DP --reorder-info AC,AN < input.vcf > output.vcf

Format Field Manipulation

Remove and reorder FORMAT fields:

VCFX_reformatter --compress-format PL,AD --reorder-format GT,DP < input.vcf > output.vcf

Combined Operations

Perform multiple operations in one command:

VCFX_reformatter \
  --compress-info AF,DP \
  --compress-format PL,AD \
  --reorder-info AC,AN \
  --reorder-format GT,DP \
  < input.vcf > output.vcf

Integration with Other Tools

Combine with other VCFX tools:

cat input.vcf | \
  VCFX_validator | \
  VCFX_reformatter --compress-info AF,DP | \
  VCFX_metadata_summarizer

Field Handling

INFO Field Processing

  • Handles key-value pairs (e.g., "DP=10")
  • Handles flag fields, which have no value (e.g., "DB")
  • Preserves field separators
  • Applies the requested key order; keys not listed keep their original relative order

FORMAT Field Processing

  • Updates FORMAT column structure
  • Modifies all sample columns accordingly
  • Preserves data alignment
  • Handles missing values (".")
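
A minimal Python sketch of how the FORMAT column and the sample columns can be kept aligned is given here for illustration. The names `compress_format`, `reorder_format`, and `_select` are hypothetical and do not come from the tool. The sketch also makes several assumptions: padding short sample columns with ".", writing "." when every key is removed, and appending unlisted keys in their original order during reordering (mirroring the INFO behavior).

```python
def _select(keys: list[str], samples: list[str], indices: list[int]) -> tuple[str, list[str]]:
    """Rebuild FORMAT and every sample column from the chosen key positions."""
    if not indices:
        # Assumption: with no keys left, FORMAT and samples become '.'.
        return ".", ["." for _ in samples]
    new_format = ":".join(keys[i] for i in indices)
    new_samples = []
    for sample in samples:
        parts = sample.split(":")
        # VCF allows trailing subfields to be omitted; pad with '.' so that
        # each value stays under the key it belongs to.
        new_samples.append(":".join(parts[i] if i < len(parts) else "." for i in indices))
    return new_format, new_samples


def compress_format(fmt: str, samples: list[str], remove_keys: set[str]) -> tuple[str, list[str]]:
    """Remove the named FORMAT keys and the matching value in each sample."""
    keys = fmt.split(":")
    keep = [i for i, k in enumerate(keys) if k not in remove_keys]
    return _select(keys, samples, keep)


def reorder_format(fmt: str, samples: list[str], order: list[str]) -> tuple[str, list[str]]:
    """Move the named FORMAT keys to the front and permute samples the same way."""
    keys = fmt.split(":")
    position = {k: i for i, k in enumerate(keys)}
    front = [position[k] for k in dict.fromkeys(order) if k in position]
    rest = [i for i in range(len(keys)) if i not in front]
    return _select(keys, samples, front + rest)


print(compress_format("GT:AD:DP:PL", ["0/1:5,3:8:30,0,40", "./."], {"AD", "PL"}))
# ('GT:DP', ['0/1:8', './.:.'])
print(reorder_format("GT:AD:DP:PL", ["0/1:5,3:8:30,0,40"], ["DP", "GT"]))
# ('DP:GT:AD:PL', ['8:0/1:5,3:30,0,40'])
```

Working with key positions rather than key names means the same index list is applied to FORMAT and to every sample, which is what keeps the columns aligned.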

Error Handling

The tool handles various error conditions:

  • Malformed VCF lines
  • Missing fields
  • Invalid field formats
  • Inconsistent sample data
  • Lines with fewer than 8 columns
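
As an illustration of streaming with a column-count check, the following Python sketch passes header lines through and applies an INFO edit to each data line. The function `stream_records` and its parameters are hypothetical. Skipping short lines with a warning on stderr and passing blank lines through unchanged are assumptions; the documentation states only that data lines need at least 8 columns.

```python
import sys


def stream_records(lines, edit_info, warn=sys.stderr):
    """Yield reformatted lines one at a time, never holding the whole file."""
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line or line.startswith("#"):
            # Header lines (and, by assumption, blank lines) pass through unchanged.
            yield line
            continue
        cols = line.split("\t")
        if len(cols) < 8:
            # Assumption: a short data line is reported and skipped.
            print(f"Warning: line {number} has {len(cols)} columns, expected at least 8; skipped",
                  file=warn)
            continue
        cols[7] = edit_info(cols[7])  # column 8 (index 7) is INFO
        yield "\t".join(cols)


# Usage with standard input, shown as a comment so the sketch has no side effects:
# for out in stream_records(sys.stdin, lambda info: info):
#     print(out)
```

Because `stream_records` is a generator, each line is read, edited, and emitted before the next one is touched.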

Performance Considerations

  • Processes input as a stream
  • Efficient memory usage
  • Handles large files
  • Preserves original data integrity

Limitations

  • Modifies only the INFO and FORMAT columns, plus the sample columns that follow FORMAT; all other VCF columns are left unchanged
  • Does not validate VCF format (use VCFX_validator for validation)
  • Requires at least 8 columns in data lines

Common Use Cases

  1. Removing unnecessary fields to reduce file size
  2. Reordering fields for better readability
  3. Standardizing VCF format across different sources
  4. Preparing VCF files for specific analysis tools
  5. Cleaning up VCF files before processing

Best Practices

  1. Validate input VCF before reformatting
  2. Back up original files before modification
  3. Verify output format meets requirements
  4. Use appropriate field combinations
  5. Document field modifications
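
One way to verify output after reformatting is to check that no sample column carries more subfields than the FORMAT column declares. The Python sketch below is an independent check, not a feature of VCFX_reformatter, and the function name `misaligned_lines` is hypothetical. Sample columns with fewer subfields are accepted because VCF permits trailing subfields to be omitted.

```python
def misaligned_lines(lines):
    """Return 1-based line numbers whose samples have more subfields than FORMAT keys."""
    bad = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # skip header and blank lines
        cols = line.split("\t")
        if len(cols) < 10:
            continue  # no FORMAT/sample columns to compare
        n_keys = len(cols[8].split(":"))
        if any(len(sample.split(":")) > n_keys for sample in cols[9:]):
            bad.append(number)
    return bad


# Usage sketch, assuming a reformatted file named output.vcf:
# with open("output.vcf") as handle:
#     print(misaligned_lines(handle))
```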