CLI#
FinaleToolkit is a package and standalone program to extract fragmentation features of cell-free DNA from paired-end sequencing data.
usage: finaletoolkit [-h] [-v]
{coverage,frag-length-bins,frag-length-intervals,cleavage-profile,wps,adjust-wps,delfi,delfi-gc-correct,end-motifs,interval-end-motifs,mds,interval-mds,filter-bam,agg-bw,gap-bed}
...
Named Arguments#
- -v, --version
show program’s version number and exit
Sub-commands#
coverage#
Calculates fragmentation coverage over intervals defined in a BED file based on alignment data from a BAM/SAM/CRAM/Fragment file.
finaletoolkit coverage [-h] [-o OUTPUT_FILE] [-s SCALE_FACTOR] [-q QUALITY_THRESHOLD] [-w WORKERS] [-v] input_file interval_file
Positional Arguments#
- input_file
Path to a BAM/SAM/CRAM/Fragment file containing fragment data.
- interval_file
Path to a BED file containing intervals to calculate coverage over.
Named Arguments#
- -o, --output_file
A BED file containing coverage values over the intervals specified in the interval file.
Default: “-”
- -s, --scale-factor
Scale factor for coverage values.
Default: 1000000.0
- -q, --quality_threshold
Minimum mapping quality threshold.
Default: 30
- -w, --workers
Number of worker processes.
Default: 1
- -v, --verbose
Enable verbose mode to display detailed processing information.
Default: False
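As an illustration, the usage line above can be assembled into an invocation from Python. This is a minimal sketch: the file names `sample.bam`, `intervals.bed`, and `coverage.bed` are hypothetical, and the command is only printed, not executed.

```python
import shlex

# Hypothetical file names; replace them with real paths.
cmd = [
    "finaletoolkit", "coverage",
    "-o", "coverage.bed",   # output BED file
    "-q", "30",             # minimum mapping quality (the documented default)
    "-w", "4",              # number of worker processes
    "sample.bam",           # input_file
    "intervals.bed",        # interval_file
]

# Build a shell-safe command line from the argument list.
command_line = shlex.join(cmd)
print(command_line)
```

To actually run the command, the same list could be passed to `subprocess.run(cmd, check=True)`, provided FinaleToolkit is installed and the files exist.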
frag-length-bins#
Retrieves fragment lengths grouped in bins given a BAM/SAM/CRAM/Fragment file.
finaletoolkit frag-length-bins [-h] [-c CONTIG] [-S START] [-p {midpoint,any}] [-E STOP] [--bin-size BIN_SIZE] [-o OUTPUT_FILE] [--contig-by-contig]
[--histogram] [-q QUALITY_THRESHOLD] [-v]
input_file
Positional Arguments#
- input_file
Path to a BAM/SAM/CRAM/Fragment file containing fragment data.
Named Arguments#
- -c, --contig
Specify the contig or chromosome to select fragments from. (Required if using --start or --stop.)
- -S, --start
Specify the 0-based left-most coordinate of the interval to select fragments from. (Must also specify --contig.)
- -p, --intersect_policy
Possible choices: midpoint, any
Specifies what policy is used to include fragments in the given interval. See User Guide for more information.
Default: “midpoint”
- -E, --stop
Specify the 1-based right-most coordinate of the interval to select fragments from. (Must also specify --contig.)
- --bin-size
Specify the size of the bins to group fragment lengths into.
- -o, --output_file
A .TSV file containing fragment lengths binned according to the specified bin size.
Default: “-”
- --contig-by-contig
Placeholder, not implemented.
Default: False
- --histogram
Enable histogram mode to display histogram in terminal.
Default: False
- -q, --quality_threshold
Minimum mapping quality threshold.
Default: 30
- -v, --verbose
Enable verbose mode to display detailed processing information.
Default: 0
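To make the effect of `--bin-size` concrete, the following sketch groups a list of fragment lengths into fixed-width bins. It is an illustration only, not the toolkit's implementation: the function name `bin_fragment_lengths` and the convention that each bin covers `[start, start + bin_size)` are assumptions.

```python
from collections import Counter


def bin_fragment_lengths(lengths, bin_size):
    """Count fragment lengths per bin; each bin covers [start, start + bin_size)."""
    counts = Counter((length // bin_size) * bin_size for length in lengths)
    return sorted(counts.items())


# Made-up fragment lengths in bp.
lengths = [98, 145, 149, 166, 167, 171, 320]
bins = bin_fragment_lengths(lengths, bin_size=10)

# Print one tab-separated row per non-empty bin: start, stop, count.
for start, count in bins:
    print(f"{start}\t{start + 10}\t{count}")
```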
frag-length-intervals#
Retrieves fragment length summary statistics over intervals defined in a BED file based on alignment data from a BAM/SAM/CRAM/Fragment file.
finaletoolkit frag-length-intervals [-h] [-p {midpoint,any}] [-o OUTPUT_FILE] [-q QUALITY_THRESHOLD] [-w WORKERS] [-v] input_file interval_file
Positional Arguments#
- input_file
Path to a BAM/SAM/CRAM/Fragment file containing fragment data.
- interval_file
Path to a BED file containing intervals to retrieve fragment length summary statistics over.
Named Arguments#
- -p, --intersect_policy
Possible choices: midpoint, any
Specifies what policy is used to include fragments in the given interval. See User Guide for more information.
Default: “midpoint”
- -o, --output-file
A BED file containing fragment length summary statistics (mean, median, st. dev, min, max) over the intervals specified in the interval file.
Default: “-”
- -q, --quality-threshold
Minimum mapping quality threshold.
Default: 30
- -w, --workers
Number of worker processes.
Default: 1
- -v, --verbose
Enable verbose mode to display detailed processing information.
Default: 0
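The two `--intersect_policy` choices can be pictured with a small sketch. It assumes that `midpoint` keeps a fragment when its midpoint lies inside the interval and that `any` keeps a fragment when it overlaps the interval at all; the User Guide remains the authoritative description. Coordinates are treated as half-open, the helper names are hypothetical, and the use of the population standard deviation is also an assumption.

```python
from statistics import mean, median, pstdev


def select_fragments(fragments, start, stop, policy="midpoint"):
    """Return the (frag_start, frag_stop) pairs that fall in [start, stop)."""
    selected = []
    for f_start, f_stop in fragments:
        if policy == "midpoint":
            keep = start <= (f_start + f_stop) // 2 < stop
        elif policy == "any":
            keep = f_start < stop and f_stop > start
        else:
            raise ValueError(f"unknown policy: {policy}")
        if keep:
            selected.append((f_start, f_stop))
    return selected


def length_stats(fragments):
    """Summary statistics of fragment lengths (population st. dev. assumed)."""
    lengths = [f_stop - f_start for f_start, f_stop in fragments]
    return {
        "mean": mean(lengths),
        "median": median(lengths),
        "stdev": pstdev(lengths),
        "min": min(lengths),
        "max": max(lengths),
    }


# Made-up fragments around the interval [1000, 2000).
fragments = [(900, 1060), (950, 1120), (1900, 2080)]
by_midpoint = select_fragments(fragments, 1000, 2000, policy="midpoint")
by_any = select_fragments(fragments, 1000, 2000, policy="any")
print(length_stats(by_midpoint))
print(length_stats(by_any))
```

In this example the first fragment overlaps the interval but its midpoint lies outside it, so it is counted only under `any`.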
cleavage-profile#
Calculates cleavage proportion over intervals defined in a BED file based on alignment data from a BAM/SAM/CRAM/Fragment file.
finaletoolkit cleavage-profile [-h] [-o OUTPUT_FILE] [-lo FRACTION_LOW] [-hi FRACTION_HIGH] [-q QUALITY_THRESHOLD] [-l LEFT] [-r RIGHT] [-w WORKERS]
[-v]
input_file interval_file
Positional Arguments#
- input_file
Path to a BAM/SAM/CRAM/Fragment file containing fragment data.
- interval_file
Path to a BED file containing intervals to calculate cleavage proportion over.
Named Arguments#
- -o, --output_file
A bigWig file containing the cleavage proportion results over the intervals specified in the interval file.
Default: “-”
- -lo, --fraction_low
Minimum length for a fragment to be included in cleavage proportion calculation.
Default: 120
- -hi, --fraction_high
Maximum length for a fragment to be included in cleavage proportion calculation.
Default: 180
- -q, --quality-threshold
Minimum mapping quality threshold.
Default: 20
- -l, --left
Number of base pairs to subtract from start coordinate to create interval. Useful when dealing with BED files with only CpG coordinates.
Default: 0
- -r, --right
Number of base pairs to add to stop coordinate to create interval. Useful when dealing with BED files with only CpG coordinates.
Default: 0
- -w, --workers
Number of worker processes.
Default: 1
- -v, --verbose
Enable verbose mode to display detailed processing information.
Default: 0
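The `--left` and `--right` options widen each BED entry before the calculation. A minimal sketch of that arithmetic is shown below; the helper name `pad_interval` is hypothetical, and clamping the start at 0 is an assumption of this sketch rather than documented behavior.

```python
def pad_interval(start, stop, left=0, right=0):
    """Expand an interval by `left` bp before start and `right` bp after stop."""
    # Clamping at 0 is an assumption; it keeps the start coordinate valid.
    return max(0, start - left), stop + right


# A single-base CpG entry (made-up coordinates) padded by 5 bp on each side.
padded = pad_interval(10468, 10469, left=5, right=5)
print(padded)
```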
wps#
Calculates Windowed Protection Score (WPS) over intervals defined in a BED file based on alignment data from a BAM/SAM/CRAM/Fragment file.
finaletoolkit wps [-h] [-o OUTPUT_FILE] [-i INTERVAL_SIZE] [-W WINDOW_SIZE] [-lo FRACTION_LOW] [-hi FRACTION_HIGH] [-q QUALITY_THRESHOLD]
[-w WORKERS] [-v]
input_file site_bed
Positional Arguments#
- input_file
Path to a BAM/SAM/CRAM/Fragment file containing fragment data.
- site_bed
Path to a BED file containing intervals to calculate WPS over.
Named Arguments#
- -o, --output_file
A bigWig file containing the WPS results over the intervals specified in the interval file.
Default: “-”
- -i, --interval_size
Size in bp of each interval in the interval file.
Default: 5000
- -W, --window_size
Size of the sliding window used to calculate WPS scores.
Default: 120
- -lo, --fraction_low
Minimum length for a fragment to be included in WPS calculation.
Default: 120
- -hi, --fraction_high
Maximum length for a fragment to be included in WPS calculation.
Default: 180
- -q, --quality_threshold
Minimum mapping quality threshold.
Default: 30
- -w, --workers
Number of worker processes.
Default: 1
- -v, --verbose
Enable verbose mode to display detailed processing information.
Default: 0
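For readers unfamiliar with WPS, the sketch below computes the score at a single position in the commonly described way: fragments that span the whole window count +1 and fragments with an end inside the window count -1. It is not the toolkit's implementation. The function name `wps_at`, the half-open coordinates, and the exact boundary handling are assumptions; the default values mirror `--window_size`, `--fraction_low`, and `--fraction_high` above.

```python
def wps_at(pos, fragments, window_size=120, fraction_low=120, fraction_high=180):
    """WPS at `pos` from (start, stop) fragments with half-open coordinates."""
    half = window_size // 2
    w_start, w_stop = pos - half, pos + half
    score = 0
    for start, stop in fragments:
        # Skip fragments outside the accepted length range.
        if not fraction_low <= stop - start <= fraction_high:
            continue
        if start <= w_start and stop >= w_stop:
            score += 1  # fragment spans the entire window
        elif w_start < start < w_stop or w_start < stop < w_stop:
            score -= 1  # fragment has an end inside the window
    return score


# Made-up fragments; the 300 bp fragment is excluded by the length filter.
fragments = [(0, 150), (10, 170), (0, 300)]
scores = {pos: wps_at(pos, fragments) for pos in (0, 80, 150, 400)}
print(scores)
```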
adjust-wps#
Adjusts raw Windowed Protection Score (WPS) by applying a median filter and Savitzky-Golay filter.
finaletoolkit adjust-wps [-h] [-o OUTPUT_FILE] [-i INTERVAL_SIZE] [-m MEDIAN_WINDOW_SIZE] [-s SAVGOL_WINDOW_SIZE] [-p SAVGOL_POLY_DEG] [-w WORKERS]
[--mean] [--subtract-edges] [-v]
input_file interval_file genome_file
Positional Arguments#
- input_file
A bigWig file containing the WPS results over the intervals specified in the interval file.
- interval_file
Path to a BED file containing the intervals that WPS was calculated over.
- genome_file
A .chrom.sizes file containing chromosome sizes.
Named Arguments#
- -o, --output-file
A bigWig file containing the adjusted WPS results over the intervals specified in the interval file.
Default: “-”
- -i, --interval_size
Size in bp of each interval in the interval file.
Default: 5000
- -m, --median-window-size
Size of the median filter window used to adjust WPS scores.
Default: 1000
- -s, --savgol-window-size
Size of the Savitzky-Golay filter window used to adjust WPS scores.
Default: 21
- -p, --savgol-poly-deg
Polynomial degree for the Savitzky-Golay filter.
Default: 2
- -w, --workers
Number of worker processes.
Default: 1
- --mean
A mean filter is used instead of median.
Default: False
- --subtract-edges
Take the median of the first and last 500 bases in a window and subtract from the whole interval.
Default: False
- -v, --verbose
Enable verbose mode to display detailed processing information.
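The two filtering steps can be sketched in plain Python. This is an illustration under stated assumptions, not the toolkit's code: it assumes the median step subtracts a running median from the raw signal, it supports only a degree-2 Savitzky-Golay fit (using the closed-form smoothing weights for a quadratic), and it leaves the first and last half-window of values unsmoothed. All function names are hypothetical.

```python
from statistics import median


def subtract_running_median(signal, window_size):
    """Subtract the median of a centered window from every value."""
    half = window_size // 2
    return [
        value - median(signal[max(0, i - half): i + half + 1])
        for i, value in enumerate(signal)
    ]


def savgol_quadratic_coeffs(window_size):
    """Closed-form Savitzky-Golay smoothing weights for a degree-2 fit."""
    if window_size < 3 or window_size % 2 == 0:
        raise ValueError("window_size must be an odd integer >= 3")
    m = window_size // 2
    denom = (2 * m - 1) * (2 * m + 1) * (2 * m + 3)
    return [3 * (3 * m * m + 3 * m - 1 - 5 * i * i) / denom for i in range(-m, m + 1)]


def savgol_smooth(signal, window_size):
    """Smooth interior points; edge values are copied unchanged."""
    coeffs = savgol_quadratic_coeffs(window_size)
    m = window_size // 2
    smoothed = list(signal)
    for i in range(m, len(signal) - m):
        smoothed[i] = sum(c * signal[i - m + j] for j, c in enumerate(coeffs))
    return smoothed


# Made-up raw WPS values; small windows are used so the toy example stays short.
raw_wps = [-3, -1, 2, 6, 9, 7, 4, 0, -2, -4, -1]
adjusted = savgol_smooth(subtract_running_median(raw_wps, 5), 5)
print(adjusted)
```

With real data, the documented defaults (a median window of 1000 and a Savitzky-Golay window of 21 with degree 2) would replace the toy window sizes used here.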
delfi#
Calculates DELFI features over genome, returning information about (GC-corrected) short fragments, long fragments, DELFI ratio, and total fragments.
finaletoolkit delfi [-h] [-b BLACKLIST_FILE] [-g GAP_FILE] [-o OUTPUT_FILE] [-G] [-R] [-M] [-q QUALITY_THRESHOLD] [-w WORKERS] [-v]
input_file autosomes reference_file bins_file
Positional Arguments#
- input_file
Path to a BAM/SAM/CRAM/Fragment file containing fragment data.
- autosomes
Tab-delimited file containing (1) autosome name and (2) integer length of chromosome in base pairs.
- reference_file
The .2bit file for the associated reference genome sequence used during alignment.
- bins_file
A BED file containing bins over which to calculate DELFI. To replicate Cristiano et al.’s methodology, use 100kb bins over human autosomes.
Named Arguments#
- -b, --blacklist-file
BED file containing regions to ignore when calculating DELFI.
- -g, --gap-file
BED4 format file containing columns for “chrom”, “start”, “stop”, and “type”. The “type” column should denote whether the entry corresponds to a “centromere”, “telomere”, or “short arm”, and entries not falling into these categories are ignored. This information corresponds to the “gap” track for hg19 in the UCSC Genome Browser.
- -o, --output-file
BED, bed.gz, TSV, or CSV file to write DELFI data to. If “-”, writes to stdout.
Default: “-”
- -G, --no-gc-correct
Skip GC correction.
Default: True
- -R, --keep-nocov
Skip removal of two regions in hg19 with no coverage. Use this flag when not using the hg19 human reference genome.
Default: True
- -M, --no-merge-bins
Keep 100kb bins and do not merge to 5Mb size.
Default: True
- -q, --quality-threshold
Minimum mapping quality threshold.
Default: 30
- -w, --workers
Number of worker processes.
Default: 1
- -v, --verbose
Enable verbose mode to display detailed processing information.
Default: 0
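The per-bin quantities named above (short fragments, long fragments, DELFI ratio, total fragments) can be sketched as follows, without GC correction. The size ranges of 100-150 bp for short and 151-220 bp for long fragments follow the ranges reported by Cristiano et al. and are assumptions here, as are the function name `delfi_bin` and the choice to report the total as short plus long.

```python
def delfi_bin(lengths, short_range=(100, 150), long_range=(151, 220)):
    """Count short and long fragments in one bin and take their ratio."""
    short = sum(short_range[0] <= n <= short_range[1] for n in lengths)
    long_ = sum(long_range[0] <= n <= long_range[1] for n in lengths)
    # The ratio is undefined when a bin holds no long fragments.
    ratio = short / long_ if long_ else float("nan")
    return {"short": short, "long": long_, "num_frags": short + long_, "ratio": ratio}


# Made-up fragment lengths for a single bin; the 400 bp fragment is ignored.
features = delfi_bin([120, 140, 150, 151, 167, 180, 200, 400])
print(features)
```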
delfi-gc-correct#
Performs GC correction on raw DELFI data.
finaletoolkit delfi-gc-correct [-h] [-o OUTPUT_FILE] [--header-lines HEADER_LINES] [-v] input_file
Positional Arguments#
- input_file
BED file containing raw DELFI data. Raw DELFI data should only have columns for “contig”, “start”, “stop”, “arm”, “short”, “long”, “gc”, “num_frags”, “ratio”.
Named Arguments#
- -o, --output-file
BED to print GC-corrected DELFI fractions. If “-”, will write to stdout.
Default: “-”
- --header-lines
Number of header lines in BED.
Default: 1
- -v, --verbose
Enable verbose mode to display detailed processing information.
end-motifs#
Measures frequency of k-mer 5’ end motifs.
finaletoolkit end-motifs [-h] [-k K] [-o OUTPUT_FILE] [-q QUALITY_THRESHOLD] [-w WORKERS] [-v] input_file refseq_file
Positional Arguments#
- input_file
Path to a BAM/SAM/CRAM/Fragment file containing fragment data.
- refseq_file
The .2bit file for the associated reference genome sequence used during alignment.
Named Arguments#
- -k
Length of k-mer.
Default: 4
- -o, --output-file
TSV to print k-mer frequencies. If “-”, will write to stdout.
Default: “-”
- -q, --quality-threshold
Minimum mapping quality threshold.
Default: 20
- -w, --workers
Number of worker processes.
Default: 1
- -v, --verbose
Enable verbose mode to display detailed processing information.
Default: 0
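As a rough picture of what a 5’ end motif frequency is, the sketch below counts the first k bases at both 5’ ends of each fragment: the leading k-mer on the forward strand and the reverse complement of the trailing k-mer. It assumes each fragment is given as its forward-strand reference sequence and that motifs containing bases other than A, C, G, and T are skipped; the function name `end_motif_freqs` is hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def end_motif_freqs(fragment_seqs, k=4):
    """Frequency of k-mer 5' end motifs over both ends of every fragment."""
    counts = Counter()
    for seq in fragment_seqs:
        if len(seq) < k:
            continue
        forward = seq[:k]                                # 5' end, forward strand
        reverse = seq[-k:].translate(COMPLEMENT)[::-1]   # 5' end, reverse strand
        for motif in (forward, reverse):
            if set(motif) <= set("ACGT"):                # skip ambiguous bases
                counts[motif] += 1
    total = sum(counts.values())
    return {motif: n / total for motif, n in counts.items()}


# Made-up fragment sequences; the second one ends in ambiguous bases.
freqs = end_motif_freqs(["CCCAGGTTTTGG", "CCCATTTTNNNN"])
print(freqs)
```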
interval-end-motifs#
Measures frequency of k-mer 5’ end motifs in each region specified in a BED file and writes data into a table.
finaletoolkit interval-end-motifs [-h] [-k K] [-lo FRACTION_LOW] [-hi FRACTION_HIGH] [-o OUTPUT_FILE] [-q QUALITY_THRESHOLD] [-w WORKERS] [-v]
input_file refseq_file intervals
Positional Arguments#
- input_file
Path to a BAM/SAM/CRAM/Fragment file containing fragment data.
- refseq_file
The .2bit file for the associated reference genome sequence used during alignment.
- intervals
Path to a BED file containing intervals to retrieve end motif frequencies over.
Named Arguments#
- -k
Length of k-mer.
Default: 4
- -lo, --fraction-low
Minimum length for a fragment to be included in end motif frequency.
Default: 10
- -hi, --fraction-high
Maximum length for a fragment to be included in end motif frequency.
Default: 600
- -o, --output-file
Path to TSV or CSV file to write end motif frequencies to.
Default: “-”
- -q, --quality-threshold
Minimum mapping quality threshold.
Default: 20
- -w, --workers
Number of worker processes.
Default: 1
- -v, --verbose
Enable verbose mode to display detailed processing information.
Default: 0
mds#
Reads k-mer frequencies from a file and calculates a motif diversity score (MDS) using normalized Shannon entropy as described by Jiang et al. (2020).
finaletoolkit mds [-h] [-s SEP] [--header HEADER] [file_path]
Positional Arguments#
- file_path
Tab-delimited or similar file containing one column for all k-mers and one column for frequency. Reads from stdin by default.
Default: “-”
Named Arguments#
- -s, --sep
Separator used in tabular file.
Default: ” “
- --header
Number of header rows to ignore.
Default: 0
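Normalized Shannon entropy can be written out in a few lines. The sketch below assumes the entropy is normalized by the logarithm of the number of possible k-mers (4^k), so the score ranges from 0 (a single motif) to 1 (all motifs equally frequent); the function name `motif_diversity_score` is hypothetical and this is not the toolkit's code.

```python
import math
from itertools import product


def motif_diversity_score(freqs, k=4):
    """Shannon entropy of motif frequencies, normalized by log(4**k)."""
    total = sum(freqs.values())
    entropy = sum(
        (f / total) * math.log(total / f)  # p * log(1 / p)
        for f in freqs.values()
        if f > 0
    )
    return entropy / math.log(4 ** k)


# All 256 possible 4-mers with equal frequency give the maximum score.
uniform = {"".join(bases): 1 / 256 for bases in product("ACGT", repeat=4)}
uniform_mds = motif_diversity_score(uniform)

# A single observed motif gives the minimum score.
single_mds = motif_diversity_score({"CCCA": 1.0})
print(uniform_mds, single_mds)
```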
interval-mds#
Reads k-mer frequencies from a file and calculates a motif diversity score (MDS) for each interval using normalized Shannon entropy as described by Jiang et al. (2020).
finaletoolkit interval-mds [-h] [-s SEP] [file_path] file_out
Positional Arguments#
- file_path
Tab-delimited or similar file containing one column for all k-mers and one column for frequency. Reads from stdin by default.
Default: “-”
- file_out
Path to the output BED/BEDGraph file containing MDS for each interval.
Default: “-”
Named Arguments#
- -s, --sep
Separator used in tabular file.
Default: ” “
filter-bam#
Filters a BAM file so that all reads are in mapped pairs, exceed a certain MAPQ, are not flagged for quality, are read1, are not secondary or supplementary alignments, and are on the same reference sequence as the mate.
finaletoolkit filter-bam [-h] [-r REGION_FILE] [-o OUTPUT_FILE] [-q QUALITY_THRESHOLD] [-hi FRACTION_HIGH] [-lo FRACTION_LOW] [-w WORKERS] [-v]
input_file
Positional Arguments#
- input_file
Path to BAM file.
Named Arguments#
- -r, --region-file
Only alignments overlapping the intervals in this BED file are included in the output.
- -o, --output-file
Output BAM file path.
Default: “-”
- -q, --quality_threshold
Minimum mapping quality threshold.
Default: 30
- -hi, --fraction-high
Maximum length for a fragment to be included in output BAM.
- -lo, --fraction-low
Minimum length for a fragment to be included in output BAM.
- -w, --workers
Number of worker processes.
Default: 1
- -v, --verbose
Enable verbose mode to display detailed processing information.
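The filter criteria listed for this sub-command map naturally onto SAM flag bits. The sketch below shows one way to express them; it is an interpretation, not the toolkit's code. It assumes that "mapped pairs" means the read is paired and that neither the read nor its mate is unmapped, and that the MAPQ test is `>=` the threshold. The function name `passes_filter` is hypothetical; the bit values are the standard SAM flag definitions.

```python
# Standard SAM flag bits.
PAIRED, UNMAPPED, MATE_UNMAPPED = 0x1, 0x4, 0x8
READ1, SECONDARY, QC_FAIL, SUPPLEMENTARY = 0x40, 0x100, 0x200, 0x800


def passes_filter(flag, mapq, ref, mate_ref, quality_threshold=30):
    """Return True if an alignment meets the criteria described above."""
    required = PAIRED | READ1
    forbidden = UNMAPPED | MATE_UNMAPPED | SECONDARY | QC_FAIL | SUPPLEMENTARY
    return (
        (flag & required) == required   # paired and first in pair
        and (flag & forbidden) == 0     # mapped, primary, passes QC
        and mapq >= quality_threshold   # assumed to be an inclusive threshold
        and ref == mate_ref             # mate on the same reference sequence
    )


# Flag 99: paired, proper pair, mate on reverse strand, first in pair.
print(passes_filter(99, 60, "chr1", "chr1"))
# Flag 147 marks the second read of the pair, so it is rejected.
print(passes_filter(147, 60, "chr1", "chr1"))
```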
agg-bw#
Aggregates a bigWig signal over constant-length intervals defined in a BED file.
finaletoolkit agg-bw [-h] [-o OUTPUT_FILE] [-m MEDIAN_WINDOW_SIZE] [-v] input_file interval_file
Positional Arguments#
- input_file
A bigWig file containing signals over the intervals specified in the interval file.
- interval_file
Path to a BED file containing intervals over which signals were calculated.
Named Arguments#
- -o, --output-file
A wiggle file containing the aggregate signal over the intervals specified in the interval file.
Default: “-”
- -m, --median-window-size
Size of the median filter window used to adjust WPS scores. Only modify if aggregating WPS signals.
Default: 0
- -v, --verbose
Enable verbose mode to display detailed processing information.
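Aggregating a signal over constant-length intervals amounts to stacking the intervals and combining the values at each relative position. The sketch below sums them; whether the toolkit sums or averages is not stated here, so the summation is an assumption. The signal is a plain list indexed by position on a single contig, and the function name `aggregate_signal` is hypothetical.

```python
def aggregate_signal(signal, intervals):
    """Sum `signal` position-wise over equal-length (start, stop) intervals."""
    lengths = {stop - start for start, stop in intervals}
    if len(lengths) != 1:
        raise ValueError("all intervals must have the same length")
    (length,) = lengths
    return [sum(signal[start + i] for start, _ in intervals) for i in range(length)]


# Made-up per-base signal and two 3 bp intervals.
signal = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
aggregate = aggregate_signal(signal, [(0, 3), (5, 8)])
print(aggregate)
```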
gap-bed#
Creates a BED4 file containing centromeres, telomeres, and short-arm intervals, similar to the gaps annotation track for hg19 found on the UCSC Genome Browser (Kent et al. 2002). Currently only supports hg19, b37, human_g1k_v37, hg38, and GRCh38.
finaletoolkit gap-bed [-h] {hg19,b37,human_g1k_v37,hg38,GRCh38} output_file
Positional Arguments#
- reference_genome
Possible choices: hg19, b37, human_g1k_v37, hg38, GRCh38
Reference genome to provide gaps for.
- output_file
Path to write BED file to. If “-” used, writes to stdout.
“Gap” is used liberally in this command and, in the case of hg38/GRCh38, may refer to regions where there are no longer gaps in the reference sequence.