CLI#
FinaleToolkit is a package and standalone program to extract fragmentation features of cell-free DNA from paired-end sequencing data.
usage: finaletoolkit [-h] [-v]
{coverage,frag-length-bins,frag-length-intervals,cleavage-profile,wps,adjust-wps,delfi,delfi-gc-correct,end-motifs,interval-end-motifs,mds,interval-mds,filter-bam,agg-bw,gap-bed}
...
Named Arguments#
- -v, --version
show program’s version number and exit
Sub-commands#
coverage#
Calculates fragmentation coverage over intervals defined in a BED file based on alignment data from a BAM/CRAM/Fragment file.
finaletoolkit coverage [-h] [-o OUTPUT_FILE] [-n] [-s SCALE_FACTOR]
[-min MIN_LENGTH] [-max MAX_LENGTH]
[-p {midpoint,any}] [-q QUALITY_THRESHOLD]
[-w WORKERS] [-v]
input_file interval_file
Positional Arguments#
- input_file
Path to a BAM/CRAM/Fragment file containing fragment data.
- interval_file
Path to a BED file containing intervals to calculate coverage over.
Named Arguments#
- -o, --output-file
A BED file containing coverage values over the intervals specified in interval file.
Default: “-”
- -n, --normalize
If this flag is set, multiplies coverage by the user-supplied scale factor (if given) and normalizes output by total coverage. May lead to longer execution time for high-throughput data.
Default: False
- -s, --scale-factor
Scale factor for coverage values. Default is 1.
Default: 1.0
- -min, --min-length
Minimum length for a fragment to be included in coverage.
Default: 0
- -max, --max-length
Maximum length for a fragment to be included in coverage.
- -p, --intersect-policy
Possible choices: midpoint, any
Specifies what policy is used to include fragments in the given interval. See User Guide for more information.
Default: “midpoint”
- -q, --quality-threshold
Minimum mapping quality threshold.
Default: 30
- -w, --workers
Number of worker processes.
Default: 1
- -v, --verbose
Enable verbose mode to display detailed processing information.
Default: False
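The two --intersect-policy choices can be pictured with a short sketch. It assumes, as the names suggest, that midpoint keeps a fragment only when its midpoint lies inside the interval and that any keeps a fragment when any part of it overlaps the interval; the User Guide remains the authoritative definition. The helper name fragment_in_interval and the 0-based, half-open coordinates are illustrative choices, not part of FinaleToolkit.

```python
def fragment_in_interval(frag_start, frag_stop, start, stop, policy="midpoint"):
    """Decide whether a fragment counts toward the interval [start, stop).

    Hypothetical helper: coordinates are assumed to be 0-based, half-open.
    """
    if policy == "midpoint":
        midpoint = (frag_start + frag_stop) // 2
        return start <= midpoint < stop
    if policy == "any":
        return frag_start < stop and frag_stop > start
    raise ValueError(f"unknown policy: {policy}")


# A fragment spanning 150-350 overlaps the interval 100-200,
# but its midpoint (250) lies outside it.
print(fragment_in_interval(150, 350, 100, 200, policy="any"))
print(fragment_in_interval(150, 350, 100, 200, policy="midpoint"))
```

Under these assumptions, any is the more permissive policy: a long fragment can be counted in several adjacent intervals, while midpoint assigns each fragment to at most one interval.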
frag-length-bins#
Retrieves fragment lengths grouped in bins given a BAM/CRAM/Fragment file.
finaletoolkit frag-length-bins [-h] [-c CONTIG] [-S START] [-E STOP]
[-min MIN_LENGTH] [-max MAX_LENGTH]
[-p {midpoint,any}] [--bin-size BIN_SIZE]
[-o OUTPUT_FILE]
[--histogram-path HISTOGRAM_PATH]
[-q QUALITY_THRESHOLD] [-v]
input_file
Positional Arguments#
- input_file
Path to a BAM/CRAM/Fragment file containing fragment data.
Named Arguments#
- -c, --contig
Specify the contig or chromosome to select fragments from. (Required if using --start or --stop.)
- -S, --start
Specify the 0-based left-most coordinate of the interval to select fragments from. (Must also specify --contig.)
- -E, --stop
Specify the 1-based right-most coordinate of the interval to select fragments from. (Must also specify --contig.)
- -min, --min-length
Minimum length for a fragment to be included in fragment length.
Default: 0
- -max, --max-length
Maximum length for a fragment to be included in fragment length.
- -p, --intersect-policy
Possible choices: midpoint, any
Specifies what policy is used to include fragments in the given interval. See User Guide for more information.
Default: “midpoint”
- --bin-size
Specify the size of the bins to group fragment lengths into.
Default: 1
- -o, --output-file
A .TSV file containing fragment lengths binned according to the specified bin size.
Default: “-”
- --histogram-path
Path to store histogram.
- -q, --quality-threshold
Minimum mapping quality threshold.
Default: 30
- -v, --verbose
Enable verbose mode to display detailed processing information.
Default: 0
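As a rough illustration of what --bin-size, --min-length, and --max-length control, the sketch below groups fragment lengths into fixed-width bins. The function name bin_fragment_lengths is hypothetical, and the bin edges (each bin labelled by its lower edge, length limits inclusive) are assumptions; the tool's exact conventions may differ.

```python
from collections import Counter


def bin_fragment_lengths(lengths, bin_size=1, min_length=0, max_length=None):
    """Count fragment lengths per bin; keys are the lower edge of each bin."""
    counts = Counter()
    for length in lengths:
        if length < min_length:
            continue
        if max_length is not None and length > max_length:
            continue
        counts[(length // bin_size) * bin_size] += 1
    return dict(sorted(counts.items()))


lengths = [143, 147, 151, 166, 167, 168, 330]
print(bin_fragment_lengths(lengths, bin_size=10))
print(bin_fragment_lengths(lengths, bin_size=10, max_length=200))
```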
frag-length-intervals#
Retrieves fragment length summary statistics over intervals defined in a BED file based on alignment data from a BAM/CRAM/Fragment file.
finaletoolkit frag-length-intervals [-h] [-min MIN_LENGTH]
[-max MAX_LENGTH] [-p {midpoint,any}]
[-o OUTPUT_FILE]
[-q QUALITY_THRESHOLD] [-w WORKERS]
[-v]
input_file interval_file
Positional Arguments#
- input_file
Path to a BAM/CRAM/Fragment file containing fragment data.
- interval_file
Path to a BED file containing intervals to retrieve fragment length summary statistics over.
Named Arguments#
- -min, --min-length
Minimum length for a fragment to be included in fragment length.
Default: 0
- -max, --max-length
Maximum length for a fragment to be included in fragment length.
- -p, --intersect-policy
Possible choices: midpoint, any
Specifies what policy is used to include fragments in the given interval. See User Guide for more information.
Default: “midpoint”
- -o, --output-file
A BED file containing fragment length summary statistics (mean, median, st. dev, min, max) over the intervals specified in the interval file.
Default: “-”
- -q, --quality-threshold
Minimum mapping quality threshold.
Default: 30
- -w, --workers
Number of worker processes.
Default: 1
- -v, --verbose
Enable verbose mode to display detailed processing information.
Default: 0
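The summary statistics named for the output file (mean, median, standard deviation, min, max) can be sketched for a single interval as follows. The helper length_summary is hypothetical, and the use of the population standard deviation is an assumption; the source does not say which variant the tool reports.

```python
import statistics


def length_summary(lengths):
    """Return (mean, median, stdev, min, max) for one interval's fragment lengths."""
    return (
        statistics.mean(lengths),
        statistics.median(lengths),
        statistics.pstdev(lengths),  # population standard deviation (assumed)
        min(lengths),
        max(lengths),
    )


mean, median, stdev, shortest, longest = length_summary([160, 166, 170, 172])
print(mean, median, round(stdev, 2), shortest, longest)
```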
cleavage-profile#
Calculates cleavage proportion over intervals defined in a BED file based on alignment data from a BAM/CRAM/Fragment file.
finaletoolkit cleavage-profile [-h] [-c CHROM_SIZES] [-o OUTPUT_FILE]
[-min MIN_LENGTH] [-max MAX_LENGTH]
[-lo MIN_LENGTH] [-hi MAX_LENGTH]
[-q QUALITY_THRESHOLD] [-l LEFT] [-r RIGHT]
[-w WORKERS] [-v]
input_file interval_file
Positional Arguments#
- input_file
Path to a BAM/CRAM/Fragment file containing fragment data.
- interval_file
Path to a BED file containing intervals to calculate cleavage proportion over.
Named Arguments#
- -c, --chrom-sizes
A .chrom.sizes file containing chromosome names and sizes.
- -o, --output-file
A bigWig file containing the cleavage proportion results over the intervals specified in interval file.
Default: “-”
- -min, --min-length
Minimum length for a fragment to be included.
Default: 0
- -max, --max-length
Maximum length for a fragment to be included.
- -lo, --fraction_low
Minimum length for a fragment to be included in cleavage proportion calculation. Deprecated. Use --min-length instead.
- -hi, --fraction-high
Maximum length for a fragment to be included in cleavage proportion calculation. Deprecated. Use --max-length instead.
- -q, --quality-threshold
Minimum mapping quality threshold.
Default: 20
- -l, --left
Number of base pairs to subtract from start coordinate to create interval. Useful when dealing with BED files with only CpG coordinates. Default is 0.
Default: 0
- -r, --right
Number of base pairs to add to stop coordinate to create interval. Useful when dealing with BED files with only CpG coordinates. Default is 0.
Default: 0
- -w, --workers
Number of worker processes.
Default: 1
- -v, --verbose
Enable verbose mode to display detailed processing information.
Default: 0
wps#
Calculates Windowed Protection Score (WPS) over intervals defined in a BED file based on alignment data from a BAM/CRAM/Fragment file.
finaletoolkit wps [-h] [-c CHROM_SIZES] [-o OUTPUT_FILE]
[-i INTERVAL_SIZE] [-W WINDOW_SIZE] [-min MIN_LENGTH]
[-max MAX_LENGTH] [-lo MIN_LENGTH] [-hi MAX_LENGTH]
[-q QUALITY_THRESHOLD] [-w WORKERS] [-v]
input_file site_bed
Positional Arguments#
- input_file
Path to a BAM/CRAM/Fragment file containing fragment data.
- site_bed
Path to a BED file containing sites to calculate WPS over. The intervals in this BED file should be sorted, first by contig then start.
Named Arguments#
- -c, --chrom-sizes
A .chrom.sizes file containing chromosome names and sizes.
- -o, --output-file
A bigWig file containing the WPS results over the intervals specified in interval file.
Default: “-”
- -i, --interval-size
Size in bp of the intervals to calculate WPS over. These new intervals are centered over those specified in the site_bed. Default is 5000.
Default: 5000
- -W, --window-size
Size of the sliding window used to calculate WPS scores. Default is 120
Default: 120
- -min, --min-length
Minimum length for a fragment to be included. Default is 120, corresponding to L-WPS.
Default: 120
- -max, --max-length
Maximum length for a fragment to be included. Default is 180, corresponding to L-WPS.
Default: 180
- -lo, --fraction_low
Minimum length for a fragment to be included in WPS calculation. Deprecated. Use --min-length instead.
- -hi, --fraction_high
Maximum length for a fragment to be included in WPS calculation. Deprecated. Use --max-length instead.
- -q, --quality-threshold
Minimum mapping quality threshold. Default is 30
Default: 30
- -w, --workers
Number of worker processes.
Default: 1
- -v, --verbose
Enable verbose mode to display detailed processing information.
Default: 0
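To make the roles of --window-size, --min-length, and --max-length concrete, here is a minimal sketch of a WPS calculation at a single position. It uses the common definition of WPS: the number of fragments that span the whole window minus the number of fragments with an endpoint inside it. The function wps_at, the 0-based half-open coordinates, and the handling of window edges are assumptions of this sketch, not FinaleToolkit's implementation.

```python
def wps_at(position, fragments, window_size=120, min_length=120, max_length=180):
    """WPS at one position for fragments given as (start, stop) tuples."""
    half = window_size // 2
    win_start, win_stop = position - half, position + half  # half-open window
    score = 0
    for start, stop in fragments:
        # Keep only fragments in the selected length range.
        if not (min_length <= stop - start <= max_length):
            continue
        if start <= win_start and stop >= win_stop:
            score += 1  # fragment spans the whole window
        elif win_start <= start < win_stop or win_start < stop <= win_stop:
            score -= 1  # fragment has an endpoint inside the window
    return score


fragments = [(900, 1070), (940, 1100), (1000, 1160), (800, 1300)]
# Two fragments span the window 940-1060, one starts inside it,
# and the 500 bp fragment is excluded by the length filter.
print(wps_at(1000, fragments))
```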
adjust-wps#
Adjusts raw Windowed Protection Score (WPS) by applying a median filter and Savitzky-Golay filter.
finaletoolkit adjust-wps [-h] [-o OUTPUT_FILE] [-i INTERVAL_SIZE]
[-m MEDIAN_WINDOW_SIZE] [-s SAVGOL_WINDOW_SIZE]
[-p SAVGOL_POLY_DEG] [-S] [-w WORKERS] [--mean]
[--subtract-edges] [--edge-size EDGE_SIZE] [-v]
input_file interval_file chrom_sizes
Positional Arguments#
- input_file
A bigWig file containing the WPS results over the intervals specified in interval file.
- interval_file
Path to a BED file containing the intervals that WPS was calculated over.
- chrom_sizes
A .chrom.sizes file containing chromosome names and sizes.
Named Arguments#
- -o, --output-file
A bigWig file containing the adjusted WPS results over the intervals specified in interval file.
Default: “-”
- -i, --interval_size
Size in bp of each interval in the interval file.
Default: 5000
- -m, --median-window-size
Size of the median filter or mean filter window used to adjust WPS scores.
Default: 1000
- -s, --savgol-window-size
Size of the Savitzky-Golay filter window used to adjust WPS scores.
Default: 21
- -p, --savgol-poly-deg
Polynomial degree for the Savitzky-Golay filter.
Default: 2
- -S, --exclude-savgol
Do not apply Savitzky-Golay filtering to the scores.
Default: True
- -w, --workers
Number of worker processes.
Default: 1
- --mean
A mean filter is used instead of median.
Default: False
- --subtract-edges
Take the median of the first and last 500 bases in a window and subtract from the whole interval.
Default: False
- --edge-size
Size of the edge subtracted from the ends of the window when --subtract-edges is set. Default is 500.
Default: 500
- -v, --verbose
Enable verbose mode to display detailed processing information.
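The median (or, with --mean, mean) filter step can be sketched with the standard library alone. This example assumes the filter output is subtracted from the raw scores, which is how WPS is commonly baseline-corrected; the source only states that a filter is applied. The function subtract_running_center and its truncated-window edge handling are hypothetical, and the Savitzky-Golay smoothing step is not shown.

```python
import statistics


def subtract_running_center(scores, window_size, use_mean=False):
    """Subtract a centered running median (or mean) from each score.

    Windows are truncated at the ends of the list; that edge handling is an
    assumption of this sketch.
    """
    center = statistics.mean if use_mean else statistics.median
    half = window_size // 2
    adjusted = []
    for i, value in enumerate(scores):
        window = scores[max(0, i - half): i + half + 1]
        adjusted.append(value - center(window))
    return adjusted


raw = [2, 4, 9, 4, 2]
print(subtract_running_center(raw, window_size=3))
print(subtract_running_center(raw, window_size=3, use_mean=True))
```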
delfi#
Calculates DELFI features over genome, returning information about (GC-corrected) short fragments, long fragments, DELFI ratio, and total fragments.
finaletoolkit delfi [-h] [-b BLACKLIST_FILE] [-g GAP_FILE]
[-o OUTPUT_FILE] [-G] [-R] [-M] [-s WINDOW_SIZE]
[-q QUALITY_THRESHOLD] [-w WORKERS] [-v]
input_file chrom_sizes reference_file bins_file
Positional Arguments#
- input_file
Path to a BAM/CRAM/Fragment file containing fragment data.
- chrom_sizes
Tab-delimited file containing (1) chrom name and (2) integer length of chromosome in base pairs. Should contain only autosomes if you want to replicate the original scripts.
- reference_file
The .2bit file for the associated reference genome sequence used during alignment.
- bins_file
A BED file containing bins over which to calculate DELFI. To replicate Cristiano et al.’s methodology, use 100kb bins over human autosomes.
Named Arguments#
- -b, --blacklist-file
BED file containing regions to ignore when calculating DELFI.
- -g, --gap-file
BED4 format file containing columns for “chrom”, “start”, “stop”, and “type”. The “type” column should denote whether the entry corresponds to a “centromere”, “telomere”, or “short arm”, and entries not falling into these categories are ignored. This information corresponds to the “gap” track for hg19 in the UCSC Genome Browser.
- -o, --output-file
BED, bed.gz, TSV, or CSV file to write DELFI data to. If “-”, writes to stdout.
Default: “-”
- -G, --no-gc-correct
Skip GC correction.
Default: True
- -R, --keep-nocov
Skip removal of two regions in hg19 with no coverage. Use this flag when not using the hg19 human reference genome.
Default: True
- -M, --no-merge-bins
Keep 100kb bins and do not merge to 5Mb size.
Default: True
- -s, --window-size
Specify size of large genomic intervals to merge smaller 100kb intervals (or whatever the user specified in bins_file) into. Default is 5000000.
Default: 5000000
- -q, --quality-threshold
Minimum mapping quality threshold.
Default: 30
- -w, --workers
Number of worker processes.
Default: 1
- -v, --verbose
Enable verbose mode to display detailed processing information.
Default: 0
delfi-gc-correct#
Performs GC correction on raw DELFI data. This command is deprecated and will be removed in a future version of FinaleToolkit. The delfi command has GC correction on by default.
finaletoolkit delfi-gc-correct [-h] [-o OUTPUT_FILE]
[--header-lines HEADER_LINES] [-v]
input_file
Positional Arguments#
- input_file
BED file containing raw DELFI data. Raw DELFI data should only have columns for “contig”, “start”, “stop”, “arm”, “short”, “long”, “gc”, “num_frags”, “ratio”.
Named Arguments#
- -o, --output-file
BED to print GC-corrected DELFI fractions. If “-”, will write to stdout.
Default: “-”
- --header-lines
Number of header lines in BED.
Default: 1
- -v, --verbose
Enable verbose mode to display detailed processing information.
end-motifs#
Measures frequency of k-mer 5’ end motifs.
finaletoolkit end-motifs [-h] [-k K] [-min MIN_LENGTH] [-max MAX_LENGTH]
[-B] [-n] [-o OUTPUT_FILE] [-q QUALITY_THRESHOLD]
[-w WORKERS] [-v]
input_file refseq_file
Positional Arguments#
- input_file
Path to a BAM/CRAM/Fragment file containing fragment data.
- refseq_file
The .2bit file for the associated reference genome sequence used during alignment.
Named Arguments#
- -k
Length of k-mer.
Default: 4
- -min, --min-length
Minimum length for a fragment to be included.
Default: 0
- -max, --max-length
Maximum length for a fragment to be included.
- -B, --no-both-strands
Set flag to only consider one strand for end-motifs.
Default: True
- -n, --negative-strand
Set flag in conjunction with -B to only consider 5’ end motifs on the negative strand.
Default: False
- -o, --output-file
TSV to print k-mer frequencies. If “-”, will write to stdout.
Default: “-”
- -q, --quality-threshold
Minimum mapping quality threshold.
Default: 20
- -w, --workers
Number of worker processes.
Default: 1
- -v, --verbose
Enable verbose mode to display detailed processing information.
Default: 0
interval-end-motifs#
Measures frequency of k-mer 5’ end motifs in each region specified in a BED file and writes data into a table.
finaletoolkit interval-end-motifs [-h] [-k K] [-min MIN_LENGTH]
[-max MAX_LENGTH] [-lo MIN_LENGTH]
[-hi MAX_LENGTH] [-B] [-n]
[-o OUTPUT_FILE] [-q QUALITY_THRESHOLD]
[-w WORKERS] [-v]
input_file refseq_file intervals
Positional Arguments#
- input_file
Path to a BAM/CRAM/Fragment file containing fragment data.
- refseq_file
The .2bit file for the associated reference genome sequence used during alignment.
- intervals
Path to a BED file containing intervals to retrieve end motif frequencies over.
Named Arguments#
- -k
Length of k-mer.
Default: 4
- -min, --min-length
Minimum length for a fragment to be included.
Default: 0
- -max, --max-length
Maximum length for a fragment to be included.
- -lo, --fraction-low
Deprecated alias for --min-length
- -hi, --fraction-high
Deprecated alias for --max-length
- -B, --single-strand
Set flag to only consider one strand for end-motifs. By default, the positive strand is calculated, but with the -n flag, the 5’ end motifs of the negative strand are considered instead.
Default: True
- -n, --negative-strand
Set flag in conjunction with -B to only consider 5’ end motifs on the negative strand.
Default: False
- -o, --output-file
Path to TSV or CSV file to write end motif frequencies to.
Default: “-”
- -q, --quality-threshold
Minimum mapping quality threshold.
Default: 20
- -w, --workers
Number of worker processes.
Default: 1
- -v, --verbose
Enable verbose mode to display detailed processing information.
Default: 0
mds#
Reads k-mer frequencies from a file and calculates a motif diversity score (MDS) using normalized Shannon entropy as described by Jiang et al. (2020).
finaletoolkit mds [-h] [-s SEP] [--header HEADER] [file_path]
Positional Arguments#
- file_path
Tab-delimited or similar file containing one column for all k-mers and one column for frequency. Reads from stdin by default.
Default: “-”
Named Arguments#
- -s, --sep
Separator used in tabular file.
Default: “ “
- --header
Number of header rows to ignore. Default is 0
Default: 0
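The normalized Shannon entropy behind the MDS can be written out in a few lines. This is a sketch under the assumption that the entropy is normalized by log(4^k), the maximum entropy when all 4^k possible k-mers are equally frequent; the function name motif_diversity_score and the dictionary input are illustrative and are not FinaleToolkit's API.

```python
import math


def motif_diversity_score(frequencies, k=4):
    """Normalized Shannon entropy of k-mer frequencies.

    Returns 0 when a single motif accounts for every fragment end and 1 when
    all 4**k motifs are equally frequent.
    """
    total = sum(frequencies.values())
    entropy = 0.0
    for count in frequencies.values():
        if count > 0:  # motifs with zero frequency contribute nothing
            p = count / total
            entropy -= p * math.log(p)
    return entropy / math.log(4 ** k)


# 1-mers keep the example small: a uniform distribution scores close to 1,
# a skewed one scores lower.
print(motif_diversity_score({"A": 25, "C": 25, "G": 25, "T": 25}, k=1))
print(motif_diversity_score({"A": 70, "C": 10, "G": 10, "T": 10}, k=1))
```

Because the input is normalized by its own total, the function accepts either raw counts or frequencies that already sum to 1.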
interval-mds#
Reads k-mer frequencies from a file and calculates a motif diversity score (MDS) for each interval using normalized Shannon entropy as described by Jiang et al. (2020).
finaletoolkit interval-mds [-h] [-s SEP] [--header HEADER]
[file_path] file_out
Positional Arguments#
- file_path
Tab-delimited or similar file containing one column for all k-mers and one column for frequency. Reads from stdin by default.
Default: “-”
- file_out
Path to the output BED/BEDGraph file containing MDS for each interval.
Default: “-”
Named Arguments#
- -s, --sep
Separator used in tabular file.
Default: “ “
- --header
Number of header rows to ignore. Default is 0
Default: 0
filter-bam#
Filters a BAM file so that all reads are in mapped pairs, exceed a certain MAPQ, are not flagged for quality, are read1, are not secondary or supplementary alignments, and are on the same reference sequence as the mate.
finaletoolkit filter-bam [-h] [-r REGION_FILE] [-o OUTPUT_FILE]
[-q QUALITY_THRESHOLD] [-min MIN_LENGTH]
[-max MAX_LENGTH] [-lo MIN_LENGTH]
[-hi MAX_LENGTH] [-w WORKERS] [-v]
input_file
Positional Arguments#
- input_file
Path to BAM file.
Named Arguments#
- -r, --region-file
Only alignments overlapping the intervals in this BED file are written to the output.
- -o, --output-file
Output BAM file path.
Default: “-”
- -q, --quality-threshold
Minimum mapping quality threshold.
Default: 30
- -min, --min-length
Minimum length for a fragment to be included.
- -max, --max-length
Maximum length for a fragment to be included.
- -lo, --fraction-low
Deprecated alias for --min-length
- -hi, --fraction-high
Deprecated alias for --max-length
- -w, --workers
Number of worker processes.
Default: 1
- -v, --verbose
Enable verbose mode to display detailed processing information.
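The filtering criteria in the description above map naturally onto SAM flag bits. The sketch below expresses them as a single predicate; the flag values come from the SAM specification, while the function passes_filter, its argument names, and the choice of "greater than or equal to" for the MAPQ comparison are assumptions made for illustration.

```python
# SAM flag bits, as defined in the SAM specification.
PAIRED, UNMAPPED, MATE_UNMAPPED = 0x1, 0x4, 0x8
READ1, SECONDARY, QC_FAIL, SUPPLEMENTARY = 0x40, 0x100, 0x200, 0x800


def passes_filter(flag, mapq, ref, mate_ref, quality_threshold=30):
    """Hypothetical predicate mirroring the criteria listed for filter-bam."""
    required = PAIRED | READ1
    rejected = UNMAPPED | MATE_UNMAPPED | SECONDARY | QC_FAIL | SUPPLEMENTARY
    return (
        (flag & required) == required   # paired and read1
        and (flag & rejected) == 0      # mapped, mate mapped, primary, passes QC
        and mapq >= quality_threshold   # comparison operator is an assumption
        and ref == mate_ref             # same reference sequence as the mate
    )


print(passes_filter(99, 60, "chr1", "chr1"))   # read1 of a mapped pair
print(passes_filter(147, 60, "chr1", "chr1"))  # read2 of the same pair
```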
agg-bw#
Aggregates a bigWig signal over constant-length intervals defined in a BED file.
finaletoolkit agg-bw [-h] [-o OUTPUT_FILE] [-m MEDIAN_WINDOW_SIZE] [-a]
[-v]
input_file interval_file
Positional Arguments#
- input_file
A bigWig file containing signals over the intervals specified in interval file.
- interval_file
Path to a BED file containing intervals over which signals were calculated.
Named Arguments#
- -o, --output-file
A wiggle file containing the aggregate signal over the intervals specified in interval file.
Default: “-”
- -m, --median-window-size
Size of the median filter window used to aggregate scores. Set to 120 if aggregating WPS signals.
Default: 1
- -a, --mean
Use the mean instead of the median.
Default: False
- -v, --verbose
Enable verbose mode to display detailed processing information.
gap-bed#
Creates a BED4 file containing centromeres, telomeres, and short-arm intervals, similar to the gaps annotation track for hg19 found on the UCSC Genome Browser (Kent et al. 2002). Currently only supports hg19, b37, human_g1k_v37, hg38, and GRCh38.
finaletoolkit gap-bed [-h]
{hg19,b37,human_g1k_v37,hg38,GRCh38} output_file
Positional Arguments#
- reference_genome
Possible choices: hg19, b37, human_g1k_v37, hg38, GRCh38
Reference genome to provide gaps for.
- output_file
Path to write BED file to. If “-” used, writes to stdout.
“Gap” is used liberally in this command and, in the case of hg38/GRCh38, may refer to regions where there are no longer gaps in the reference sequence.