FinaleToolkit is a package and standalone program to extract fragmentation features of cell-free DNA from paired-end sequencing data.

usage: finaletoolkit [-h] [-v]

Named Arguments

-v, --version

show program’s version number and exit



Calculates fragmentation coverage over intervals defined in a BED file based on alignment data from a BAM/CRAM/Fragment file.

finaletoolkit coverage [-h] [-o OUTPUT_FILE] [-n] [-s SCALE_FACTOR] [-min MIN_LENGTH] [-max MAX_LENGTH] [-p {midpoint,any}] [-q QUALITY_THRESHOLD] [-w WORKERS] [-v]
                       input_file interval_file

Positional Arguments

input_file

Path to a BAM/CRAM/Fragment file containing fragment data.

interval_file

Path to a BED file containing intervals to calculate coverage over.

Named Arguments

-o, --output-file

A BED file containing coverage values over the intervals specified in interval file.

Default: '-'

-n, --normalize

If this flag is set, multiplies coverage by the user-supplied scale factor, if given, and normalizes the output by total coverage. May lead to longer execution time for high-throughput data.

Default: False

-s, --scale-factor

Scale factor for coverage values. Default is 1.

Default: 1.0

-min, --min-length

Minimum length for a fragment to be included in coverage.

Default: 0

-max, --max-length

Maximum length for a fragment to be included in coverage.

-p, --intersect-policy

Possible choices: midpoint, any

Specifies what policy is used to include fragments in the given interval. See User Guide for more information.

Default: 'midpoint'

-q, --quality-threshold

Minimum mapping quality threshold.

Default: 30

-w, --workers

Number of worker processes.

Default: 1

-v, --verbose

Enable verbose mode to display detailed processing information.

Default: False
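
The length bounds and intersect policy above can be illustrated with a minimal Python sketch. It assumes that `midpoint` keeps a fragment when its midpoint falls inside the interval and that `any` keeps it when any base overlaps; the User Guide remains the authority on the actual rule. The function names and the 0-based, half-open coordinates are illustrative assumptions, not part of FinaleToolkit's API.

```python
def fragment_in_interval(frag_start, frag_stop, start, stop, policy="midpoint"):
    """Return True if a fragment counts toward [start, stop) under the policy.

    Coordinates are assumed to be 0-based and half-open. The policy
    semantics are an assumption made for illustration.
    """
    if policy == "midpoint":
        midpoint = (frag_start + frag_stop) // 2
        return start <= midpoint < stop
    if policy == "any":
        return frag_start < stop and frag_stop > start
    raise ValueError(f"unknown policy: {policy!r}")


def interval_coverage(fragments, start, stop, policy="midpoint",
                      min_length=0, max_length=None):
    """Count fragments in the interval, honoring the length bounds."""
    count = 0
    for frag_start, frag_stop in fragments:
        length = frag_stop - frag_start
        if length < min_length:
            continue
        if max_length is not None and length > max_length:
            continue
        if fragment_in_interval(frag_start, frag_stop, start, stop, policy):
            count += 1
    return count


# Hypothetical fragments as (start, stop) pairs.
fragments = [(90, 260), (150, 320), (400, 570)]
print(interval_coverage(fragments, 100, 200, policy="midpoint"))
print(interval_coverage(fragments, 100, 200, policy="any"))
```

Under these assumptions, `any` never counts fewer fragments than `midpoint` for the same interval, which is worth keeping in mind when comparing outputs produced with different policies.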


Retrieves fragment lengths grouped in bins given a BAM/CRAM/Fragment file.

finaletoolkit frag-length-bins [-h] [-c CONTIG] [-S START] [-E STOP] [-min MIN_LENGTH] [-max MAX_LENGTH] [-p {midpoint,any}] [--bin-size BIN_SIZE] [-o OUTPUT_FILE]
                               [--histogram-path HISTOGRAM_PATH] [-q QUALITY_THRESHOLD] [-v]

Positional Arguments


Path to a BAM/CRAM/Fragment file containing fragment data.

Named Arguments

-c, --contig

Specify the contig or chromosome to select fragments from. (Required if using --start or --stop.)

-S, --start

Specify the 0-based left-most coordinate of the interval to select fragments from. (Must also specify --contig.)

-E, --stop

Specify the 1-based right-most coordinate of the interval to select fragments from. (Must also specify --contig.)

-min, --min-length

Minimum length for a fragment to be included in fragment length.

Default: 0

-max, --max-length

Maximum length for a fragment to be included in fragment length.

-p, --intersect-policy

Possible choices: midpoint, any

Specifies what policy is used to include fragments in the given interval. See User Guide for more information.

Default: 'midpoint'

--bin-size

Specify the size of the bins to group fragment lengths into.

Default: 1

-o, --output-file

A .TSV file containing fragment lengths binned according to the specified bin size.

Default: '-'

--histogram-path

Path to store histogram.

-q, --quality-threshold

Minimum mapping quality threshold.

Default: 30

-v, --verbose

Enable verbose mode to display detailed processing information.

Default: 0
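
As a rough illustration of what binning by a fixed bin size means, the following Python sketch groups a list of fragment lengths into bins. The function name is hypothetical, and aligning bins to multiples of the bin size is an assumption; the tool may align bins differently.

```python
from collections import Counter


def bin_fragment_lengths(lengths, bin_size=1, min_length=0, max_length=None):
    """Group fragment lengths into bins of width bin_size.

    Returns a sorted list of (bin_start, bin_stop, count) tuples, where each
    bin covers lengths in [bin_start, bin_stop). Bins are aligned to
    multiples of bin_size, which is an assumption for illustration.
    """
    counts = Counter()
    for length in lengths:
        if length < min_length:
            continue
        if max_length is not None and length > max_length:
            continue
        bin_start = (length // bin_size) * bin_size
        counts[bin_start] += 1
    return [(b, b + bin_size, counts[b]) for b in sorted(counts)]


# Hypothetical fragment lengths in bp.
lengths = [145, 149, 150, 166, 167, 169, 310]
for bin_start, bin_stop, count in bin_fragment_lengths(lengths, bin_size=10):
    print(f"{bin_start}\t{bin_stop}\t{count}")
```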


Retrieves fragment length summary statistics over intervals defined in a BED file based on alignment data from a BAM/CRAM/Fragment file.

finaletoolkit frag-length-intervals [-h] [-min MIN_LENGTH] [-max MAX_LENGTH] [-p {midpoint,any}] [-o OUTPUT_FILE] [-q QUALITY_THRESHOLD] [-w WORKERS] [-v] input_file interval_file

Positional Arguments

input_file

Path to a BAM/CRAM/Fragment file containing fragment data.

interval_file

Path to a BED file containing intervals to retrieve fragment length summary statistics over.

Named Arguments

-min, --min-length

Minimum length for a fragment to be included in fragment length.

Default: 0

-max, --max-length

Maximum length for a fragment to be included in fragment length.

-p, --intersect-policy

Possible choices: midpoint, any

Specifies what policy is used to include fragments in the given interval. See User Guide for more information.

Default: 'midpoint'

-o, --output-file

A BED file containing fragment length summary statistics (mean, median, st. dev, min, max) over the intervals specified in the interval file.

Default: '-'

-q, --quality-threshold

Minimum mapping quality threshold.

Default: 30

-w, --workers

Number of worker processes.

Default: 1

-v, --verbose

Enable verbose mode to display detailed processing information.

Default: 0
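
The per-interval summary statistics named above (mean, median, standard deviation, min, max) can be sketched with Python's standard library. Whether the tool reports a sample or a population standard deviation is not stated here; the sample version is used below as an assumption, and the function name is hypothetical.

```python
import statistics


def length_summary(lengths):
    """Return (mean, median, stdev, min, max) for fragment lengths.

    Returns None for an interval with no fragments. The sample standard
    deviation is used (an assumption); it needs at least two values, so a
    single fragment yields 0.0.
    """
    if not lengths:
        return None
    stdev = statistics.stdev(lengths) if len(lengths) > 1 else 0.0
    return (
        statistics.mean(lengths),
        statistics.median(lengths),
        stdev,
        min(lengths),
        max(lengths),
    )


# Hypothetical fragment lengths for one interval.
print(length_summary([100, 150, 200]))
```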


Calculates cleavage proportion over intervals defined in a BED file based on alignment data from a BAM/CRAM/Fragment file.

finaletoolkit cleavage-profile [-h] [-c CHROM_SIZES] [-o OUTPUT_FILE] [-min MIN_LENGTH] [-max MAX_LENGTH] [-lo MIN_LENGTH] [-hi MAX_LENGTH] [-q QUALITY_THRESHOLD] [-l LEFT] [-r RIGHT]
                               [-w WORKERS] [-v]
                               input_file interval_file

Positional Arguments

input_file

Path to a BAM/CRAM/Fragment file containing fragment data.

interval_file

Path to a BED file containing intervals to calculate cleavage proportion over.

Named Arguments

-c, --chrom-sizes

A .chrom.sizes file containing chromosome names and sizes.

-o, --output-file

A bigWig file containing the cleavage proportion results over the intervals specified in interval file.

Default: '-'

-min, --min-length

Minimum length for a fragment to be included.

Default: 0

-max, --max-length

Maximum length for a fragment to be included.

-lo, --fraction_low

Minimum length for a fragment to be included in cleavage proportion calculation. Deprecated. Use --min-length instead.

-hi, --fraction-high

Maximum length for a fragment to be included in cleavage proportion calculation. Deprecated. Use --max-length instead.

-q, --quality-threshold

Minimum mapping quality threshold.

Default: 20

-l, --left

Number of base pairs to subtract from start coordinate to create interval. Useful when dealing with BED files with only CpG coordinates. Default is 0.

Default: 0

-r, --right

Number of base pairs to add to stop coordinate to create interval. Useful when dealing with BED files with only CpG coordinates. Default is 0.

Default: 0

-w, --workers

Number of worker processes.

Default: 1

-v, --verbose

Enable verbose mode to display detailed processing information.

Default: 0


Calculates Windowed Protection Score (WPS) over intervals defined in a BED file based on alignment data from a BAM/CRAM/Fragment file.

                  [-w WORKERS] [-v]
                  input_file site_bed

Positional Arguments

input_file

Path to a BAM/CRAM/Fragment file containing fragment data.

site_bed

Path to a BED file containing sites to calculate WPS over. The intervals in this BED file should be sorted, first by contig then start.

Named Arguments

-c, --chrom-sizes

A .chrom.sizes file containing chromosome names and sizes.

-o, --output-file

A bigWig file containing the WPS results over the intervals specified in interval file.

Default: '-'

-i, --interval-size

Size in bp of the intervals to calculate WPS over. These new intervals are centered over those specified in the site_bed. Default is 5000.

Default: 5000

-W, --window-size

Size of the sliding window used to calculate WPS scores. Default is 120

Default: 120

-min, --min-length

Minimum length for a fragment to be included. Default is 120, corresponding to L-WPS.

Default: 120

-max, --max-length

Maximum length for a fragment to be included. Default is 180, corresponding to L-WPS.

Default: 180

-lo, --fraction_low

Minimum length for a fragment to be included in WPS calculation. Deprecated. Use --min-length instead.

-hi, --fraction_high

Maximum length for a fragment to be included in WPS calculation. Deprecated. Use --max-length instead.

-q, --quality-threshold

Minimum mapping quality threshold. Default is 30

Default: 30

-w, --workers

Number of worker processes.

Default: 1

-v, --verbose

Enable verbose mode to display detailed processing information.

Default: 0
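
To make the window and length parameters concrete, here is a minimal Python sketch of a windowed protection score at a single position. It follows the commonly cited definition: fragments that completely span the window add one, and fragments with an endpoint inside the window subtract one. The function name, the 0-based half-open coordinates, and the handling of fragments that end exactly on a window boundary are assumptions; the tool's own implementation may differ in these details.

```python
def wps_at(position, fragments, window_size=120, min_length=120, max_length=180):
    """Windowed protection score at one position (illustrative sketch).

    fragments: iterable of (start, stop) pairs, 0-based and half-open.
    The window is [position - window_size // 2, position + window_size // 2).
    """
    half = window_size // 2
    win_start = position - half
    win_stop = position + half
    score = 0
    for frag_start, frag_stop in fragments:
        length = frag_stop - frag_start
        if not (min_length <= length <= max_length):
            continue  # outside the selected fragment length range
        last_base = frag_stop - 1
        if frag_start <= win_start and frag_stop >= win_stop:
            score += 1  # fragment spans the whole window
        elif win_start <= frag_start < win_stop or win_start <= last_base < win_stop:
            score -= 1  # fragment has an endpoint inside the window
    return score


# Hypothetical fragments around position 1000.
fragments = [(900, 1070), (1000, 1160), (800, 960), (1200, 1350), (900, 1100)]
print(wps_at(1000, fragments))
```

In this example the last fragment is 200 bp long and is excluded by the default length range, while the fragment at 1200 to 1350 neither spans nor ends inside the window and does not affect the score.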


Adjusts raw Windowed Protection Score (WPS) by applying a median filter and Savitzky-Golay filter.

finaletoolkit adjust-wps [-h] [-o OUTPUT_FILE] [-i INTERVAL_SIZE] [-m MEDIAN_WINDOW_SIZE] [-s SAVGOL_WINDOW_SIZE] [-p SAVGOL_POLY_DEG] [-S] [-w WORKERS] [--mean] [--subtract-edges]
                         [--edge-size EDGE_SIZE] [-v]
                         input_file interval_file chrom_sizes

Positional Arguments

input_file

A bigWig file containing the WPS results over the intervals specified in interval file.

interval_file

Path to a BED file containing the intervals that WPS was calculated over.

chrom_sizes

A .chrom.sizes file containing chromosome names and sizes.

Named Arguments

-o, --output-file

A bigWig file containing the adjusted WPS results over the intervals specified in interval file.

Default: '-'

-i, --interval_size

Size in bp of each interval in the interval file.

Default: 5000

-m, --median-window-size

Size of the median filter or mean filter window used to adjust WPS scores.

Default: 1000

-s, --savgol-window-size

Size of the Savitzky-Golay filter window used to adjust WPS scores.

Default: 21

-p, --savgol-poly-deg

Polynomial degree for the Savitzky-Golay filter.

Default: 2

-S, --exclude-savgol

Do not perform Savitzky-Golay filtering.

Default: True

-w, --workers

Number of worker processes.

Default: 1

--mean

Use a mean filter instead of a median filter.

Default: False

--subtract-edges

Take the median of the first and last 500 bases in a window and subtract it from the whole interval.

Default: False

--edge-size

Size of the edge subtracted from the ends of the window when --subtract-edges is set. Default is 500.

Default: 500

-v, --verbose

Enable verbose mode to display detailed processing information.
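
The median-filter step can be pictured as subtracting a running median from the raw scores. The pure-Python sketch below shows only that step, under the assumption that the window is centered and truncated at the ends of the interval; the function name is hypothetical and edge handling in the tool may differ. The Savitzky-Golay smoothing step is not reproduced here; a library routine such as `scipy.signal.savgol_filter` could be applied to the result with the window size and polynomial degree listed above.

```python
import statistics


def subtract_running_median(scores, window_size=1000):
    """Subtract a centered running median from each score.

    Near the ends of the list the window is truncated to the available
    values, which is an assumption made for this sketch.
    """
    half = window_size // 2
    adjusted = []
    for i, value in enumerate(scores):
        lo = max(0, i - half)
        hi = min(len(scores), i + half + 1)
        adjusted.append(value - statistics.median(scores[lo:hi]))
    return adjusted


# Hypothetical raw WPS values with one peak and one trough.
raw = [0, 2, 10, 2, 0, -2, -10, -2, 0]
print(subtract_running_median(raw, window_size=3))
```

This naive version recomputes the median for every position, so it is only suitable for small inputs; it is meant to show the arithmetic, not to be efficient.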


Calculates DELFI features over genome, returning information about (GC-corrected) short fragments, long fragments, DELFI ratio, and total fragments.

finaletoolkit delfi [-h] [-b BLACKLIST_FILE] [-g GAP_FILE] [-o OUTPUT_FILE] [-G] [-R] [-M] [-s WINDOW_SIZE] [-q QUALITY_THRESHOLD] [-w WORKERS] [-v]
                    input_file chrom_sizes reference_file bins_file

Positional Arguments

input_file

Path to a BAM/CRAM/Fragment file containing fragment data.

chrom_sizes

Tab-delimited file containing (1) chrom name and (2) integer length of chromosome in base pairs. Should contain only autosomes if you want to replicate the original scripts.

reference_file

The .2bit file for the associated reference genome sequence used during alignment.

bins_file

A BED file containing bins over which to calculate DELFI. To replicate Cristiano et al.’s methodology, use 100kb bins over human autosomes.

Named Arguments

-b, --blacklist-file

BED file containing regions to ignore when calculating DELFI.

-g, --gap-file

BED4 format file containing columns for “chrom”, “start”, “stop”, and “type”. The “type” column should denote whether the entry corresponds to a “centromere”, “telomere”, or “short arm”, and entries not falling into these categories are ignored. This information corresponds to the “gap” track for hg19 in the UCSC Genome Browser.

-o, --output-file

BED, bed.gz, TSV, or CSV file to write DELFI data to. If “-”, writes to stdout.

Default: '-'

-G, --no-gc-correct

Skip GC correction.

Default: False

-R, --keep-nocov

Skip removal of two regions in hg19 with no coverage. Use this flag when not using the hg19 human reference genome.

Default: True

-M, --no-merge-bins

Keep 100kb bins and do not merge to 5Mb size.

Default: True

-s, --window-size

Specify the size of the large genomic intervals that the smaller 100kb intervals (or whatever the user specified in bins_file) are merged into. Default is 5000000.

Default: 5000000

-q, --quality-threshold

Minimum mapping quality threshold.

Default: 30

-w, --workers

Number of worker processes.

Default: 1

-v, --verbose

Enable verbose mode to display detailed processing information.

Default: 0
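
For orientation, the short, long, ratio, and total values reported per bin can be sketched as below. The length cutoffs (100 to 150 bp for short, 151 to 220 bp for long) are assumptions based on the ranges described by Cristiano et al., and defining the total as short plus long is also an assumption; GC correction, blacklist and gap filtering, and bin merging are all omitted. The function name is hypothetical.

```python
def delfi_bin(fragment_lengths, short_range=(100, 150), long_range=(151, 220)):
    """Count short and long fragments in one bin and return their ratio.

    The cutoffs are assumed values, and no GC correction is applied.
    The ratio is NaN when the bin has no long fragments.
    """
    short = sum(short_range[0] <= n <= short_range[1] for n in fragment_lengths)
    long_ = sum(long_range[0] <= n <= long_range[1] for n in fragment_lengths)
    ratio = short / long_ if long_ else float("nan")
    return {"short": short, "long": long_, "num_frags": short + long_, "ratio": ratio}


# Hypothetical fragment lengths falling in one bin.
print(delfi_bin([120, 140, 160, 167, 180, 200, 90, 300]))
```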


Performs GC correction on raw DELFI data. This command is deprecated and will be removed in a future version of FinaleToolkit. The delfi command has GC correction on by default.

finaletoolkit delfi-gc-correct [-h] [-o OUTPUT_FILE] [--header-lines HEADER_LINES] [-v] input_file

Positional Arguments

input_file

BED file containing raw DELFI data. Raw DELFI data should only have columns for “contig”, “start”, “stop”, “arm”, “short”, “long”, “gc”, “num_frags”, “ratio”.

Named Arguments

-o, --output-file

BED file to write GC-corrected DELFI fractions to. If “-”, writes to stdout.

Default: '-'

--header-lines

Number of header lines in BED.

Default: 1

-v, --verbose

Enable verbose mode to display detailed processing information.


Measures frequency of k-mer 5’ end motifs.

finaletoolkit end-motifs [-h] [-k K] [-min MIN_LENGTH] [-max MAX_LENGTH] [-B] [-n] [-o OUTPUT_FILE] [-q QUALITY_THRESHOLD] [-w WORKERS] [-v] input_file refseq_file

Positional Arguments

input_file

Path to a BAM/CRAM/Fragment file containing fragment data.

refseq_file

The .2bit file for the associated reference genome sequence used during alignment.

Named Arguments

-k

Length of k-mer.

Default: 4

-min, --min-length

Minimum length for a fragment to be included.

Default: 0

-max, --max-length

Maximum length for a fragment to be included.

-B, --no-both-strands

Set flag to only consider one strand for end-motifs.

Default: True

-n, --negative-strand

Set flag in conjunction with -B to only consider 5’ end motifs on the negative strand.

Default: False

-o, --output-file

TSV to print k-mer frequencies. If “-”, will write to stdout.

Default: '-'

-q, --quality-threshold

Minimum mapping quality threshold.

Default: 20

-w, --workers

Number of worker processes.

Default: 1

-v, --verbose

Enable verbose mode to display detailed processing information.

Default: 0
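
The strand options are easier to follow with a small example. The Python sketch below counts k-mer 5’ end motifs from fragment sequences given on the forward strand of the reference: the positive-strand motif is the first k bases, and the negative-strand motif is the reverse complement of the last k bases. The function names and the choice to skip motifs containing bases other than A, C, G, and T are assumptions for illustration, not a description of the tool's internals.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def end_motif_frequencies(fragment_seqs, k=4, both_strands=True,
                          negative_strand=False):
    """Return a dict mapping each observed k-mer end motif to its frequency.

    fragment_seqs: forward-strand reference sequence of each fragment.
    """
    counts = Counter()
    for seq in fragment_seqs:
        seq = seq.upper()
        if len(seq) < k:
            continue
        motifs = []
        if both_strands or not negative_strand:
            motifs.append(seq[:k])  # 5' end on the positive strand
        if both_strands or negative_strand:
            motifs.append(reverse_complement(seq[-k:]))  # 5' end on the negative strand
        for motif in motifs:
            if set(motif) <= set("ACGT"):  # skip N and other ambiguity codes
                counts[motif] += 1
    total = sum(counts.values())
    return {motif: n / total for motif, n in counts.items()} if total else {}


# Hypothetical fragment sequences.
seqs = ["CCCAGTTTGA", "CCCATTTTGG"]
print(end_motif_frequencies(seqs, k=4))
```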


Measures frequency of k-mer 5’ end motifs in each region specified in a BED file and writes data into a table.

finaletoolkit interval-end-motifs [-h] [-k K] [-min MIN_LENGTH] [-max MAX_LENGTH] [-lo MIN_LENGTH] [-hi MAX_LENGTH] [-B] [-n] [-o OUTPUT_FILE] [-q QUALITY_THRESHOLD] [-w WORKERS] [-v]
                                  input_file refseq_file intervals

Positional Arguments

input_file

Path to a BAM/CRAM/Fragment file containing fragment data.

refseq_file

The .2bit file for the associated reference genome sequence used during alignment.

intervals

Path to a BED file containing intervals to retrieve end motif frequencies over.

Named Arguments

-k

Length of k-mer.

Default: 4

-min, --min-length

Minimum length for a fragment to be included.

Default: 0

-max, --max-length

Maximum length for a fragment to be included.

-lo, --fraction-low

Deprecated alias for --min-length

-hi, --fraction-high

Deprecated alias for --max-length

-B, --single-strand

Set flag to only consider one strand for end-motifs. By default, the positive strand is calculated, but with the -n flag, the 5’ end motifs of the negative strand are considered instead.

Default: True

-n, --negative-strand

Set flag in conjunction with -B to only consider 5’ end motifs on the negative strand.

Default: False

-o, --output-file

Path to TSV or CSV file to write end motif frequencies to.

Default: '-'

-q, --quality-threshold

Minimum mapping quality threshold.

Default: 20

-w, --workers

Number of worker processes.

Default: 1

-v, --verbose

Enable verbose mode to display detailed processing information.

Default: 0


Reads k-mer frequencies from a file and calculates a motif diversity score (MDS) using normalized Shannon entropy as described by Jiang et al (2020).

finaletoolkit mds [-h] [-s SEP] [--header HEADER] [file_path]

Positional Arguments

file_path

Tab-delimited or similar file containing one column for all k-mers and one column for frequency. Reads from stdin by default.

Default: '-'

Named Arguments

-s, --sep

Separator used in tabular file.

Default: '    '


--header

Number of header rows to ignore. Default is 0.

Default: 0
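
The normalized Shannon entropy behind the motif diversity score can be written out in a few lines of Python. This is a sketch of the formula only: the function name is hypothetical, the input is assumed to be the frequency (or count) column already parsed into a list, and the normalization divides by the logarithm of the number of possible k-mers, 4^k.

```python
import math


def motif_diversity_score(frequencies, k=4):
    """Normalized Shannon entropy of k-mer frequencies, in the range [0, 1].

    frequencies may be raw counts or proportions; they are rescaled to sum
    to 1. Motifs with zero frequency contribute nothing to the entropy.
    """
    total = sum(frequencies)
    if total <= 0:
        raise ValueError("frequencies must sum to a positive value")
    entropy = 0.0
    for f in frequencies:
        if f > 0:
            p = f / total
            entropy -= p * math.log(p)
    return entropy / math.log(4 ** k)


# A uniform distribution over all 256 possible 4-mers gives the maximum score.
print(motif_diversity_score([1] * 256, k=4))
```

A score near 1 means the end motifs are spread evenly across all possible k-mers, while a score near 0 means a few motifs dominate.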


Reads k-mer frequencies from a file and calculates a motif diversity score (MDS) for each interval using normalized Shannon entropy as described by Jiang et al (2020).

finaletoolkit interval-mds [-h] [-s SEP] [--header HEADER] [file_path] file_out

Positional Arguments

file_path

Tab-delimited or similar file containing one column for all k-mers and one column for frequency. Reads from stdin by default.

Default: '-'

file_out

Path to the output BED/BEDGraph file containing MDS for each interval.

Default: '-'

Named Arguments

-s, --sep

Separator used in tabular file.

Default: '    '


--header

Number of header rows to ignore. Default is 0.

Default: 0


Filters a BED/BAM/CRAM file so that all reads/intervals, when applicable, are in mapped pairs, exceed a certain MAPQ, are not flagged for quality, are read1, are not secondary or supplementary alignments, are within/excluding specified intervals, and are on the same reference sequence as the mate.

finaletoolkit filter-file [-h] [-W WHITELIST_FILE] [-B BLACKLIST_FILE] [-o OUTPUT_FILE] [-q QUALITY_THRESHOLD] [-min MIN_LENGTH] [-max MAX_LENGTH] [-p {midpoint,any}] [-lo MIN_LENGTH]
                          [-hi MAX_LENGTH] [-w WORKERS] [-v]

Positional Arguments


Path to BAM file.

Named Arguments

-W, --whitelist-file

Only alignments overlapping the intervals in this BED file are written to the output.

-B, --blacklist-file

Only alignments outside of the intervals in this BED file are written to the output.

-o, --output-file

Output BED/BAM/CRAM file path.

Default: '-'

-q, --quality-threshold

Minimum mapping quality threshold.

Default: 30

-min, --min-length

Minimum length for a fragment to be included.

-max, --max-length

Maximum length for a fragment to be included.

-p, --intersect-policy

Possible choices: midpoint, any

Specifies what policy is used to include/exclude fragments in the given interval. See User Guide for more information.

Default: 'midpoint'

-lo, --fraction-low

Deprecated alias for --min-length

-hi, --fraction-high

Deprecated alias for --max-length

-w, --workers

Number of worker processes.

Default: 1

-v, --verbose

Enable verbose mode to display detailed processing information.
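
The read-level criteria listed for this command map onto standard SAM flag bits. The Python sketch below checks them for a single alignment; the bit values come from the SAM specification, while the function name, the reading of "mapped pairs" as paired with neither mate unmapped, and the use of greater-than-or-equal for the MAPQ threshold are assumptions. Interval and fragment length filtering are not shown.

```python
# SAM flag bits, as defined in the SAM specification.
PAIRED = 0x1
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
READ1 = 0x40
SECONDARY = 0x100
QC_FAIL = 0x200
SUPPLEMENTARY = 0x800


def passes_filters(flag, mapq, ref_name, mate_ref_name, quality_threshold=30):
    """Return True if an alignment meets the read-level criteria (sketch)."""
    if not flag & PAIRED:
        return False  # not part of a pair
    if flag & (UNMAPPED | MATE_UNMAPPED):
        return False  # read or mate is unmapped
    if flag & (SECONDARY | QC_FAIL | SUPPLEMENTARY):
        return False  # secondary, QC-failed, or supplementary alignment
    if not flag & READ1:
        return False  # keep only read1 so each fragment is counted once
    if ref_name != mate_ref_name:
        return False  # mate maps to a different reference sequence
    return mapq >= quality_threshold


# Flag 99: paired, proper pair, mate reverse strand, read1.
print(passes_filters(99, 60, "chr1", "chr1"))
```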


Aggregates a bigWig signal over constant-length intervals defined in a BED file.

finaletoolkit agg-bw [-h] [-o OUTPUT_FILE] [-m MEDIAN_WINDOW_SIZE] [-a] [-v] input_file interval_file

Positional Arguments

input_file

A bigWig file containing signals over the intervals specified in interval file.

interval_file

Path to a BED file containing the intervals over which signals were calculated.

Named Arguments

-o, --output-file

A wiggle file containing the aggregate signal over the intervals specified in interval file.

Default: '-'

-m, --median-window-size

Size of the median filter window used to aggregate scores. Set to 120 if aggregating WPS signals.

Default: 1

-a, --mean

Use the mean instead of the median.

Default: False

-v, --verbose

Enable verbose mode to display detailed processing information.


Creates a BED4 file containing centromeres, telomeres, and short-arm intervals, similar to the gaps annotation track for hg19 found on the UCSC Genome Browser (Kent et al. 2002). Currently only supports hg19, b37, human_g1k_v37, hg38, and GRCh38.

finaletoolkit gap-bed [-h] {hg19,b37,human_g1k_v37,hg38,GRCh38} output_file

Positional Arguments


Possible choices: hg19, b37, human_g1k_v37, hg38, GRCh38

Reference genome to provide gaps for.


output_file

Path to write BED file to. If “-” is used, writes to stdout.

Gap is used liberally in this command and, in the case of hg38/GRCh38, may refer to regions where there are no longer gaps in the reference sequence.