Calculates fragmentation features given a CRAM, BAM, SAM, or Frag.gz file.

usage: finaletoolkit [-h] {coverage,frag-length,frag-length-bins,frag-length-intervals,wps,delfi,filter-bam,adjust-wps,agg-bw,delfi-gc-correct,end-motifs,interval-end-motifs,mds,interval-mds,gap-bed,cleavage-profile} ...



Calculates fragmentation coverage over intervals in a BED file given a SAM, BAM, CRAM, or Frag.gz file

finaletoolkit coverage [-h] [-o OUTPUT_FILE] [-s SCALE_FACTOR] [-q QUALITY_THRESHOLD] [-w WORKERS] [-v] input_file interval_file

Positional Arguments

input_file

SAM, BAM, CRAM, or Frag.gz file containing fragment data

interval_file

BED file containing intervals over which coverage is calculated

Named Arguments

-o, --output_file

BED file where coverage is printed

Default: “-”

-s, --scale-factor

Amount coverage will be multiplied by

Default: 1000000.0

-q, --quality_threshold

Default: 30

-w, --workers

Number of worker processes to use. Default is 1.

Default: 1

-v, --verbose

Default: 0


Calculates fragment lengths given a CRAM/BAM/SAM file

finaletoolkit frag-length [-h] [-c CONTIG] [-S START] [-E STOP] [-p INTERSECT_POLICY] [-o OUTPUT_FILE] [-q QUALITY_THRESHOLD] [-v] input_file

Positional Arguments

input_file

BAM or Frag.gz file containing fragment data.

Named Arguments

-c, --contig

Contig or chromosome to select fragments from. Required if using --start or --stop.

-S, --start

0-based left-most coordinate of the interval to select fragments from. Must also use --contig.

-E, --stop

1-based right-most coordinate of the interval to select fragments from. Must also use --contig.

-p, --intersect_policy

Specifies what policy is used to include fragments in the given interval. Default is “midpoint”. Policies include:
- midpoint: the average of the end coordinates of a fragment lies in the interval.
- any: any part of the fragment is in the interval.

Default: “midpoint”
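
The two intersect policies can be illustrated with a minimal Python sketch. This is not the toolkit's own code: the function name `fragment_in_interval` is hypothetical, and the sketch assumes 0-based, half-open coordinates for both fragments and intervals.

```python
def fragment_in_interval(frag_start, frag_stop, start, stop, policy="midpoint"):
    """Return True if a fragment is selected for the interval [start, stop).

    Coordinates are assumed to be 0-based and half-open (an assumption
    made for this sketch).
    """
    if policy == "midpoint":
        # Average of the fragment's two end coordinates.
        midpoint = (frag_start + frag_stop) // 2
        return start <= midpoint < stop
    if policy == "any":
        # Any overlap between the fragment and the interval.
        return frag_start < stop and frag_stop > start
    raise ValueError(f"unknown policy: {policy}")
```

A fragment that only grazes the edge of an interval is selected under “any” but not under “midpoint”, so “midpoint” assigns each fragment to at most one of a set of non-overlapping intervals.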

-o, --output_file

File to write results to. “-” may be used to write to stdout. Default is “-”.

Default: “-”

-q, --quality_threshold

Minimum MAPQ. Default is 30.

Default: 30

-v, --verbose

Verbose logging.

Default: 0


Computes fragment lengths and aggregates them into bins by length. Either writes bins and counts to a TSV file or prints a histogram.

finaletoolkit frag-length-bins [-h] [-c CONTIG] [-S START] [-p INTERSECT_POLICY] [-E STOP] [--bin-size BIN_SIZE] [-o OUTPUT_FILE] [--contig-by-contig] [--histogram] [-q QUALITY_THRESHOLD] [-v] input_file

Positional Arguments

input_file

BAM or SAM file containing fragment data

Named Arguments

-c, --contig

Contig or chromosome to select fragments from. Required if using --start or --stop.

-S, --start

0-based left-most coordinate of the interval to select fragments from. Must also use --contig.

-p, --intersect_policy

Specifies what policy is used to include fragments in the given interval. Default is “midpoint”. Policies include:
- midpoint: the average of the end coordinates of a fragment lies in the interval.
- any: any part of the fragment is in the interval.

Default: “midpoint”

-E, --stop

1-based right-most coordinate of the interval to select fragments from. Must also use --contig.

--bin-size

Used to specify a custom bin size instead of automatically calculating one.

-o, --output_file

File to write results to. “-” may be used to write to stdout. Default is “-”.

Default: “-”

--contig-by-contig

Placeholder, not implemented.

Default: False

--histogram

Draws a histogram in the terminal.

Default: False

-q, --quality_threshold

Minimum MAPQ. Default is 30.

Default: 30

-v, --verbose

Verbose logging.

Default: 0
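
As a rough illustration of what frag-length-bins produces, the following Python sketch bins a list of fragment lengths with a fixed bin size and renders a text histogram. The function names, the sample lengths, and the histogram layout are all hypothetical; the toolkit's automatic bin-size selection and its actual output format are not reproduced here.

```python
from collections import Counter


def bin_fragment_lengths(lengths, bin_size):
    """Count fragment lengths in fixed-width bins.

    Returns (bin_start, bin_stop, count) tuples sorted by bin_start.
    Empty bins are omitted in this sketch.
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    counts = Counter((length // bin_size) * bin_size for length in lengths)
    return [(start, start + bin_size, counts[start]) for start in sorted(counts)]


def text_histogram(bins, width=40):
    """Render non-empty binned counts as rows of '#' characters."""
    peak = max(count for _, _, count in bins)
    lines = []
    for start, stop, count in bins:
        bar = "#" * max(1, round(width * count / peak))
        lines.append(f"{start:>4}-{stop:<4} {bar} {count}")
    return "\n".join(lines)


# Made-up fragment lengths, for illustration only.
lengths = [148, 152, 160, 166, 167, 171, 330]
bins = bin_fragment_lengths(lengths, bin_size=10)
print(text_histogram(bins))
```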


Calculates fragment length statistics over user-specified genomic intervals.

finaletoolkit frag-length-intervals [-h] [-p INTERSECT_POLICY] [-o OUTPUT_FILE] [-q QUALITY_THRESHOLD] [-w WORKERS] [-v] input_file interval_file

Positional Arguments

input_file

BAM or SAM file containing PE WGS of cfDNA

interval_file

BED file containing intervals over which to produce statistics

Named Arguments

-p, --intersect_policy

Specifies what policy is used to include fragments in the given interval. Default is “midpoint”. Policies include:
- midpoint: the average of the end coordinates of a fragment lies in the interval.
- any: any part of the fragment is in the interval.

Default: “midpoint”

-o, --output-file

File to print results to. If “-”, will print to stdout. Default is “-”.

Default: “-”

-q, --quality-threshold

Minimum MAPQ to filter for.

Default: 30

-w, --workers

Number of subprocesses to use

Default: 1

-v, --verbose

Determines how much is written to stderr

Default: 0


Calculates Windowed Protection Score over a region around sites specified in a BED file from alignments in a CRAM/BAM/SAM/Frag.gz file

finaletoolkit wps [-h] [-o OUTPUT_FILE] [-i INTERVAL_SIZE] [-W WINDOW_SIZE] [-lo FRACTION_LOW] [-hi FRACTION_HIGH] [-q QUALITY_THRESHOLD] [-w WORKERS] [-v] input_file site_bed

Positional Arguments

input_file

BAM or SAM file containing paired-end reads of cfDNA WGS

site_bed

BED file containing sites over which to calculate WPS

Named Arguments

-o, --output_file

BigWig file to write results to. Default is stdout.

Default: “-”

-i, --interval_size

Default: 5000

-W, --window_size

Default: 120

-lo, --fraction_low

Default: 120

-hi, --fraction_high

Default: 180

-q, --quality_threshold

Default: 30

-w, --workers

Default: 1

-v, --verbose

Default: 0
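
For orientation, here is a minimal Python sketch of a windowed protection score at a single position, using the usual definition: the number of fragments spanning the whole window minus the number of fragments with an endpoint inside it. The function name `wps_at` is hypothetical, the coordinates are assumed to be 0-based and half-open, and the default values simply mirror the defaults listed above. The toolkit's own handling of window edges may differ.

```python
def wps_at(position, fragments, window_size=120, fraction_low=120, fraction_high=180):
    """Windowed protection score at one position.

    fragments: iterable of (start, stop) pairs, assumed 0-based half-open.
    Only fragments with fraction_low <= length <= fraction_high are counted.
    """
    half = window_size // 2
    win_start, win_stop = position - half, position + half
    score = 0
    for start, stop in fragments:
        length = stop - start
        if not fraction_low <= length <= fraction_high:
            continue
        if start <= win_start and stop >= win_stop:
            # Fragment spans the whole window.
            score += 1
        elif win_start <= start < win_stop or win_start < stop <= win_stop:
            # Fragment has an endpoint inside the window.
            score -= 1
    return score
```

Repeating this for every position in a window of --interval_size around each site would give a per-base track of scores.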


Calculates DELFI score over genome. NOTE: due to some ad hoc implementation details, currently the only accepted reference genome is hg19.

finaletoolkit delfi [-h] [-b BLACKLIST_FILE] [-g GAP_FILE] [-o OUTPUT_FILE] [-W WINDOW_SIZE] [-gc] [-m] [-q QUALITY_THRESHOLD] [-w WORKERS] [-v] input_file autosomes reference_file bins_file

Positional Arguments

input_file

SAM, BAM, CRAM, or Frag.gz file containing fragment reads.

autosomes

Tab-delimited file where column one is the chromosome name and column two is the length of that chromosome.

reference_file

2bit file for reference sequence used during alignment.

bins_file

BED format file containing bins over which to calculate DELFI. To replicate the methodology of Cristiano and colleagues, use 100kb bins over human autosomes.

Named Arguments

-b, --blacklist_file

BED file containing dark regions to ignore when calculating DELFI.

-g, --gap_file

BED4 format file with columns “chrom”, “start”, “stop”, “type”. “type” should be “centromere”, “telomere”, or “short arm”; all others are ignored. This information corresponds to the “gap” track for hg19 in the UCSC Genome Browser.

-o, --output_file

BED, bed.gz, tsv, or csv file to write results to. If “-”, writes tab-delimited data to stdout. Default is “-”.

Default: “-”

-W, --window_size

Currently unused.

Default: 5000000

-gc, --gc_correct

Indicates whether GC correction is applied.

Default: False

-m, --merge_bins

Indicates whether bins are merged into 5Mb bins.

Default: False

-q, --quality_threshold

Minimum MAPQ to filter for.

Default: 30

-w, --workers

Maximum number of subprocesses to spawn. Should be close to number of cores.

Default: 1

-v, --verbose

Default: 0


Filters a BAM file so that all reads are in mapped pairs, exceed a certain MAPQ, are not flagged for quality, are read1, are not secondary or supplementary alignments, and are on the same reference sequence as the mate.

finaletoolkit filter-bam [-h] [-r REGION_FILE] [-o OUTPUT_FILE] [-q QUALITY_THRESHOLD] [-hi FRACTION_HIGH] [-lo FRACTION_LOW] [-w WORKERS] [-v] input_file

Positional Arguments

input_file

BAM file with PE WGS

Named Arguments

-r, --region-file

BED file containing regions to read fragments from. Default is None.

-o, --output-file

Path to write filtered BAM. Default is “-”. If set to “-”, the BAM file will be written to stdout.

Default: “-”

-q, --quality_threshold

Minimum mapping quality to filter for. Default is 30.

Default: 30

-hi, --fraction-high

Maximum fragment size. Default is None

-lo, --fraction-low

Minimum fragment size. Default is None

-w, --workers

Number of worker processes to spawn.

Default: 1

-v, --verbose

Specify verbosity. Number of printed statements is proportional to number of vs.
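
The filter-bam criteria can be expressed as a single predicate over the SAM FLAG bits. The sketch below is illustrative only: `Read` and `keep_read` are hypothetical names, “mapped pairs” is interpreted as a paired read whose mate is also mapped, the quality threshold is treated as an inclusive minimum, and fragment size is taken from the absolute template length. All of these are assumptions rather than confirmed behavior.

```python
from dataclasses import dataclass

# Bit values of the FLAG field, as defined in the SAM specification.
PAIRED = 0x1
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
READ1 = 0x40
SECONDARY = 0x100
QC_FAIL = 0x200
SUPPLEMENTARY = 0x800


@dataclass
class Read:
    """Hypothetical, minimal stand-in for one alignment record."""
    flag: int
    mapq: int
    contig: str
    mate_contig: str
    template_length: int


def keep_read(read, quality_threshold=30, fraction_low=None, fraction_high=None):
    """Return True if the read passes every filter in this sketch."""
    required = PAIRED | READ1
    forbidden = UNMAPPED | MATE_UNMAPPED | SECONDARY | QC_FAIL | SUPPLEMENTARY
    if (read.flag & required) != required or (read.flag & forbidden):
        return False
    if read.mapq < quality_threshold:
        return False
    if read.contig != read.mate_contig:
        return False
    size = abs(read.template_length)
    if fraction_low is not None and size < fraction_low:
        return False
    if fraction_high is not None and size > fraction_high:
        return False
    return True
```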


Reads WPS data from a WIG file and applies a median filter and a Savitzky-Golay filter (Savitzky and Golay, 1964).

finaletoolkit adjust-wps [-h] [-o OUTPUT_FILE] [-m MEDIAN_WINDOW_SIZE] [-s SAVGOL_WINDOW_SIZE] [-p SAVGOL_POLY_DEG] [-w WORKERS] [--mean] [--subtract-edges] [-v] input_file interval_file genome_file

Positional Arguments

input_file

BigWig file with WPS data.

interval_file

BED file containing intervals over which WPS was calculated

genome_file

GENOME file containing chromosome/contig names and lengths. Needed to write the header for the BigWig file.

Named Arguments

-o, --output-file

WIG file to print filtered WPS data. If “-”, will write to stdout. Default is “-”.

Default: “-”

-m, --median-window-size

Size of window for median filter. Default is 1000.

Default: 1000

-s, --savgol-window-size

Size of window for Savitzky-Golay filter. Default is 21.

Default: 21

-p, --savgol-poly-deg

Polynomial degree for Savitzky-Golay filter. Default is 2.

Default: 2

-w, --workers

Number of subprocesses to use. Default is 1.

Default: 1

--mean

Default: False

--subtract-edges

Default: False

-v, --verbose

Specify verbosity. Number of printed statements is proportional to number of vs.


Reads data from a BigWig file and aggregates over intervals in a BED file.

finaletoolkit agg-bw [-h] [-o OUTPUT_FILE] [-m MEDIAN_WINDOW_SIZE] [-v] input_file interval_file

Positional Arguments

input_file

BigWig file with data.

interval_file

BED file containing intervals over which WPS was calculated

Named Arguments

-o, --output-file

WIG file to print filtered WPS data. If “-”, will write to stdout. Default is “-”.

Default: “-”

-m, --median-window-size

Size of window for median filter. Default is 1000.

Default: 1000

-v, --verbose

Specify verbosity. Number of printed statements is proportional to number of vs.


Performs GC correction on raw DELFI data.

finaletoolkit delfi-gc-correct [-h] [-o OUTPUT_FILE] [--header-lines HEADER_LINES] [-v] input_file

Positional Arguments

input_file

BED3+3 file containing raw data

Named Arguments

-o, --output-file

BED3+3 to print GC-corrected DELFI fractions. If “-”, will write to stdout. Default is “-”.

Default: “-”

--header-lines

Number of header lines in BED. Default is 1.

Default: 1

-v, --verbose

Specify verbosity. Number of printed statements is proportional to number of vs.


Measures frequency of k-mer 5’ end motifs and tabulates data into a tab-delimited file.

finaletoolkit end-motifs [-h] [-k K] [-o OUTPUT_FILE] [-q QUALITY_THRESHOLD] [-w WORKERS] [-v] input_file refseq_file

Positional Arguments

input_file

SAM, BAM, or tabix-indexed file with fragment data.

refseq_file

2bit file containing reference sequence that fragments were aligned to.

Named Arguments

-k

Length of k-mer. Default is 4.

Default: 4

-o, --output-file

TSV to print k-mer frequencies. If “-”, will write to stdout. Default is “-”.

Default: “-”

-q, --quality-threshold

Minimum MAPQ of reads. Default is 20.

Default: 20

-w, --workers

Number of subprocesses to use. Default is 1.

Default: 1

-v, --verbose

Specify verbosity. Number of printed statements is proportional to number of vs.

Default: 0
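
To make the idea of a 5’ end motif concrete, the sketch below tallies k-mer end motifs from fragment coordinates and an in-memory reference string. It is a simplified illustration, not the toolkit's implementation: the function names are hypothetical, the reference is a plain string rather than a 2bit file, coordinates are assumed to be 0-based and half-open, and both 5’ ends of each fragment are counted, with the reverse-strand end read as a reverse complement.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def end_motif_frequencies(reference, fragments, k=4):
    """Frequency of k-mer 5' end motifs.

    reference: the sequence of one contig as a string (assumption).
    fragments: (start, stop) pairs, assumed 0-based half-open.
    """
    counts = Counter()
    for start, stop in fragments:
        if stop - start < k:
            continue
        # 5' end of the forward strand.
        forward = reference[start:start + k].upper()
        # 5' end of the reverse strand, read 5' to 3'.
        reverse = reverse_complement(reference[stop - k:stop].upper())
        for motif in (forward, reverse):
            if set(motif) <= set("ACGT"):  # skip motifs containing N
                counts[motif] += 1
    total = sum(counts.values())
    return {motif: n / total for motif, n in counts.items()} if total else {}
```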


Measures frequency of k-mer 5’ end motifs in each region specified in a BED file and writes data into a table.

finaletoolkit interval-end-motifs [-h] [-k K] [-lo FRACTION_LOW] [-hi FRACTION_HIGH] [-o OUTPUT_FILE] [-q QUALITY_THRESHOLD] [-w WORKERS] [-v] input_file refseq_file intervals

Positional Arguments

input_file

SAM, BAM, or tabix-indexed file with fragment data.

refseq_file

2bit file containing reference sequence that fragments were aligned to.

intervals

BED file containing intervals or list of tuples

Named Arguments

-k

Length of k-mer. Default is 4.

Default: 4

-lo, --fraction-low

Smallest fragment length to consider. Default is 10

Default: 10

-hi, --fraction-high

Longest fragment length to consider. Default is 600

Default: 600

-o, --output-file

File path to write results to. Either tsv or csv.

Default: “-”

-q, --quality-threshold

Minimum MAPQ of reads. Default is 20.

Default: 20

-w, --workers

Number of subprocesses to use. Default is 1.

Default: 1

-v, --verbose

Specify verbosity. Number of printed statements is proportional to number of vs.

Default: 0


Reads k-mer frequencies from a file and calculates a motif diversity score (MDS) using normalized Shannon entropy as described by Jiang et al. (2020). This function is generalized for any k-mer instead of just 4-mers.

finaletoolkit mds [-h] [-s SEP] [--header HEADER] [file_path]

Positional Arguments

file_path

Tab-delimited or similar file containing one column for all k-mers and one column for frequency. Reads from stdin by default.

Default: “-”

Named Arguments

-s, --sep

Separator used in tabular file. Default is tab.

Default: “\t”

--header

Number of header rows to ignore. Default is 0.

Default: 0
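
A normalized Shannon entropy of this kind can be sketched in a few lines of Python. The function name `motif_diversity_score` is hypothetical, and the sketch assumes the normalization divides the entropy by log(4^k), the entropy of a uniform distribution over all k-mers, so that the score lies between 0 and 1.

```python
import math


def motif_diversity_score(frequencies, k=None):
    """Normalized Shannon entropy of k-mer frequencies.

    frequencies: dict mapping each k-mer to its frequency or count.
    k: k-mer length; inferred from the first key when omitted.
    """
    if k is None:
        k = len(next(iter(frequencies)))
    total = sum(frequencies.values())
    # Zero-frequency motifs contribute nothing to the entropy.
    entropy = -sum(
        (f / total) * math.log(f / total)
        for f in frequencies.values()
        if f > 0
    )
    return entropy / math.log(4 ** k)
```

A score near 1 means the end motifs are spread evenly across all possible k-mers, while a score near 0 means a few motifs dominate.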


Reads k-mer frequencies from a file and calculates a motif diversity score (MDS) for each interval using normalized Shannon entropy as described by Jiang et al. (2020). This function is generalized for any k-mer instead of just 4-mers.

finaletoolkit interval-mds [-h] [-s SEP] [file_path] file_out

Positional Arguments

file_path

Tab-delimited or similar file containing one column for all k-mers and one column for frequency. Reads from stdin by default.

Default: “-”

file_out

Default: “-”

Named Arguments

-s, --sep

Separator used in tabular file. Default is tab.

Default: “\t”


Creates a BED4 file containing centromeres, telomeres, and short-arm intervals, similar to the gaps annotation track for hg19 found on the UCSC Genome Browser (Kent et al., 2002). Currently only supports hg19, b37, human_g1k_v37, hg38, and GRCh38.

finaletoolkit gap-bed [-h] {hg19,b37,human_g1k_v37,hg38,GRCh38} output_file

Positional Arguments


Possible choices: hg19, b37, human_g1k_v37, hg38, GRCh38

Reference genome to provide gaps for.


output_file

Path to write BED file to. If “-” is used, writes to stdout.

“Gap” is used loosely in this command and, in the case of hg38/GRCh38, may refer to regions where there are no longer gaps in the reference sequence.



finaletoolkit cleavage-profile [-h] [-o OUTPUT_FILE] [-lo FRACTION_LOW] [-hi FRACTION_HIGH] [-q QUALITY_THRESHOLD] [-w WORKERS] [-v] input_file interval_file

Positional Arguments

input_file

BAM, CRAM, or Frag.gz file containing fragment coordinates.

interval_file

BED file containing intervals to calculate cleavage profile over.

Named Arguments

-o, --output_file

Path to write output file to. If “-” is used, writes bed.gz to stdout. Writes in BigWig format if the “.bw” or “.bigwig” suffix is used, and writes a gzip-compressed BED file if the “.bed.gz” or “.bedGraph.gz” suffix is used. Default is “-”.

Default: “-”

-lo, --fraction_low

Default: 120

-hi, --fraction_high

Default: 180

-q, --quality-threshold

Minimum MAPQ of reads. Default is 20.

Default: 20

-w, --workers

Number of subprocesses to use. Default is 1.

Default: 1

-v, --verbose

Specify verbosity. Number of printed statements is proportional to number of vs.

Default: 0