CLI#

FinaleToolkit is a package and standalone program to extract fragmentation features of cell-free DNA from paired-end sequencing data.

usage: finaletoolkit [-h] [-v]
                     {coverage,frag-length-bins,frag-length-intervals,cleavage-profile,wps,adjust-wps,delfi,delfi-gc-correct,end-motifs,interval-end-motifs,mds,interval-mds,filter-bam,agg-bw,gap-bed}
                     ...

Named Arguments#

-v, --version

show program’s version number and exit

Sub-commands#

coverage#

Calculates fragmentation coverage over intervals defined in a BED file based on alignment data from a BAM/CRAM/Fragment file.

finaletoolkit coverage [-h] [-o OUTPUT_FILE] [-n] [-s SCALE_FACTOR]
                       [-min MIN_LENGTH] [-max MAX_LENGTH]
                       [-p {midpoint,any}] [-q QUALITY_THRESHOLD]
                       [-w WORKERS] [-v]
                       input_file interval_file

Positional Arguments#

input_file

Path to a BAM/CRAM/Fragment file containing fragment data.

interval_file

Path to a BED file containing intervals to calculate coverage over.

Named Arguments#

-o, --output-file

A BED file containing coverage values over the intervals specified in interval file.

Default: “-”

-n, --normalize

If this flag is set, multiplies by the user-supplied scale factor (if given) and normalizes output by total coverage. May lead to longer execution time for high-throughput data.

Default: False

-s, --scale-factor

Scale factor for coverage values. Default is 1.

Default: 1.0

-min, --min-length

Minimum length for a fragment to be included in coverage.

Default: 0

-max, --max-length

Maximum length for a fragment to be included in coverage.

-p, --intersect-policy

Possible choices: midpoint, any

Specifies what policy is used to include fragments in the given interval. See User Guide for more information.

Default: “midpoint”
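As a rough illustration of the two policies (a sketch, not FinaleToolkit's own code; the User Guide remains the authoritative description), `midpoint` is assumed here to keep a fragment when its midpoint falls inside the interval, and `any` to keep it when at least one base overlaps. The helper names and the half-open coordinates are assumptions made for this example.

```python
def midpoint_in_interval(frag_start, frag_stop, start, stop):
    """True if the fragment midpoint lies in the half-open interval [start, stop)."""
    midpoint = (frag_start + frag_stop) // 2
    return start <= midpoint < stop


def any_overlap(frag_start, frag_stop, start, stop):
    """True if at least one base of [frag_start, frag_stop) lies in [start, stop)."""
    return frag_start < stop and frag_stop > start


# A fragment hanging over the left edge of the interval [100, 200)
print(midpoint_in_interval(50, 120, 100, 200))  # midpoint 85 is outside the interval
print(any_overlap(50, 120, 100, 200))           # bases 100..119 overlap the interval
```

Under these assumptions, `any` counts a fragment in every interval it touches, while `midpoint` assigns it to at most one of a set of non-overlapping intervals.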

-q, --quality-threshold

Minimum mapping quality threshold.

Default: 30

-w, --workers

Number of worker processes.

Default: 1

-v, --verbose

Enable verbose mode to display detailed processing information.

Default: False

frag-length-bins#

Retrieves fragment lengths grouped in bins given a BAM/CRAM/Fragment file.

finaletoolkit frag-length-bins [-h] [-c CONTIG] [-S START] [-E STOP]
                               [-min MIN_LENGTH] [-max MAX_LENGTH]
                               [-p {midpoint,any}] [--bin-size BIN_SIZE]
                               [-o OUTPUT_FILE]
                               [--histogram-path HISTOGRAM_PATH]
                               [-q QUALITY_THRESHOLD] [-v]
                               input_file

Positional Arguments#

input_file

Path to a BAM/CRAM/Fragment file containing fragment data.

Named Arguments#

-c, --contig

Specify the contig or chromosome to select fragments from. (Required if using --start or --stop.)

-S, --start

Specify the 0-based left-most coordinate of the interval to select fragments from. (Must also specify --contig.)

-E, --stop

Specify the 1-based right-most coordinate of the interval to select fragments from. (Must also specify --contig.)
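A 0-based start combined with a 1-based stop is the same half-open convention used by BED files, so the interval length is simply stop minus start. The sketch below converts a 1-based, inclusive region string (the style used by many genome browsers) into such a pair; `region_to_start_stop` is a hypothetical helper for illustration, not part of FinaleToolkit.

```python
def region_to_start_stop(region):
    """Convert 'contig:first-last' (1-based, inclusive) to (contig, start, stop)."""
    contig, span = region.rsplit(":", 1)
    first, last = (int(x.replace(",", "")) for x in span.split("-"))
    # Only the left coordinate shifts; the right one is already the exclusive stop.
    return contig, first - 1, last


contig, start, stop = region_to_start_stop("chr1:101-200")
print(contig, start, stop, stop - start)
```

With these values, the corresponding options would be `-c chr1 -S 100 -E 200`.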

-min, --min-length

Minimum length for a fragment to be included in fragment length.

Default: 0

-max, --max-length

Maximum length for a fragment to be included in fragment length.

-p, --intersect-policy

Possible choices: midpoint, any

Specifies what policy is used to include fragments in the given interval. See User Guide for more information.

Default: “midpoint”

--bin-size

Specify the size of the bins to group fragment lengths into.

Default: 1
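To make the effect of the bin size concrete, here is a minimal sketch of grouping fragment lengths into fixed-width bins. It assumes bins are aligned to multiples of the bin size starting at 0; FinaleToolkit may anchor its bins differently, and `bin_lengths` is a hypothetical name.

```python
from collections import Counter


def bin_lengths(lengths, bin_size=1):
    """Count fragment lengths per bin; each bin is labelled by its lower edge."""
    counts = Counter((length // bin_size) * bin_size for length in lengths)
    return sorted(counts.items())


# Five fragment lengths grouped into 10 bp bins
for lower_edge, count in bin_lengths([150, 151, 166, 167, 320], bin_size=10):
    print(lower_edge, lower_edge + 10, count)
```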

-o, --output-file

A .TSV file containing fragment lengths binned according to the specified bin size.

Default: “-”

--histogram-path

Path to store histogram.

-q, --quality-threshold

Minimum mapping quality threshold.

Default: 30

-v, --verbose

Enable verbose mode to display detailed processing information.

Default: 0

frag-length-intervals#

Retrieves fragment length summary statistics over intervals defined in a BED file based on alignment data from a BAM/CRAM/Fragment file.

finaletoolkit frag-length-intervals [-h] [-min MIN_LENGTH]
                                    [-max MAX_LENGTH] [-p {midpoint,any}]
                                    [-o OUTPUT_FILE]
                                    [-q QUALITY_THRESHOLD] [-w WORKERS]
                                    [-v]
                                    input_file interval_file

Positional Arguments#

input_file

Path to a BAM/CRAM/Fragment file containing fragment data.

interval_file

Path to a BED file containing intervals to retrieve fragment length summary statistics over.

Named Arguments#

-min, --min-length

Minimum length for a fragment to be included in fragment length.

Default: 0

-max, --max-length

Maximum length for a fragment to be included in fragment length.

-p, --intersect-policy

Possible choices: midpoint, any

Specifies what policy is used to include fragments in the given interval. See User Guide for more information.

Default: “midpoint”

-o, --output-file

A BED file containing fragment length summary statistics (mean, median, st. dev, min, max) over the intervals specified in the interval file.

Default: “-”
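The per-interval statistics named above can be sketched with the Python standard library. This is only an illustration: whether FinaleToolkit reports the sample or the population standard deviation is not stated here, so the sample form is an assumption, and `summarize_lengths` is a hypothetical helper.

```python
from statistics import mean, median, stdev


def summarize_lengths(lengths):
    """Return mean, median, standard deviation, min and max of fragment lengths."""
    return {
        "mean": mean(lengths),
        "median": median(lengths),
        # Sample standard deviation (assumption); undefined for fewer than 2 values.
        "stdev": stdev(lengths) if len(lengths) > 1 else 0.0,
        "min": min(lengths),
        "max": max(lengths),
    }


print(summarize_lengths([150, 160, 170]))
```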

-q, --quality-threshold

Minimum mapping quality threshold.

Default: 30

-w, --workers

Number of worker processes.

Default: 1

-v, --verbose

Enable verbose mode to display detailed processing information.

Default: 0

cleavage-profile#

Calculates cleavage proportion over intervals defined in a BED file based on alignment data from a BAM/CRAM/Fragment file.

finaletoolkit cleavage-profile [-h] [-c CHROM_SIZES] [-o OUTPUT_FILE]
                               [-min MIN_LENGTH] [-max MAX_LENGTH]
                               [-lo MIN_LENGTH] [-hi MAX_LENGTH]
                               [-q QUALITY_THRESHOLD] [-l LEFT] [-r RIGHT]
                               [-w WORKERS] [-v]
                               input_file interval_file

Positional Arguments#

input_file

Path to a BAM/CRAM/Fragment file containing fragment data.

interval_file

Path to a BED file containing intervals to calculate cleavage proportion over.

Named Arguments#

-c, --chrom-sizes

A .chrom.sizes file containing chromosome names and sizes.

-o, --output-file

A bigWig file containing the cleavage proportion results over the intervals specified in interval file.

Default: “-”

-min, --min-length

Minimum length for a fragment to be included.

Default: 0

-max, --max-length

Maximum length for a fragment to be included.

-lo, --fraction_low

Minimum length for a fragment to be included in cleavage proportion calculation. Deprecated. Use --min-length instead.

-hi, --fraction-high

Maximum length for a fragment to be included in cleavage proportion calculation. Deprecated. Use --max-length instead.

-q, --quality-threshold

Minimum mapping quality threshold.

Default: 20

-l, --left

Number of base pairs to subtract from start coordinate to create interval. Useful when dealing with BED files with only CpG coordinates. Default is 0.

Default: 0

-r, --right

Number of base pairs to add to stop coordinate to create interval. Useful when dealing with BED files with only CpG coordinates. Default is 0.

Default: 0
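The effect of `--left` and `--right` on a single-site BED entry can be sketched as follows. Clamping the start at 0 is an assumption of this example, and `pad_interval` is a hypothetical helper rather than a FinaleToolkit function.

```python
def pad_interval(start, stop, left=0, right=0):
    """Widen [start, stop) by `left` bp upstream and `right` bp downstream."""
    # Clamp at 0 so a site near the contig start cannot yield a negative coordinate.
    return max(0, start - left), stop + right


# A 1 bp CpG site widened by 5 bp on each side
print(pad_interval(1000, 1001, left=5, right=5))
```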

-w, --workers

Number of worker processes.

Default: 1

-v, --verbose

Enable verbose mode to display detailed processing information.

Default: 0

wps#

Calculates Windowed Protection Score (WPS) over intervals defined in a BED file based on alignment data from a BAM/CRAM/Fragment file.

finaletoolkit wps [-h] [-c CHROM_SIZES] [-o OUTPUT_FILE]
                  [-i INTERVAL_SIZE] [-W WINDOW_SIZE] [-min MIN_LENGTH]
                  [-max MAX_LENGTH] [-lo MIN_LENGTH] [-hi MAX_LENGTH]
                  [-q QUALITY_THRESHOLD] [-w WORKERS] [-v]
                  input_file site_bed

Positional Arguments#

input_file

Path to a BAM/CRAM/Fragment file containing fragment data.

site_bed

Path to a BED file containing sites to calculate WPS over. The intervals in this BED file should be sorted, first by contig then start.

Named Arguments#

-c, --chrom-sizes

A .chrom.sizes file containing chromosome names and sizes.

-o, --output-file

A bigWig file containing the WPS results over the intervals specified in interval file.

Default: “-”

-i, --interval-size

Size in bp of the intervals to calculate WPS over. These new intervals are centered over those specified in the site_bed. Default is 5000.

Default: 5000

-W, --window-size

Size of the sliding window used to calculate WPS scores. Default is 120

Default: 120

-min, --min-length

Minimum length for a fragment to be included. Default is 120, corresponding to L-WPS.

Default: 120

-max, --max-length

Maximum length for a fragment to be included. Default is 180, corresponding to L-WPS.

Default: 180
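For orientation, WPS at a position is commonly defined as the number of fragments that span a window centered on that position minus the number of fragments with an endpoint inside the window. The sketch below follows that definition with the default window and length bounds; how edge cases are handled is an assumption, and the code is not taken from FinaleToolkit.

```python
def wps_at(pos, fragments, window_size=120, min_length=120, max_length=180):
    """Windowed protection score at `pos` for half-open (start, stop) fragments."""
    half = window_size // 2
    win_start, win_stop = pos - half, pos + half
    score = 0
    for start, stop in fragments:
        if not min_length <= stop - start <= max_length:
            continue  # outside the selected fragment length range
        if start <= win_start and stop >= win_stop:
            score += 1  # fragment covers the whole window
        elif win_start <= start < win_stop or win_start < stop <= win_stop:
            score -= 1  # fragment starts or ends inside the window
    return score


fragments = [(60, 230), (100, 260), (80, 240), (0, 300)]
print(wps_at(150, fragments))
```

In this example the 300 bp fragment is dropped by the length filter, two fragments span the window, and one starts inside it.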

-lo, --fraction_low

Minimum length for a fragment to be included in WPS calculation. Deprecated. Use --min-length instead.

-hi, --fraction_high

Maximum length for a fragment to be included in WPS calculation. Deprecated. Use --max-length instead.

-q, --quality-threshold

Minimum mapping quality threshold. Default is 30

Default: 30

-w, --workers

Number of worker processes.

Default: 1

-v, --verbose

Enable verbose mode to display detailed processing information.

Default: 0

adjust-wps#

Adjusts raw Windowed Protection Score (WPS) by applying a median filter and Savitzky-Golay filter.

finaletoolkit adjust-wps [-h] [-o OUTPUT_FILE] [-i INTERVAL_SIZE]
                         [-m MEDIAN_WINDOW_SIZE] [-s SAVGOL_WINDOW_SIZE]
                         [-p SAVGOL_POLY_DEG] [-S] [-w WORKERS] [--mean]
                         [--subtract-edges] [--edge-size EDGE_SIZE] [-v]
                         input_file interval_file chrom_sizes

Positional Arguments#

input_file

A bigWig file containing the WPS results over the intervals specified in interval file.

interval_file

Path to a BED file containing the intervals that WPS was calculated over.

chrom_sizes

A .chrom.sizes file containing chromosome names and sizes.

Named Arguments#

-o, --output-file

A bigWig file containing the adjusted WPS results over the intervals specified in interval file.

Default: “-”

-i, --interval_size

Size in bp of each interval in the interval file.

Default: 5000

-m, --median-window-size

Size of the median filter or mean filter window used to adjust WPS scores.

Default: 1000

-s, --savgol-window-size

Size of the Savitzky-Golay filter window used to adjust WPS scores.

Default: 21

-p, --savgol-poly-deg

Polynomial degree for the Savitzky-Golay filter.

Default: 2

-S, --exclude-savgol

Do not perform Savitzky-Golay filtering on the scores.

Default: True

-w, --workers

Number of worker processes.

Default: 1

--mean

A mean filter is used instead of median.

Default: False

--subtract-edges

Take the median of the first and last 500 bases in a window and subtract from the whole interval.

Default: False

--edge-size

Size of the edge subtracted from ends of window when --subtract-edges is set. Default is 500.

Default: 500
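The edge subtraction described for `--subtract-edges` can be sketched in a few lines. This example assumes the first and last `edge_size` values are pooled into a single median; the tool may combine the two edges differently, and `subtract_edges` is a hypothetical function name.

```python
from statistics import median


def subtract_edges(scores, edge_size=500):
    """Subtract the median of both interval edges from every score."""
    baseline = median(scores[:edge_size] + scores[-edge_size:])
    return [score - baseline for score in scores]


# Toy interval with a 3-value edge on each side
print(subtract_edges([2, 2, 2, 10, 12, 10, 4, 4, 4], edge_size=3))
```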

-v, --verbose

Enable verbose mode to display detailed processing information.

delfi#

Calculates DELFI features over genome, returning information about (GC-corrected) short fragments, long fragments, DELFI ratio, and total fragments.

finaletoolkit delfi [-h] [-b BLACKLIST_FILE] [-g GAP_FILE]
                    [-o OUTPUT_FILE] [-G] [-R] [-M] [-s WINDOW_SIZE]
                    [-q QUALITY_THRESHOLD] [-w WORKERS] [-v]
                    input_file chrom_sizes reference_file bins_file

Positional Arguments#

input_file

Path to a BAM/CRAM/Fragment file containing fragment data.

chrom_sizes

Tab-delimited file containing (1) chrom name and (2) integer length of chromosome in base pairs. Should contain only autosomes if you want to replicate the original scripts.

reference_file

The .2bit file for the associated reference genome sequence used during alignment.

bins_file

A BED file containing bins over which to calculate DELFI. To replicate Cristiano et al.’s methodology, use 100kb bins over human autosomes.

Named Arguments#

-b, --blacklist-file

BED file containing regions to ignore when calculating DELFI.

-g, --gap-file

BED4 format file containing columns for “chrom”, “start”, “stop”, and “type”. The “type” column should denote whether the entry corresponds to a “centromere”, “telomere”, or “short arm”, and entries not falling into these categories are ignored. This information corresponds to the “gap” track for hg19 in the UCSC Genome Browser.

-o, --output-file

BED, bed.gz, TSV, or CSV file to write DELFI data to. If “-”, writes to stdout.

Default: “-”

-G, --no-gc-correct

Skip GC correction.

Default: True

-R, --keep-nocov

Skip removal of two regions in hg19 with no coverage. Use this flag when not using the hg19 human reference genome.

Default: True

-M, --no-merge-bins

Keep 100kb bins and do not merge to 5Mb size.

Default: True

-s, --window-size

Specify size of large genomic intervals to merge smaller 100kb intervals (or whatever the user specified in bins_file) into. Default is 5000000.

Default: 5000000
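The idea of merging small bins into larger windows can be illustrated as below. This is a simplified sketch: it groups bins purely by position and takes the DELFI ratio to be short counts divided by long counts, whereas the actual command also accounts for GC correction, blacklisted regions, and gaps. `merge_bins` and its tuple layout are hypothetical.

```python
from collections import defaultdict


def merge_bins(bins, window_size=5_000_000):
    """Sum (contig, start, stop, short, long) bins into fixed-size windows."""
    merged = defaultdict(lambda: [0, 0])
    for contig, start, _stop, short, long_ in bins:
        key = (contig, (start // window_size) * window_size)
        merged[key][0] += short
        merged[key][1] += long_
    rows = []
    for (contig, win_start), (short, long_) in sorted(merged.items()):
        ratio = short / long_ if long_ else float("nan")
        rows.append((contig, win_start, win_start + window_size, short, long_, ratio))
    return rows


bins = [
    ("chr1", 0, 100_000, 10, 20),
    ("chr1", 100_000, 200_000, 5, 10),
    ("chr1", 200_000, 300_000, 3, 6),
]
# A small window size keeps the toy example readable
for row in merge_bins(bins, window_size=200_000):
    print(row)
```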

-q, --quality-threshold

Minimum mapping quality threshold.

Default: 30

-w, --workers

Number of worker processes.

Default: 1

-v, --verbose

Enable verbose mode to display detailed processing information.

Default: 0

delfi-gc-correct#

Performs GC correction on raw DELFI data. This command is deprecated and will be removed in a future version of FinaleToolkit. The delfi command has GC correction on by default.

finaletoolkit delfi-gc-correct [-h] [-o OUTPUT_FILE]
                               [--header-lines HEADER_LINES] [-v]
                               input_file

Positional Arguments#

input_file

BED file containing raw DELFI data. Raw DELFI data should only have columns for “contig”, “start”, “stop”, “arm”, “short”, “long”, “gc”, “num_frags”, “ratio”.

Named Arguments#

-o, --output-file

BED to print GC-corrected DELFI fractions. If “-”, will write to stdout.

Default: “-”

--header-lines

Number of header lines in BED.

Default: 1

-v, --verbose

Enable verbose mode to display detailed processing information.

end-motifs#

Measures frequency of k-mer 5’ end motifs.

finaletoolkit end-motifs [-h] [-k K] [-min MIN_LENGTH] [-max MAX_LENGTH]
                         [-B] [-n] [-o OUTPUT_FILE] [-q QUALITY_THRESHOLD]
                         [-w WORKERS] [-v]
                         input_file refseq_file

Positional Arguments#

input_file

Path to a BAM/CRAM/Fragment file containing fragment data.

refseq_file

The .2bit file for the associated reference genome sequence used during alignment.

Named Arguments#

-k

Length of k-mer.

Default: 4

-min, --min-length

Minimum length for a fragment to be included.

Default: 0

-max, --max-length

Maximum length for a fragment to be included.

-B, --no-both-strands

Set flag to only consider one strand for end-motifs.

Default: True

-n, --negative-strand

Set flag in conjunction with -B to only consider 5’ end motifs on the negative strand.

Default: False
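To show what a 5’ end motif is on each strand, the sketch below reads both motifs of one fragment from a reference string: the forward motif is the first k bases, and the reverse motif is the reverse complement of the last k bases. It uses a plain Python string instead of a .2bit file, and `end_motifs` is a hypothetical helper, not the FinaleToolkit API.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def end_motifs(reference, frag_start, frag_stop, k=4):
    """Return (forward, reverse) 5' k-mer end motifs of a half-open fragment."""
    fragment = reference[frag_start:frag_stop]
    forward = fragment[:k]
    # The 5' end on the negative strand is the reverse complement of the 3' tail.
    reverse = fragment[-k:].translate(COMPLEMENT)[::-1]
    return forward, reverse


print(end_motifs("ACGTTGCAAGGC", 2, 10))
```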

-o, --output-file

TSV to print k-mer frequencies. If “-”, will write to stdout.

Default: “-”

-q, --quality-threshold

Minimum mapping quality threshold.

Default: 20

-w, --workers

Number of worker processes.

Default: 1

-v, --verbose

Enable verbose mode to display detailed processing information.

Default: 0

interval-end-motifs#

Measures frequency of k-mer 5’ end motifs in each region specified in a BED file and writes data into a table.

finaletoolkit interval-end-motifs [-h] [-k K] [-min MIN_LENGTH]
                                  [-max MAX_LENGTH] [-lo MIN_LENGTH]
                                  [-hi MAX_LENGTH] [-B] [-n]
                                  [-o OUTPUT_FILE] [-q QUALITY_THRESHOLD]
                                  [-w WORKERS] [-v]
                                  input_file refseq_file intervals

Positional Arguments#

input_file

Path to a BAM/CRAM/Fragment file containing fragment data.

refseq_file

The .2bit file for the associated reference genome sequence used during alignment.

intervals

Path to a BED file containing intervals to retrieve end motif frequencies over.

Named Arguments#

-k

Length of k-mer.

Default: 4

-min, --min-length

Minimum length for a fragment to be included.

Default: 0

-max, --max-length

Maximum length for a fragment to be included.

-lo, --fraction-low

Deprecated alias for --min-length

-hi, --fraction-high

Deprecated alias for --max-length

-B, --single-strand

Set flag to only consider one strand for end-motifs. By default, the positive strand is calculated, but with the -n flag, the 5’ end motifs of the negative strand are considered instead.

Default: True

-n, --negative-strand

Set flag in conjunction with -B to only consider 5’ end motifs on the negative strand.

Default: False

-o, --output-file

Path to TSV or CSV file to write end motif frequencies to.

Default: “-”

-q, --quality-threshold

Minimum mapping quality threshold.

Default: 20

-w, --workers

Number of worker processes.

Default: 1

-v, --verbose

Enable verbose mode to display detailed processing information.

Default: 0

mds#

Reads k-mer frequencies from a file and calculates a motif diversity score (MDS) using normalized Shannon entropy as described by Jiang et al. (2020).

finaletoolkit mds [-h] [-s SEP] [--header HEADER] [file_path]

Positional Arguments#

file_path

Tab-delimited or similar file containing one column for all k-mers and one column for frequency. Reads from stdin by default.

Default: “-”

Named Arguments#

-s, --sep

Separator used in tabular file.

Default: “\t” (tab)

--header

Number of header rows to ignore. Default is 0

Default: 0
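Normalized Shannon entropy over k-mer frequencies can be sketched as follows: the entropy of the motif distribution is divided by its maximum possible value, log(4^k), so the score ranges from 0 (a single motif) to 1 (all motifs equally frequent). The function name and dictionary input are assumptions for this example; the command itself reads a tabular file.

```python
import math


def motif_diversity_score(frequencies):
    """Normalized Shannon entropy of a {k-mer: frequency} mapping."""
    total = sum(frequencies.values())
    k = len(next(iter(frequencies)))  # infer k from the first motif
    entropy = -sum(
        (f / total) * math.log(f / total) for f in frequencies.values() if f > 0
    )
    return entropy / math.log(4 ** k)


# Toy 1-mer example: two of the four possible motifs are used equally often
print(motif_diversity_score({"A": 2, "C": 2, "G": 0, "T": 0}))
```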

interval-mds#

Reads k-mer frequencies from a file and calculates a motif diversity score (MDS) for each interval using normalized Shannon entropy as described by Jiang et al. (2020).

finaletoolkit interval-mds [-h] [-s SEP] [--header HEADER]
                           [file_path] file_out

Positional Arguments#

file_path

Tab-delimited or similar file containing one column for all k-mers and one column for frequency. Reads from stdin by default.

Default: “-”

file_out

Path to the output BED/BEDGraph file containing MDS for each interval.

Default: “-”

Named Arguments#

-s, --sep

Separator used in tabular file.

Default: “\t” (tab)

--header

Number of header rows to ignore. Default is 0

Default: 0

filter-bam#

Filters a BAM file so that all reads are in mapped pairs, exceed a certain MAPQ, are not flagged for quality, are read1, are not secondary or supplementary alignments, and are on the same reference sequence as the mate.

finaletoolkit filter-bam [-h] [-r REGION_FILE] [-o OUTPUT_FILE]
                         [-q QUALITY_THRESHOLD] [-min MIN_LENGTH]
                         [-max MAX_LENGTH] [-lo MIN_LENGTH]
                         [-hi MAX_LENGTH] [-w WORKERS] [-v]
                         input_file

Positional Arguments#

input_file

Path to BAM file.

Named Arguments#

-r, --region-file

Only alignments overlapping the intervals in this BED file are included in the output.

-o, --output-file

Output BAM file path.

Default: “-”

-q, --quality-threshold

Minimum mapping quality threshold.

Default: 30

-min, --min-length

Minimum length for a fragment to be included.

-max, --max-length

Maximum length for a fragment to be included.

-lo, --fraction-low

Deprecated alias for --min-length

-hi, --fraction-high

Deprecated alias for --max-length

-w, --workers

Number of worker processes.

Default: 1

-v, --verbose

Enable verbose mode to display detailed processing information.
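The read-level criteria listed in the filter-bam description map onto standard SAM FLAG bits. The predicate below is a sketch of that mapping, not FinaleToolkit's implementation: it works on plain integers and strings rather than a BAM reader, and treating the threshold as inclusive (MAPQ >= threshold) is an assumption.

```python
# FLAG bits as defined in the SAM specification
PAIRED = 0x1
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
READ1 = 0x40
SECONDARY = 0x100
QC_FAIL = 0x200
SUPPLEMENTARY = 0x800


def passes_filter(flag, mapq, contig, mate_contig, quality_threshold=30):
    """True if a read meets the criteria described for filter-bam."""
    required = PAIRED | READ1
    forbidden = UNMAPPED | MATE_UNMAPPED | SECONDARY | QC_FAIL | SUPPLEMENTARY
    return (
        (flag & required) == required
        and (flag & forbidden) == 0
        and mapq >= quality_threshold
        and contig == mate_contig
    )


# FLAG 99: paired, proper pair, mate on reverse strand, first in pair
print(passes_filter(99, 60, "chr1", "chr1"))
```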

agg-bw#

Aggregates a bigWig signal over constant-length intervals defined in a BED file.

finaletoolkit agg-bw [-h] [-o OUTPUT_FILE] [-m MEDIAN_WINDOW_SIZE] [-a]
                     [-v]
                     input_file interval_file

Positional Arguments#

input_file

A bigWig file containing signals over the intervals specified in interval file.

interval_file

Path to a BED file containing intervals over which signals were calculated.

Named Arguments#

-o, --output-file

A wiggle file containing the aggregate signal over the intervals specified in interval file.

Default: “-”

-m, --median-window-size

Size of the median filter window used to aggregate scores. Set to 120 if aggregating WPS signals.

Default: 1

-a, --mean

Use mean instead of median.

Default: False

-v, --verbose

Enable verbose mode to display detailed processing information.

gap-bed#

Creates a BED4 file containing centromeres, telomeres, and short-arm intervals, similar to the gaps annotation track for hg19 found on the UCSC Genome Browser (Kent et al., 2002). Currently supports only hg19, b37, human_g1k_v37, hg38, and GRCh38.

finaletoolkit gap-bed [-h]
                      {hg19,b37,human_g1k_v37,hg38,GRCh38} output_file

Positional Arguments#

reference_genome

Possible choices: hg19, b37, human_g1k_v37, hg38, GRCh38

Reference genome to provide gaps for.

output_file

Path to write BED file to. If “-” used, writes to stdout.

“Gap” is used liberally in this command and, in the case of hg38/GRCh38, may refer to regions where there are no longer gaps in the reference sequence.