Frag File Utilities#

finaletoolkit.utils.filter_bam(input_file: str, region_file: str = None, output_file: str = None, max_length: int = None, min_length: int = None, quality_threshold: int = 30, workers: int = 1, verbose: bool = False)#

Accepts the path to a BAM file and creates a BAM file in which every read is read1 of a proper pair, exceeds the specified quality threshold, does not intersect a region in the given blacklist file, and intersects a region in the region BED file.

Parameters#

input_file : str

Path string or AlignmentFile pointing to the BAM file to be filtered.

region_file : str, optional

output_file : str, optional

min_length : int, optional

max_length : int, optional

quality_threshold : int, optional

workers : int, optional

verbose : bool, optional

Returns#

output_file : str
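The per-read criteria described above can be sketched as a simple predicate. This is an illustrative sketch only, not the library's implementation: the `Read` class and `passes_filter` function are hypothetical names, treating `min_length` and `max_length` as bounds on fragment length is an assumption, and treating a mapping quality equal to the threshold as passing is also an assumption. Region and blacklist intersection are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Read:
    """Hypothetical stand-in for an aligned read (not a finaletoolkit type)."""
    is_read1: bool
    is_proper_pair: bool
    mapping_quality: int
    template_length: int  # signed fragment length, as in the BAM TLEN field


def passes_filter(
    read: Read,
    quality_threshold: int = 30,
    min_length: Optional[int] = None,
    max_length: Optional[int] = None,
) -> bool:
    """Return True if the read meets the documented per-read criteria."""
    # Keep only read1 of a properly paired fragment.
    if not (read.is_read1 and read.is_proper_pair):
        return False
    # Assumption: a quality equal to the threshold passes.
    if read.mapping_quality < quality_threshold:
        return False
    # Assumption: the length bounds apply to the absolute fragment length.
    fragment_length = abs(read.template_length)
    if min_length is not None and fragment_length < min_length:
        return False
    if max_length is not None and fragment_length > max_length:
        return False
    return True
```

With these assumptions, a read1 in a proper pair with mapping quality 60 and a 167 bp fragment passes the default settings, while the same read with mapping quality 10 does not.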

finaletoolkit.utils.genome2list(genome_file: str) → list#

Reads a GENOME text file into a list of tuples (chrom, length)

Parameters#

genome_file : str

String containing path to GENOME format file

Returns#

chroms : list

List of tuples containing chrom/contig names and lengths
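As a rough illustration of the conversion described above, the sketch below parses a GENOME-style file into a list of `(chrom, length)` tuples. It assumes the file has one contig per line with the name and length separated by whitespace; `read_genome_file` is a hypothetical name and is not the library's own code.

```python
from typing import List, Tuple


def read_genome_file(path: str) -> List[Tuple[str, int]]:
    """Parse a GENOME-style text file into a list of (chrom, length) tuples."""
    chroms = []
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            # Skip blank lines.
            if not fields:
                continue
            # Assumption: first column is the contig name, second its length.
            chroms.append((fields[0], int(fields[1])))
    return chroms
```

Lengths are converted to `int` so that the tuples can be used directly in coordinate arithmetic.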

finaletoolkit.utils.agg_bw(input_file: str, interval_file: str, output_file: str, median_window_size: int = 0, mean: bool = False, strand_location: int = 5, verbose: bool = False)#

Takes a BigWig and an interval BED and aggregates signal along the intervals.

For aggregating WPS signals, note that the median filter trims each end of every interval by half of the filter's window size while adjusting the data. There are two ways to account for this in aggregation:

1. Supply an interval file containing smaller intervals. For example, if you used 5kb intervals for WPS and a median filter window of 1kb, supply a BED file with 4kb windows to this function.

2. Provide the size of the median filter window in median_window_size along with the original intervals. For example, if 5kb intervals were used for WPS and a 1kb median filter window was used, supply the 5kb BED file and the median filter window size to this function.

Do not do both of these at once.
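The arithmetic behind the two options can be sketched as follows. `trim_interval` is a hypothetical helper written for illustration, not part of the library; it removes half of the median filter window from each end of an interval, which is the trimming described above.

```python
from typing import Tuple


def trim_interval(start: int, stop: int, median_window_size: int) -> Tuple[int, int]:
    """Shrink an interval by half of the median filter window on each end."""
    half_window = median_window_size // 2
    return start + half_window, stop - half_window


# A 5kb interval filtered with a 1kb median window leaves a 4kb interval.
trimmed_start, trimmed_stop = trim_interval(10_000, 15_000, 1_000)
trimmed_length = trimmed_stop - trimmed_start
```

Here `trimmed_start` and `trimmed_stop` are 10500 and 14500, so the trimmed interval spans 4000 bases. Applying the trimming twice, by shrinking the BED intervals and also passing median_window_size, would leave only 3kb, which is why the two approaches should not be combined.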

Parameters#

input_file : str

interval_file : str

output_file : str

median_window_size : int, optional

default is 0

mean : bool

use mean instead

strand_location : int

which column (starting at 0) of the interval file contains the strand. Default is 5.

verbose : int or bool, optional

default is False

Returns#

agg_scores : NDArray