Windowed Protection Score (WPS)
- finaletoolkit.frag.wps(input_file: str | AlignmentFile, contig: str, start: int | str, stop: int | str, output_file: str | None = None, window_size: int = 120, fraction_low: int = 120, fraction_high: int = 180, quality_threshold: int = 30, verbose: bool | int = 0) → ndarray
Return (raw) Windowed Protection Scores as specified in Snyder et al. (2016) over a region [start, stop).
- Parameters:
input_file (str or pysam.AlignmentFile) – BAM, SAM or tabix file containing paired-end fragment reads or its path. AlignmentFile must be opened in read mode.
contig (str) –
start (int) –
stop (int) –
output_file (string, optional) –
window_size (int, optional) – Size of window to calculate WPS. Default is k = 120, equivalent to L-WPS.
fraction_low (int, optional) – Specifies lowest fragment length included in calculation. Default is 120, equivalent to long fraction.
fraction_high (int, optional) – Specifies highest fragment length included in calculation. Default is 180, equivalent to long fraction.
quality_threshold (int, optional) –
workers (int, optional) –
verbose (bool, optional) –
- Returns:
scores – NumPy structured array with columns contig, start, and wps.
- Return type:
numpy.ndarray
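To make the roles of window_size, fraction_low, and fraction_high concrete, here is a minimal pure-Python sketch of the WPS idea from Snyder et al. (2016): for each position, count the fragments that span the whole window centered there and subtract the fragments that have an endpoint inside it. The function raw_wps and its tuple-based fragment input are hypothetical and only for illustration; the boundary conventions are assumptions and may differ from what finaletoolkit.frag.wps does internally.

```python
def raw_wps(fragments, start, stop, window_size=120,
            fraction_low=120, fraction_high=180):
    """Toy WPS over [start, stop) from half-open (frag_start, frag_stop) tuples.

    Hypothetical helper, not part of finaletoolkit.
    """
    half = window_size // 2
    # Keep only fragments whose length falls in the selected fraction.
    kept = [(s, e) for s, e in fragments
            if fraction_low <= e - s <= fraction_high]
    scores = []
    for pos in range(start, stop):
        win_start, win_stop = pos - half, pos + half
        spanning = 0  # fragments covering the whole window
        ending = 0    # fragments with an endpoint inside the window
        for s, e in kept:
            if s <= win_start and e >= win_stop:
                spanning += 1
            elif win_start <= s < win_stop or win_start < e <= win_stop:
                ending += 1
        scores.append((pos, spanning - ending))
    return scores


# One 160 bp fragment is kept; the 50 bp fragment is filtered out by length.
example = raw_wps([(100, 260), (0, 50)], start=180, stop=181)
```

With the defaults, only fragments of 120 to 180 bp contribute, which is the long fraction mentioned in the parameter descriptions above.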
- finaletoolkit.frag.multi_wps(input_file: AlignmentFile | str, site_bed: str, output_file: None | str = None, window_size: int = 120, interval_size: int = 5000, fraction_low: int = 120, fraction_high: int = 180, quality_threshold: int = 30, workers: int = 1, verbose: bool | int = 0) → ndarray
Aggregates WPS over the sites in a BED file according to the method described by Snyder et al. (2016).
- Parameters:
input_file (str or pysam.AlignmentFile) – BAM, SAM, or tabix file containing paired-end fragment reads or its path. AlignmentFile must be opened in read mode.
site_bed (str) – BED file containing the intervals over which to calculate WPS.
output_file (string, optional) –
window_size (int, optional) – Size of window to calculate WPS. Default is k = 120, equivalent to L-WPS.
interval_size (int, optional) – Size of each interval specified in the bed file. Should be the same for every interval. Default is 5000.
fraction_low (int, optional) – Specifies lowest fragment length included in calculation. Default is 120, equivalent to long fraction.
fraction_high (int, optional) – Specifies highest fragment length included in calculation. Default is 180, equivalent to long fraction.
quality_threshold (int, optional) –
workers (int, optional) –
verbose (bool, optional) –
- Returns:
scores – NumPy array of shape (n, 2), where column 1 is the coordinate, column 2 is the score, and n is the number of coordinates in the region [start, stop).
- Return type:
numpy.ndarray
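Because interval_size should be the same for every interval in the BED file, one way to prepare the input is to build fixed-width intervals around each site before calling multi_wps. The sketch below is an illustration under assumptions: centered_intervals and run_multi_wps are hypothetical helper names, the site list is made up, and the helper does not clip intervals at chromosome ends. The multi_wps call uses only the parameters listed in the signature above.

```python
def centered_intervals(sites, interval_size=5000):
    """Return BED lines of equal width centered on (contig, position) sites.

    Hypothetical helper, not part of finaletoolkit.
    """
    half = interval_size // 2
    lines = []
    for contig, pos in sites:
        # Shift the interval right if it would start before position 0,
        # so that every interval keeps the same width.
        start = max(0, pos - half)
        lines.append(f"{contig}\t{start}\t{start + interval_size}")
    return lines


def run_multi_wps(bam_path, bed_path):
    """Call multi_wps on a prepared BED file (requires finaletoolkit)."""
    # Imported lazily so centered_intervals works without finaletoolkit.
    from finaletoolkit.frag import multi_wps
    return multi_wps(bam_path, bed_path, interval_size=5000, workers=1)


bed_lines = centered_intervals([("chr1", 10000), ("chr1", 100)])
```

Writing bed_lines to a file, one line each, gives a site_bed input whose intervals all match the default interval_size of 5000.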
- finaletoolkit.frag.adjust_wps(input_file: str, interval_file: str, output_file: str, genome_file: str, interval_size: int = 5000, median_window_size: int = 1000, savgol_window_size: int = 21, savgol_poly_deg: int = 2, mean: bool = False, subtract_edges: bool = False, edge_size: int = 500, workers: int = 1, verbose: Union(bool, int) = False)
Adjusts raw WPS data in a BigWig by applying a median filter and Savitzky-Golay filter (Savitzky and Golay, 1964).
- Parameters:
input_file (str) – Path string to a BigWig containing raw WPS data.
interval_file (str) – BED format file containing the intervals over which WPS was calculated.
output_file (str) – BigWig file to write adjusted WPS to.
genome_file (str) – The genome file for the reference genome that WGS was aligned to. A tab delimited file where column 1 contains the name of chromosomes and column 2 contains chromosome length.
median_window_size (int, optional) – Size of median filter window. Default is 1000.
savgol_window_size (int, optional) – Size of Savitzky-Golay filter window. Default is 21.
savgol_poly_deg (int, optional) – Polynomial degree for the Savitzky-Golay filter. Default is 2.
mean (bool, optional) – If true, a mean filter is used instead of median. Default is False.
subtract_edges (bool, optional) – If true, take the median of the first and last 500 bases in a window and subtract from the whole interval. Default is False.
workers (int, optional) – Number of processes to use. Default is 1.
verbose (bool or int, optional) – Default is False.
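The two filtering steps can be sketched on a plain list of scores. This is an illustration, not the adjust_wps implementation: it assumes, following Snyder et al. (2016), that the running median is subtracted from the raw scores before smoothing, and it truncates the median window at the ends of the input, which is also an assumption. The helper names subtract_running_median and smooth are hypothetical; scipy.signal.savgol_filter is a real SciPy function and is imported lazily so the median step runs without SciPy.

```python
from statistics import median


def subtract_running_median(scores, window=1000):
    """Subtract a centered running median from each score.

    Hypothetical helper; the window is truncated at the ends of the input.
    """
    half = window // 2
    n = len(scores)
    adjusted = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        adjusted.append(scores[i] - median(scores[lo:hi]))
    return adjusted


def smooth(scores, window=21, poly_deg=2):
    """Savitzky-Golay smoothing; needs at least `window` scores and SciPy."""
    # Imported lazily so the median step above works without SciPy.
    from scipy.signal import savgol_filter
    return savgol_filter(scores, window, poly_deg)


# A small window makes the effect visible on a five-point toy signal.
adjusted = subtract_running_median([1, 2, 10, 2, 1], window=3)
```

In this toy signal the local baseline is removed and the peak at the center remains, which is the intended effect of median subtraction before smoothing.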