Windowed Protection Score (WPS)

finaletoolkit.frag.wps(input_file: str | AlignmentFile, contig: str, start: int | str, stop: int | str, output_file: str | None = None, window_size: int = 120, fraction_low: int = 120, fraction_high: int = 180, quality_threshold: int = 30, verbose: bool | int = 0) → ndarray

Return raw Windowed Protection Scores (WPS), as specified in Snyder et al. (2016), over the region [start, stop).

Parameters:
  • input_file (str or pysam.AlignmentFile) – BAM, SAM, or tabix-indexed file containing paired-end fragment reads, given either as a path or as an open AlignmentFile. An AlignmentFile must be opened in read mode.

  • contig (str) –

  • start (int) –

  • stop (int) –

  • output_file (string, optional) –

  • window_size (int, optional) – Size of the window (k) used to calculate WPS. Default is 120, which corresponds to L-WPS.

  • fraction_low (int, optional) – Shortest fragment length included in the calculation. Default is 120, the lower bound of the long fraction.

  • fraction_high (int, optional) – Longest fragment length included in the calculation. Default is 180, the upper bound of the long fraction.

  • quality_threshold (int, optional) –

  • workers (int, optional) –

  • verbose (bool, optional) –

Returns:

scores – NumPy structured array with columns contig, start, and wps.

Return type:

numpy.ndarray
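
The score described above can be illustrated with a minimal pure-Python sketch. It follows the general definition from Snyder et al. (2016): for a window of size k centered on a position, count the fragments that span the whole window and subtract the fragments that have an endpoint inside it. The function name `raw_wps`, the half-open fragment tuples, and the exact boundary conventions are assumptions made for illustration; this is not FinaleToolkit's implementation and may differ from it at window edges.

```python
def raw_wps(fragments, start, stop, window_size=120,
            fraction_low=120, fraction_high=180):
    """Illustrative raw WPS for every position in [start, stop).

    fragments: list of (frag_start, frag_stop) half-open intervals.
    Boundary handling here is an assumption, not the library's rule.
    """
    half = window_size // 2
    scores = []
    for pos in range(start, stop):
        # Half-open window of window_size bases centered on pos.
        win_start, win_stop = pos - half, pos + half
        score = 0
        for f_start, f_stop in fragments:
            length = f_stop - f_start
            # Keep only fragments in the selected length fraction.
            if not fraction_low <= length <= fraction_high:
                continue
            if f_start <= win_start and f_stop >= win_stop:
                score += 1  # fragment spans the whole window
            elif (win_start <= f_start < win_stop
                  or win_start < f_stop <= win_stop):
                score -= 1  # fragment has an endpoint inside the window
        scores.append((pos, score))
    return scores


# One 160 bp fragment covering [100, 260).
print(raw_wps([(100, 260)], 180, 181))  # window [120, 240) is spanned
print(raw_wps([(100, 260)], 100, 101))  # fragment start falls in window
```

With the default long-fraction bounds, a 100 bp fragment would be ignored; lowering `fraction_low` lets it contribute.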

finaletoolkit.frag.multi_wps(input_file: AlignmentFile | str, site_bed: str, output_file: None | str = None, window_size: int = 120, interval_size: int = 5000, fraction_low: int = 120, fraction_high: int = 180, quality_threshold: int = 30, workers: int = 1, verbose: bool | int = 0) → ndarray

Aggregate WPS over the sites in a BED file according to the method described by Snyder et al. (2016).

Parameters:
  • input_file (str or pysam.AlignmentFile) – BAM, SAM, or tabix-indexed file containing paired-end fragment reads, given either as a path or as an open AlignmentFile. An AlignmentFile must be opened in read mode.

  • site_bed (str) – BED file containing the intervals over which to calculate WPS.

  • output_file (string, optional) –

  • window_size (int, optional) – Size of the window (k) used to calculate WPS. Default is 120, which corresponds to L-WPS.

  • interval_size (int, optional) – Size of each interval specified in the BED file. Should be the same for every interval. Default is 5000.

  • fraction_low (int, optional) – Shortest fragment length included in the calculation. Default is 120, the lower bound of the long fraction.

  • fraction_high (int, optional) – Longest fragment length included in the calculation. Default is 180, the upper bound of the long fraction.

  • quality_threshold (int, optional) –

  • workers (int, optional) –

  • verbose (bool, optional) –

Returns:

scores – NumPy array of shape (n, 2), where column 1 is the coordinate, column 2 is the score, and n is the number of coordinates in the region [start, stop).

Return type:

numpy.ndarray
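
Because interval_size should be the same for every interval in the BED file, it can be useful to check the file before running the aggregation. The following is a minimal sketch of such a pre-check. The helper name `read_site_intervals` is hypothetical and not part of FinaleToolkit, and whether the library itself validates interval sizes is not stated here.

```python
def read_site_intervals(bed_lines, interval_size=5000):
    """Parse BED lines and verify every interval has the same size.

    Hypothetical helper for illustration; not a FinaleToolkit function.
    bed_lines: iterable of text lines (for example, an open file).
    """
    intervals = []
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and common BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        contig, start, stop = line.split()[:3]
        start, stop = int(start), int(stop)
        if stop - start != interval_size:
            raise ValueError(
                f"{contig}:{start}-{stop} spans {stop - start} bp, "
                f"expected {interval_size}"
            )
        intervals.append((contig, start, stop))
    return intervals


sites = read_site_intervals(["chr1\t1000\t6000\tsiteA", "chr2\t0\t5000"])
print(sites)
```

Passing an open file object works the same way, since the function only iterates over lines.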

finaletoolkit.frag.adjust_wps(input_file: str, interval_file: str, output_file: str, genome_file: str, interval_size: int = 5000, median_window_size: int = 1000, savgol_window_size: int = 21, savgol_poly_deg: int = 2, mean: bool = False, subtract_edges: bool = False, edge_size: int = 500, workers: int = 1, verbose: Union[bool, int] = False)

Adjusts raw WPS data in a BigWig file by applying a median filter and a Savitzky-Golay filter (Savitzky and Golay, 1964).

Parameters:
  • input_file (str) – Path to a BigWig file containing raw WPS data.

  • interval_file (str) – BED file containing the intervals over which WPS was calculated.

  • output_file (str) – BigWig file to write the adjusted WPS to.

  • genome_file (str) – The genome file for the reference genome that the WGS data was aligned to. A tab-delimited file in which column 1 contains chromosome names and column 2 contains chromosome lengths.

  • median_window_size (int, optional) – Size of the median filter window. Default is 1000.

  • savgol_window_size (int, optional) – Size of the Savitzky-Golay filter window. Default is 21.

  • savgol_poly_deg (int, optional) – Polynomial degree for the Savitzky-Golay filter. Default is 2.

  • mean (bool, optional) – If True, a mean filter is used instead of a median filter. Default is False.

  • subtract_edges (bool, optional) – If True, take the median of the first and last 500 bases in a window and subtract it from the whole interval. Default is False.

  • workers (int, optional) – Number of processes to use. Default is 1.

  • verbose (bool or int, optional) – Default is False.
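
The adjustment steps can be sketched in pure Python. This sketch assumes that the median filter means subtracting a running median from the raw scores, as in Snyder et al. (2016), and that subtract_edges uses a single median taken over the first and last bases combined. Both are assumptions about behavior not spelled out above. The function names are hypothetical, window truncation at the interval ends is an arbitrary choice, and the Savitzky-Golay smoothing step is omitted.

```python
from statistics import median


def subtract_running_median(scores, window_size=1000):
    """Subtract a centered running median from each score.

    Assumed interpretation of the median filter; windows are truncated
    at the ends of the interval. Cost is O(n * window_size), which is
    acceptable for a sketch but slow for long intervals.
    """
    half = window_size // 2
    n = len(scores)
    adjusted = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        adjusted.append(scores[i] - median(scores[lo:hi]))
    return adjusted


def subtract_edge_median(scores, edge_size=500):
    """Subtract the median of the first and last edge_size scores.

    Assumes one median over both edges combined.
    """
    edges = list(scores[:edge_size]) + list(scores[-edge_size:])
    baseline = median(edges)
    return [s - baseline for s in scores]


raw = [2, 2, 9, 9, 4, 4]
print(subtract_running_median(raw, window_size=3))
print(subtract_edge_median(raw, edge_size=2))
```

In practice the smoothing step would typically follow, for example with a Savitzky-Golay routine from a numerical library, using the window size and polynomial degree given by savgol_window_size and savgol_poly_deg.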