Windowed Protection Score (WPS)

finaletoolkit.frag.wps(input_file: str | pysam.AlignmentFile, chrom: str, start: int, stop: int, chrom_size: int, output_file: str | None = None, window_size: int = 120, min_length: int = 120, max_length: int = 180, quality_threshold: int = 30, verbose: bool | int = 0, fraction_low: int | None = None, fraction_high: int | None = None) → np.ndarray

Return raw Windowed Protection Scores (WPS), as specified in Snyder et al. (2016), over the region [start, stop).

Parameters:
  • input_file (str or pysam.AlignmentFile) – Path to a BAM, CRAM, or tabix-indexed file containing paired-end fragment reads, or an already opened AlignmentFile. An AlignmentFile must be opened in read mode.

  • chrom (str)

  • start (int)

  • stop (int)

  • chrom_size (int) – Size of chrom

  • output_file (string, optional)

  • window_size (int, optional) – Size of window to calculate WPS. Default is k = 120, equivalent to L-WPS.

  • min_length (int, optional) – Specifies lowest fragment length included in calculation. Default is 120, equivalent to long WPS.

  • max_length (int, optional) – Specifies highest fragment length included in calculation. Default is 180, equivalent to long WPS.

  • quality_threshold (int, optional)

  • workers (int, optional)

  • verbose (bool, optional)

  • fraction_low (int, optional) – Deprecated alias for min_length

  • fraction_high (int, optional) – Deprecated alias for max_length

Returns:

scores – NumPy structured array with columns contig, start, and wps.

Return type:

numpy.ndarray
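
The score itself is simple to state: for each position, count the fragments that span the whole window centered there, then subtract the fragments that have an endpoint inside it. The sketch below is a minimal pure-Python illustration of that definition, not the finaletoolkit implementation. The name wps_sketch, the (start, stop) tuple input, and the half-open boundary conventions are assumptions made for this example.

```python
def wps_sketch(fragments, start, stop, window_size=120,
               min_length=120, max_length=180):
    """Illustrative raw WPS for every position in [start, stop).

    fragments: iterable of (frag_start, frag_stop) half-open coordinates
    (hypothetical input format, chosen for this example only).
    Returns a list of (position, score) tuples.
    """
    half = window_size // 2
    # Keep only fragments whose length lies in [min_length, max_length].
    kept = [(s, e) for s, e in fragments if min_length <= e - s <= max_length]

    scores = []
    for pos in range(start, stop):
        win_start, win_stop = pos - half, pos + half  # half-open window
        score = 0
        for s, e in kept:
            last_base = e - 1
            if s <= win_start and e >= win_stop:
                score += 1  # fragment covers the whole window
            elif win_start <= s < win_stop or win_start <= last_base < win_stop:
                score -= 1  # fragment starts or ends inside the window
        scores.append((pos, score))
    return scores


# One 160 bp fragment; the 50 bp fragment is removed by the length filter.
example = dict(wps_sketch([(100, 260), (0, 50)], 100, 401))
```

Positions well inside the 160 bp fragment receive a positive score, positions near either end receive a negative score, and positions far from it score zero.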

finaletoolkit.frag.multi_wps(input_file: FragFile, site_bed: Intervals, chrom_sizes: ChromSizes | None = None, output_file: str | None = None, window_size: int = 120, interval_size: int = 5000, min_length: int = 120, max_length: int = 180, quality_threshold: int = 30, workers: int = 1, verbose: bool | int = 0, fraction_low: int | None = None, fraction_high: int | None = None)

Aggregate WPS over the sites in a BED file according to the method described by Snyder et al. (2016).

Parameters:
  • input_file (str or pysam.AlignmentFile) – Path to a BAM, CRAM, or tabix-indexed file containing paired-end fragment reads, or an already opened AlignmentFile. An AlignmentFile must be opened in read mode.

  • site_bed (str or pathlike) – BED file containing sites to perform WPS on. The intervals in this BED file should be sorted, first by contig and then by start. The intervals over which WPS is calculated are found by taking the midpoint of each site and creating an interval of interval_size length centered on that midpoint.

  • chrom_sizes (str or pathlike, optional) – Tab separated file containing names and sizes of chromosomes in input_file. Required if input_file is tabix-indexed.

  • output_file (string, optional)

  • window_size (int, optional) – Size of window to calculate WPS. Default is k = 120, equivalent to L-WPS.

  • interval_size (int, optional) – Size of intervals to calculate WPS over. A mid-point is calculated for each interval in the BED file, and an interval of the specified size is used. This is helpful especially when calculating a window around a genomic feature like transcription start sites. Default is 5000.

  • min_length (int, optional) – Specifies lowest fragment length included in calculation. Default is 120, equivalent to long fraction.

  • max_length (int, optional) – Specifies highest fragment length included in calculation. Default is 180, equivalent to long fraction.

  • quality_threshold (int, optional)

  • workers (int, optional)

  • verbose (bool, optional)

  • fraction_low (int, optional) – Deprecated alias for min_length

  • fraction_high (int, optional) – Deprecated alias for max_length

Returns:

output_file – location results are stored.

Return type:

str
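
The way site_bed and interval_size interact can be sketched as follows: each site is reduced to its midpoint, and a fixed-size interval is centered on it. This is a minimal illustration under stated assumptions, not the library's code. The name centered_intervals, the in-memory tuple input, and the clamping to chromosome bounds are all hypothetical choices for the example.

```python
def centered_intervals(sites, interval_size=5000, chrom_sizes=None):
    """Return (contig, start, stop) intervals centered on site midpoints.

    sites: iterable of (contig, start, stop) tuples, as read from a BED file.
    chrom_sizes: optional dict mapping contig name to length; when given,
    intervals are clipped to the contig (an assumption of this sketch).
    """
    intervals = []
    for contig, site_start, site_stop in sites:
        midpoint = (site_start + site_stop) // 2
        start = midpoint - interval_size // 2
        stop = start + interval_size
        start = max(0, start)  # never extend before the contig start
        if chrom_sizes is not None and contig in chrom_sizes:
            stop = min(stop, chrom_sizes[contig])  # or past its end
        intervals.append((contig, start, stop))
    return intervals


# A 200 bp site becomes a 5000 bp interval around its midpoint.
tss_windows = centered_intervals([("chr1", 10000, 10200)])
```

Because only the midpoint of each site is used, the width of the original BED entries does not change the size of the region that is scored.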

finaletoolkit.frag.adjust_wps(input_file: str, interval_file: str, output_file: str, chrom_sizes: str, interval_size: int = 5000, median_window_size: int = 1000, savgol_window_size: int = 21, savgol_poly_deg: int = 2, savgol: bool = True, mean: bool = False, subtract_edges: bool = False, edge_size: int = 500, workers: int = 1, verbose: Union[bool, int] = False)

Adjusts raw WPS data in a BigWig file by applying a median filter and a Savitzky-Golay filter (Savitzky and Golay, 1964).

Parameters:
  • input_file (str) – Path string to a BigWig containing raw WPS data.

  • interval_file (str) – BED format file containing the intervals over which WPS was calculated.

  • output_file (str) – BigWig file to write adjusted WPS to.

  • chrom_sizes (str) – The chrom.sizes file for the reference genome (e.g. hg38) that the sequencing data was aligned to. A tab-delimited file in which column 1 contains chromosome names and column 2 contains chromosome lengths.

  • median_window_size (int, optional) – Size of median filter window. Default is 1000.

  • savgol_window_size (int, optional) – Size of Savitzky-Golay filter window. Default is 21.

  • savgol_poly_deg (int, optional) – Polynomial degree for the Savitzky-Golay filter. Default is 2.

  • savgol (bool, optional) – Set to True to perform Savitzky-Golay filtering. Default is True.

  • mean (bool, optional) – If true, a mean filter is used instead of median. Default is False.

  • subtract_edges (bool, optional) – If true, take the median of the first and last edge_size bases in a window and subtract it from the whole interval. Default is False.

  • edge_size (int, optional) – Size of the edge subtracted from the ends of the window. Default is 500.

  • workers (int, optional) – Number of processes to use. Default is 1.

  • verbose (bool or int, optional) – Default is False.
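
The two adjustment steps can be illustrated on a plain list of scores. The sketch below assumes the median step subtracts a running median from each value (so the signal is centered on zero) and then smooths with a Savitzky-Golay filter. It is not the adjust_wps implementation: the function names are hypothetical, the edge handling is an assumption, and for brevity the smoother uses the standard 5-point, degree-2 coefficients rather than the default 21-point window.

```python
from statistics import median


def subtract_running_median(values, window=1000):
    """Subtract the median of a centered window from each value.

    Windows are truncated at the ends of the list (an assumption of
    this sketch).
    """
    half = window // 2
    n = len(values)
    adjusted = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        adjusted.append(values[i] - median(values[lo:hi]))
    return adjusted


# Standard Savitzky-Golay smoothing coefficients for a 5-point window
# and a degree-2 polynomial; the weighted sum is divided by 35.
SG5_QUADRATIC = (-3, 12, 17, 12, -3)


def savgol5(values):
    """Smooth interior points with a 5-point, degree-2 Savitzky-Golay filter.

    The first and last two points are left unchanged (an assumption).
    """
    smoothed = list(values)
    for i in range(2, len(values) - 2):
        window = values[i - 2:i + 3]
        smoothed[i] = sum(c * v for c, v in zip(SG5_QUADRATIC, window)) / 35
    return smoothed


raw = [3, 4, 9, 4, 3, 2, 8, 2, 3, 4]
adjusted = savgol5(subtract_running_median(raw, window=4))
```

A degree-2 Savitzky-Golay filter reproduces any quadratic signal exactly at interior points, which makes a convenient sanity check when swapping in a different window size or a library routine.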