Basic Features

finaletoolkit.frag.coverage(input_file: str | TabixFile | AlignmentFile | Path, interval_file: str, output_file: str, scale_factor: float = 1000000.0, intersect_policy: str = 'midpoint', quality_threshold: int = 30, workers: int = 1, verbose: bool | int = False) → Iterable[tuple[str, int, int, str, float]]

Return estimated fragment coverage over the intervals specified in interval_file. Fragments are read from input_file, which may be a SAM, BAM, CRAM, or Frag.gz file. By default, the midpoint of each fragment is calculated and coverage is tabulated from the midpoints that fall into each interval. Not suitable when fragment size approaches interval size.

Parameters:
  • input_file (str, Path, pysam.TabixFile, or pysam.AlignmentFile) – SAM, BAM, CRAM, or Frag.gz file containing paired-end fragment reads, or the path to one. An AlignmentFile must be opened in read mode.

  • interval_file (str) – BED4 file containing intervals over which to generate coverage statistics.

  • output_file (str) – Path of the BED file to write coverages to. If output_file is -, results are printed to stdout.

  • scale_factor (float, optional) – Amount to multiply coverages by. Default is 10^6.

  • intersect_policy (str, optional) – Specifies how to determine whether a fragment is in an interval. ‘midpoint’ (default) calculates the central coordinate of each fragment and selects the fragment only if that midpoint is in the interval. ‘any’ includes fragments with any overlap with the interval.

  • quality_threshold (int, optional) – Minimum MAPQ. Default is 30.

  • workers (int, optional) – Number of subprocesses to spawn. Increases speed at the expense of memory.

  • verbose (int or bool, optional)

Returns:

coverages – Fragment coverages over intervals.

Return type:

Iterable[tuple[str, int, int, str, float]]
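
The two intersect_policy values described above can be illustrated with a minimal pure-Python sketch. This is not FinaleToolkit's implementation: the names in_interval and count_coverage are hypothetical, fragments are assumed to be 0-based half-open (contig, start, stop) tuples, the midpoint is taken by integer division, and the raw count is simply multiplied by scale_factor without any normalization the library may apply.

```python
from typing import List, Tuple

Fragment = Tuple[str, int, int]        # (contig, start, stop), assumed 0-based half-open
Interval = Tuple[str, int, int, str]   # (contig, start, stop, name), like a BED4 row


def in_interval(frag: Fragment, interval: Interval, policy: str = "midpoint") -> bool:
    """Hypothetical helper: does the fragment count toward the interval?"""
    f_contig, f_start, f_stop = frag
    i_contig, i_start, i_stop, _name = interval
    if f_contig != i_contig:
        return False
    if policy == "midpoint":
        # Central coordinate of the fragment (integer division is an assumption).
        midpoint = (f_start + f_stop) // 2
        return i_start <= midpoint < i_stop
    if policy == "any":
        # Any overlap between the two half-open ranges.
        return f_start < i_stop and f_stop > i_start
    raise ValueError(f"unknown intersect policy: {policy!r}")


def count_coverage(
    fragments: List[Fragment],
    intervals: List[Interval],
    policy: str = "midpoint",
    scale_factor: float = 1.0,
) -> List[Tuple[str, int, int, str, float]]:
    """Hypothetical helper: scaled fragment count per interval."""
    results = []
    for interval in intervals:
        n = sum(in_interval(frag, interval, policy) for frag in fragments)
        contig, start, stop, name = interval
        results.append((contig, start, stop, name, n * scale_factor))
    return results


fragments = [
    ("chr1", 100, 260),  # midpoint 180
    ("chr1", 180, 340),  # midpoint 260
    ("chr1", 450, 610),  # midpoint 530
    ("chr2", 100, 260),  # different contig
]
intervals = [("chr1", 0, 200, "a"), ("chr1", 200, 400, "b")]

print(count_coverage(fragments, intervals, policy="midpoint"))
print(count_coverage(fragments, intervals, policy="any"))
```

With these made-up fragments, the midpoint policy assigns each fragment to at most one interval, while the any policy counts the two fragments spanning position 200 in both intervals. That double counting is the reason the midpoint policy becomes unreliable as fragment size approaches interval size.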

finaletoolkit.frag.frag_length(input_file: str | AlignmentFile | TabixFile, contig: str | None = None, start: int | None = None, stop: int | None = None, intersect_policy: str = 'midpoint', output_file: str | None = None, quality_threshold: int = 30, verbose: bool = False) → ndarray

Return a numpy.ndarray containing the lengths of fragments in input_file that pass the quality threshold and come from properly paired reads.

Parameters:
  • input_file (str, pysam.AlignmentFile, or pysam.TabixFile) – BAM, SAM, or CRAM file containing paired-end fragment reads, or the path to one. An AlignmentFile must be opened in read mode.

  • contig (string, optional) – Contig or chromosome to get fragments from

  • start (int, optional) – 0-based left-most coordinate of interval

  • stop (int, optional) – 1-based right-most coordinate of interval

  • intersect_policy (str, optional) – Specifies what policy is used to include fragments in the given interval. Default is “midpoint”. Policies include:
      - midpoint: the average of the end coordinates of a fragment lies in the interval.
      - any: any part of the fragment is in the interval.

  • output_file (string, optional)

  • quality_threshold (int, optional)

  • verbose (bool, optional)

Returns:

lengths – ndarray of fragment lengths from the file, restricted to contig if specified.

Return type:

numpy.ndarray
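
The filtering described above (keep properly paired reads that pass the MAPQ threshold, then take the fragment length) can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: the Read record and fragment_lengths function are hypothetical, the threshold is treated as an inclusive minimum, and each pair is counted once by keeping only the mate with a positive template length. The proper-pair bit (0x2) comes from the SAM flag definition.

```python
from dataclasses import dataclass
from typing import Iterable, List

PROPER_PAIR = 0x2  # SAM FLAG bit: each segment properly aligned


@dataclass
class Read:
    """Hypothetical minimal stand-in for one alignment record."""
    flag: int  # SAM FLAG
    mapq: int  # mapping quality
    tlen: int  # signed template length (TLEN)


def fragment_lengths(reads: Iterable[Read], quality_threshold: int = 30) -> List[int]:
    """Lengths of fragments from properly paired reads passing the MAPQ filter."""
    lengths = []
    for read in reads:
        if not read.flag & PROPER_PAIR:
            continue  # not a proper pair
        if read.mapq < quality_threshold:
            continue  # below minimum MAPQ (inclusive threshold is an assumption)
        if read.tlen <= 0:
            continue  # keep one mate per pair: the one with positive TLEN
        lengths.append(read.tlen)
    return lengths


reads = [
    Read(flag=99, mapq=60, tlen=167),    # proper pair, first mate: kept
    Read(flag=147, mapq=60, tlen=-167),  # its mate: skipped to avoid double counting
    Read(flag=99, mapq=10, tlen=150),    # low MAPQ: skipped
    Read(flag=65, mapq=60, tlen=300),    # paired but not proper: skipped
    Read(flag=163, mapq=42, tlen=180),   # proper pair, positive TLEN: kept
]

print(fragment_lengths(reads))  # [167, 180]
```

The documented function returns a numpy.ndarray; a plain list is used here only to keep the sketch dependency-free.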

finaletoolkit.frag.frag_length_bins(input_file: str | AlignmentFile, contig: str | None = None, start: int | None = None, stop: int | None = None, bin_size: int | None = None, output_file: str | None = None, contig_by_contig: bool = False, histogram: bool = False, intersect_policy: str = 'midpoint', quality_threshold: int = 30, verbose: bool | int = False) → tuple[ndarray, ndarray]

Computes the lengths of fragments in input_file and returns two arrays: the length bins and the fragment counts per bin. Optionally prints the data to output_file as a tab-delimited table or as a histogram.

Parameters:
  • input_file (str or AlignmentFile)

  • contig (str, optional)

  • start (int, optional)

  • stop (int, optional)

  • bin_size (int, optional)

  • output_file (str, optional)

  • contig_by_contig (bool, optional)

  • histogram (bool, optional)

  • intersect_policy (str, optional) – Specifies what policy is used to include fragments in the given interval. Default is “midpoint”. Policies include:
      - midpoint: the average of the end coordinates of a fragment lies in the interval.
      - any: any part of the fragment is in the interval.

  • workers (int, optional)

Returns:

  • bins (ndarray)

  • counts (ndarray)
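
The bins and counts pair returned above can be pictured with a small sketch that groups fragment lengths into fixed-width bins. The function length_bins is hypothetical, and labeling each bin by its left edge, starting from the shortest observed length, is an assumption: this page does not state the bin-edge convention the library uses. Plain lists stand in for the ndarrays.

```python
from typing import List, Sequence, Tuple


def length_bins(lengths: Sequence[int], bin_size: int = 1) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: group lengths into fixed-width bins.

    Returns (bins, counts), where bins[i] is the left edge of bin i and
    counts[i] is the number of lengths in [bins[i], bins[i] + bin_size).
    """
    if bin_size < 1:
        raise ValueError("bin_size must be a positive integer")
    if not lengths:
        return [], []
    lo, hi = min(lengths), max(lengths)
    n_bins = (hi - lo) // bin_size + 1
    bins = [lo + i * bin_size for i in range(n_bins)]
    counts = [0] * n_bins
    for length in lengths:
        counts[(length - lo) // bin_size] += 1
    return bins, counts


lengths = [150, 152, 167, 167, 168, 180]
bins, counts = length_bins(lengths, bin_size=10)

# Tab-delimited table, one row per bin.
for left_edge, count in zip(bins, counts):
    print(f"{left_edge}\t{count}")
```

For the made-up lengths above, the bins start at 150, 160, 170, and 180 and hold 2, 3, 0, and 1 fragments respectively. Empty bins are kept so that the two sequences stay aligned.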