Basic Features#

finaletoolkit.frag.coverage(input_file: str | AlignmentFile, interval_file: str, output_file: str, scale_factor: float = 1000000.0, quality_threshold: int = 30, workers: int = 1, verbose: bool | int = False)#

Return estimated fragment coverage over the intervals listed in interval_file. Fragments are read from input_file, which may be a SAM, BAM, CRAM, or Frag.gz file. Coverage is estimated by computing the midpoint of each fragment and counting the midpoints that fall within each interval. Because only midpoints are counted, the estimate is not suitable when fragment sizes approach the interval size.
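The midpoint counting described above can be illustrated with a minimal, dependency-free sketch. It is not the library's implementation: the tuple layouts, the helper name midpoint_coverage, and the use of integer division for the midpoint are all assumptions made for illustration.

```python
from typing import Iterable, List, Tuple

# Hypothetical layouts, assumed for this sketch only.
Fragment = Tuple[str, int, int]       # (contig, start, stop), 0-based half-open
Interval = Tuple[str, int, int, str]  # (contig, start, stop, name), BED4-like


def midpoint_coverage(fragments: Iterable[Fragment],
                      intervals: List[Interval]) -> List[int]:
    """Count, per interval, the fragments whose midpoint lies inside it."""
    counts = [0] * len(intervals)
    for contig, frag_start, frag_stop in fragments:
        # Assumption: midpoint is the floor of the mean of the two ends.
        midpoint = (frag_start + frag_stop) // 2
        for i, (iv_contig, iv_start, iv_stop, _name) in enumerate(intervals):
            if contig == iv_contig and iv_start <= midpoint < iv_stop:
                counts[i] += 1
    return counts


fragments = [("chr1", 100, 267), ("chr1", 180, 350), ("chr1", 900, 1066)]
intervals = [("chr1", 0, 200, "a"), ("chr1", 200, 1000, "b")]
counts = midpoint_coverage(fragments, intervals)
print(counts)
```

The second fragment overlaps interval "a" but is counted only in "b", because its midpoint lies there. This is the effect that makes midpoint counting unreliable once fragments are nearly as long as the intervals.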

Parameters#

input_file : str or pysam.AlignmentFile
    SAM, BAM, CRAM, or Frag.gz file containing paired-end fragment reads, or the path to such a file. An AlignmentFile must be opened in read mode.
interval_file : str
    BED4 file containing the intervals over which to generate coverage statistics.
output_file : str
    Path of the BED file to write coverages to. If output_file = _, results will be printed to stdout.
scale_factor : float, optional
    Amount to multiply coverages by. Default is 10^6.
quality_threshold : int, optional
workers : int, optional
verbose : int or bool, optional
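For readers unfamiliar with BED4, the interval file has four tab-separated columns: contig, 0-based start, end, and name. The sketch below parses such a file with the standard library only. The helper read_bed4 is hypothetical and is not part of finaletoolkit. Skipping comment, track, and browser lines follows common BED practice and may not match how the library reads the file.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_bed4(handle: TextIO) -> Iterator[Tuple[str, int, int, str]]:
    """Yield (contig, start, stop, name) from a tab-separated BED4 stream."""
    for line in handle:
        line = line.rstrip("\n")
        # Assumption: blank lines and header-style lines are ignored.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        contig, start, stop, name = line.split("\t")[:4]
        yield contig, int(start), int(stop), name


# An in-memory stand-in for an interval file on disk.
example = io.StringIO("chr1\t0\t200\tbin_a\nchr1\t200\t1000\tbin_b\n")
parsed = list(read_bed4(example))
print(parsed)
```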

Returns#

coverage : int
    Fragment coverage over contig and region.

finaletoolkit.frag.frag_length(input_file: str | AlignmentFile | TabixFile, contig: str = None, start: int = None, stop: int = None, intersect_policy: str = 'midpoint', output_file: str = None, quality_threshold: int = 30, verbose: bool = False) → ndarray#

Return an np.ndarray containing the lengths of fragments in input_file that are above the quality threshold and come from properly paired reads.

Parameters#

input_file : str or pysam.AlignmentFile
    BAM, SAM, or CRAM file containing paired-end fragment reads, or the path to such a file. An AlignmentFile must be opened in read mode.
contig : str, optional
    Contig or chromosome to get fragments from.
start : int, optional
    0-based left-most coordinate of interval.
stop : int, optional
    1-based right-most coordinate of interval.
intersect_policy : str, optional
    Specifies what policy is used to include fragments in the given interval. Default is “midpoint”. Policies include:
    - midpoint: the average of end coordinates of a fragment lies in the interval.
    - any: any part of the fragment is in the interval.
output_file : str, optional
quality_threshold : int, optional
verbose : bool, optional
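The two intersect policies can be compared with a small sketch. It assumes 0-based half-open coordinates for both fragment and interval and a floored midpoint. The helper in_interval is hypothetical and only mirrors the policy descriptions above; it is not the library's code.

```python
def in_interval(frag_start: int, frag_stop: int,
                start: int, stop: int,
                intersect_policy: str = "midpoint") -> bool:
    """Decide whether a fragment belongs to [start, stop) under a policy."""
    if intersect_policy == "midpoint":
        # The average of the fragment's end coordinates must lie in the interval.
        midpoint = (frag_start + frag_stop) // 2
        return start <= midpoint < stop
    if intersect_policy == "any":
        # Any overlap between fragment and interval is enough.
        return frag_start < stop and frag_stop > start
    raise ValueError(f"unknown intersect_policy: {intersect_policy!r}")


# A fragment that straddles the left edge of the interval [300, 500).
by_midpoint = in_interval(150, 320, 300, 500, "midpoint")
by_any = in_interval(150, 320, 300, 500, "any")
print(by_midpoint, by_any)
```

Under "midpoint" each fragment is assigned to at most one of a set of non-overlapping intervals, whereas "any" can count the same fragment in several adjacent intervals.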

Returns#

lengths : numpy.ndarray
    ndarray of fragment lengths from file and contig if specified.

finaletoolkit.frag.frag_length_bins(input_file: str | AlignmentFile, contig: str = None, start: int = None, stop: int = None, bin_size: int = None, output_file: str = None, contig_by_contig: bool = False, histogram: bool = False, intersect_policy: str = 'midpoint', quality_threshold: int = 30, verbose: bool | int = False) → Tuple[ndarray, ndarray]#

Compute the lengths of the fragments in input_file and return two arrays containing the bins and the fragment counts per bin. Optionally print the data to output_file as a tab-delimited table or as a histogram.

Parameters#

input_file : str or AlignmentFile
contig : str, optional
start : int, optional
stop : int, optional
bin_size : int, optional
output_file : str, optional
contig_by_contig : bool, optional
histogram : bool, optional
intersect_policy : str, optional
    Specifies what policy is used to include fragments in the given interval. Default is “midpoint”. Policies include:
    - midpoint: the average of end coordinates of a fragment lies in the interval.
    - any: any part of the fragment is in the interval.
workers : int, optional

Returns#

bins : ndarray
counts : ndarray
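To illustrate the relationship between the two returned arrays, the sketch below bins a list of fragment lengths in plain Python (lists rather than ndarrays, to stay dependency-free). How the library places bin edges, and whether bins holds left edges or something else, is not stated here. The helper bin_lengths and its left-edge convention are assumptions.

```python
from typing import List, Sequence, Tuple


def bin_lengths(lengths: Sequence[int],
                bin_size: int) -> Tuple[List[int], List[int]]:
    """Return (bins, counts), where bins holds the left edge of each bin."""
    if not lengths:
        return [], []
    # Align the first and last bin to multiples of bin_size.
    first = (min(lengths) // bin_size) * bin_size
    last = (max(lengths) // bin_size) * bin_size
    bins = list(range(first, last + bin_size, bin_size))
    counts = [0] * len(bins)
    for length in lengths:
        counts[(length - first) // bin_size] += 1
    return bins, counts


bins, counts = bin_lengths([143, 166, 167, 168, 171, 152], bin_size=10)
for left_edge, count in zip(bins, counts):
    # Tab-delimited rows, similar in spirit to the optional table output.
    print(f"{left_edge}\t{count}")
```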