Basic Features

finaletoolkit.frag.coverage(input_file: str | AlignmentFile, interval_file: str, output_file: str, scale_factor: float = 1000000.0, quality_threshold: int = 30, workers: int = 1, verbose: bool | int = False)

Return estimated fragment coverage over the intervals specified in interval_file. Fragments are read from input_file, which may be a SAM, BAM, CRAM, or Frag.gz file. The midpoint of each fragment is calculated, and coverage is tabulated from the midpoints that fall into each specified region. Not suitable when fragment sizes approach the interval size.

Parameters

input_file : str or pysam.AlignmentFile

A SAM, BAM, CRAM, or Frag.gz file containing paired-end fragment reads, given either as a path or as an open pysam.AlignmentFile. An AlignmentFile must be opened in read mode.

interval_file : str

BED4 file containing intervals over which to generate coverage statistics.

output_file : str, optional

Path for the BED file to print coverages to. If output_file = _, results will be printed to stdout.

scale_factor : float, optional

Amount to multiply coverages by. Default is 10^6.

quality_threshold : int, optional

verbose : int or bool, optional

Returns

coverage : int

Fragment coverage over contig and region.
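The midpoint-based tabulation described above can be sketched in plain Python. This is an illustrative sketch only, not the library's implementation: the helper `midpoint_coverage`, the tuple-based inputs, and the use of integer midpoints with half-open intervals are all assumptions, and `scale_factor` is applied here as a plain multiplier without modelling any normalization the library may perform.

```python
from bisect import bisect_left


def midpoint_coverage(fragments, intervals, scale_factor=1.0):
    """Count fragments whose midpoint falls in each half-open interval.

    Hypothetical helper for illustration.
    fragments: iterable of (start, stop) tuples on a single contig.
    intervals: iterable of (start, stop) tuples on the same contig.
    """
    # Assumption: midpoint is the integer average of the two end coordinates.
    midpoints = sorted((start + stop) // 2 for start, stop in fragments)
    results = []
    for ivl_start, ivl_stop in intervals:
        # Number of midpoints m with ivl_start <= m < ivl_stop.
        lo = bisect_left(midpoints, ivl_start)
        hi = bisect_left(midpoints, ivl_stop)
        results.append((ivl_start, ivl_stop, (hi - lo) * scale_factor))
    return results


fragments = [(100, 267), (150, 320), (400, 566), (950, 1120)]
intervals = [(0, 500), (500, 1000), (1000, 1500)]
coverage = midpoint_coverage(fragments, intervals)
for row in coverage:
    print(*row, sep="\t")
```

The fragment (950, 1120) overlaps the second interval but is counted only in the third, because its midpoint lies there. This is the reason the midpoint approach becomes unreliable when fragments are nearly as long as the intervals.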

finaletoolkit.frag.frag_length(input_file: str | AlignmentFile | TabixFile, contig: str = None, start: int = None, stop: int = None, intersect_policy: str = 'midpoint', output_file: str = None, quality_threshold: int = 30, verbose: bool = False) → ndarray

Return a np.ndarray containing the lengths of fragments in input_file that pass the quality threshold and come from properly paired reads.

Parameters

input_file : str or pysam.AlignmentFile

A BAM, SAM, or CRAM file containing paired-end fragment reads, given either as a path or as an open pysam.AlignmentFile. An AlignmentFile must be opened in read mode.

contig : str, optional

Contig or chromosome to get fragments from.

start : int, optional

0-based left-most coordinate of interval.

stop : int, optional

1-based right-most coordinate of interval.

intersect_policy : str, optional

Specifies what policy is used to include fragments in the given interval. Default is “midpoint”. Policies include:

- midpoint: the average of end coordinates of a fragment lies in the interval.
- any: any part of the fragment is in the interval.

output_file : str, optional

quality_threshold : int, optional

verbose : bool, optional

Returns

lengths : numpy.ndarray

ndarray of fragment lengths from file and contig if specified.
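To make the two intersect policies concrete, the following sketch filters fragments by interval and collects their lengths. It is a minimal illustration under stated assumptions, not the library's code: `fragment_lengths` is a hypothetical helper, fragments are plain (start, stop) tuples in 0-based half-open coordinates, and quality filtering is omitted.

```python
def fragment_lengths(fragments, start, stop, intersect_policy="midpoint"):
    """Return lengths of fragments selected by the given intersect policy.

    Hypothetical helper for illustration; coordinates are assumed to be
    0-based and half-open, so a fragment's length is stop - start.
    """
    lengths = []
    for frag_start, frag_stop in fragments:
        if intersect_policy == "midpoint":
            # Keep the fragment if the average of its ends is in the interval.
            midpoint = (frag_start + frag_stop) // 2
            keep = start <= midpoint < stop
        elif intersect_policy == "any":
            # Keep the fragment if any part of it overlaps the interval.
            keep = frag_start < stop and frag_stop > start
        else:
            raise ValueError(f"unknown intersect_policy: {intersect_policy!r}")
        if keep:
            lengths.append(frag_stop - frag_start)
    return lengths


fragments = [(90, 260), (400, 567), (480, 650)]
midpoint_lengths = fragment_lengths(fragments, 100, 500, "midpoint")
any_lengths = fragment_lengths(fragments, 100, 500, "any")
print(midpoint_lengths)
print(any_lengths)
```

The last fragment overlaps the interval but has its midpoint outside it, so it is kept only under the "any" policy.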

finaletoolkit.frag.frag_length_bins(input_file: str | AlignmentFile, contig: str = None, start: int = None, stop: int = None, bin_size: int = None, output_file: str = None, contig_by_contig: bool = False, histogram: bool = False, intersect_policy: str = 'midpoint', quality_threshold: int = 30, verbose: bool | int = False) → Tuple[ndarray, ndarray]

Takes input_file, computes the lengths of its fragments, and returns two arrays containing the bins and the counts of fragments per bin. Optionally prints the data to output_file as a tab-delimited table or a histogram.

Parameters

input_file : str or AlignmentFile

contig : str, optional

start : int, optional

stop : int, optional

bin_size : int, optional

output_file : str, optional

contig_by_contig : bool, optional

histogram : bool, optional

intersect_policy : str, optional

Specifies what policy is used to include fragments in the given interval. Default is “midpoint”. Policies include:

- midpoint: the average of end coordinates of a fragment lies in the interval.
- any: any part of the fragment is in the interval.

workers : int, optional

Returns

bins : ndarray

counts : ndarray
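The relationship between the two returned arrays can be illustrated with a small binning sketch. It is only an approximation of the idea: `length_bins` is a hypothetical helper, it returns plain lists rather than ndarrays, and labelling each bin by its lower edge is an assumption, since the library's bin-edge convention is not stated here.

```python
from collections import Counter


def length_bins(lengths, bin_size=1):
    """Group fragment lengths into fixed-width bins.

    Hypothetical helper for illustration. Returns (bins, counts), where each
    bin is labelled by its lower edge (an assumption) and counts[i] is the
    number of lengths falling in [bins[i], bins[i] + bin_size).
    """
    if not lengths:
        return [], []
    counter = Counter((length // bin_size) * bin_size for length in lengths)
    # Include empty bins between the smallest and largest occupied bin.
    bins = list(range(min(counter), max(counter) + bin_size, bin_size))
    counts = [counter.get(b, 0) for b in bins]
    return bins, counts


lengths = [145, 147, 152, 166, 167, 167, 168, 171, 320]
bins, counts = length_bins(lengths, bin_size=10)

# Print a tab-delimited table of occupied bins only.
for b, c in zip(bins, counts):
    if c:
        print(b, c, sep="\t")
```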