Basic Features
- finaletoolkit.frag.coverage(input_file: str | AlignmentFile, interval_file: str, output_file: str, scale_factor: float = 1000000.0, quality_threshold: int = 30, workers: int = 1, verbose: bool | int = False)
Return estimated fragment coverage over the intervals specified in interval_file. Fragments are read from input_file, which may be a SAM, BAM, CRAM, or Frag.gz file. Uses a midpoint algorithm: the midpoint of each fragment is calculated, and coverage over a region is tabulated from the midpoints that fall into it. Not suitable when fragment lengths approach the interval size, since a fragment that overlaps an interval can still have its midpoint outside it.
Parameters
- input_file : str or pysam.AlignmentFile
SAM, BAM, CRAM, or Frag.gz file containing paired-end fragment reads, or its path. AlignmentFile must be opened in read mode.
- interval_file : str
BED4 file containing intervals over which to generate coverage statistics.
- output_file : str, optional
Path of the BED file to which coverages are written. If output_file = _, results are printed to stdout.
- scale_factor : float, optional
Factor by which coverages are multiplied. Default is 10^6.
- quality_threshold : int, optional
- workers : int, optional
- verbose : int or bool, optional
Returns
- coverage : int
Fragment coverage over contig and region.
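The midpoint method described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not FinaleToolkit's implementation: fragments and intervals are hypothetical (contig, start, stop) tuples in 0-based, half-open coordinates, the midpoint is the average of the end coordinates rounded down, and raw midpoint counts are simply multiplied by scale_factor as the parameter description states. The helper name midpoint_coverage is hypothetical, and it defaults to a scale factor of 1.0 rather than 10^6 to keep the numbers readable.

```python
from typing import Iterable, List, Tuple

# Hypothetical (contig, start, stop) record, 0-based and half-open.
Region = Tuple[str, int, int]


def midpoint_coverage(
    fragments: Iterable[Region],
    intervals: List[Region],
    scale_factor: float = 1.0,
) -> List[float]:
    """Count fragment midpoints falling in each interval, then scale them."""
    counts = [0] * len(intervals)
    for contig, start, stop in fragments:
        mid = (start + stop) // 2  # average of the end coordinates, rounded down
        for i, (i_contig, i_start, i_stop) in enumerate(intervals):
            if contig == i_contig and i_start <= mid < i_stop:
                counts[i] += 1
    # Assumption: scaling is a plain multiplication, as the parameter text states.
    return [count * scale_factor for count in counts]


fragments = [("chr1", 100, 260), ("chr1", 150, 310), ("chr1", 900, 1300)]
intervals = [("chr1", 0, 500), ("chr1", 1000, 1100)]
print(midpoint_coverage(fragments, intervals))  # [2.0, 0.0]
```

The third fragment spans the entire second interval, yet its midpoint (1100) falls just past the interval's exclusive end, so that interval receives no coverage. This is the failure mode behind the warning about fragment lengths approaching the interval size. The nested loop is kept for clarity; a practical implementation would index the intervals rather than scan them for every fragment.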
- finaletoolkit.frag.frag_length(input_file: str | AlignmentFile | TabixFile, contig: str = None, start: int = None, stop: int = None, intersect_policy: str = 'midpoint', output_file: str = None, quality_threshold: int = 30, verbose: bool = False) → ndarray
Return an np.ndarray containing the lengths of fragments in input_file that are above the quality threshold and are properly paired reads.
Parameters
- input_file : str or pysam.AlignmentFile
BAM, SAM, or CRAM file containing paired-end fragment reads, or its path. AlignmentFile must be opened in read mode.
- contig : str, optional
Contig or chromosome to get fragments from.
- start : int, optional
0-based left-most coordinate of the interval.
- stop : int, optional
1-based right-most coordinate of the interval.
- intersect_policy : str, optional
Specifies what policy is used to include fragments in the given interval. Default is “midpoint”. Policies include:
  - midpoint: the average of the end coordinates of a fragment lies in the interval.
  - any: any part of the fragment is in the interval.
- output_file : str, optional
- quality_threshold : int, optional
- verbose : bool, optional
Returns
- lengths : numpy.ndarray
ndarray of fragment lengths from the file, restricted to the contig and interval if specified.
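The read filtering and the two intersect policies can be sketched as follows. The Fragment record and the fragment_lengths and passes_policy helpers are hypothetical; in a real BAM file the corresponding per-read values would come from pysam attributes such as AlignedSegment.is_proper_pair, mapping_quality, and template_length. The sketch assumes quality_threshold is an inclusive minimum mapping quality (MAPQ >= threshold), treats start and stop as a half-open interval matching the coordinate conventions above, measures length as stop minus start, and returns a plain list where FinaleToolkit returns a numpy.ndarray.

```python
import sys
from typing import Iterable, List, NamedTuple, Optional


class Fragment(NamedTuple):
    """Hypothetical stand-in for one paired-end fragment."""
    contig: str
    start: int         # 0-based left-most coordinate (inclusive)
    stop: int          # right-most coordinate (exclusive in 0-based terms)
    mapq: int          # mapping quality of the fragment's reads
    proper_pair: bool  # whether the reads form a proper pair


def passes_policy(frag: Fragment, start: int, stop: int, policy: str) -> bool:
    """Apply an intersect_policy to the half-open query interval [start, stop)."""
    if policy == "midpoint":
        # Average of the end coordinates, rounded down.
        return start <= (frag.start + frag.stop) // 2 < stop
    if policy == "any":
        # Two half-open intervals overlap iff each starts before the other ends.
        return frag.start < stop and start < frag.stop
    raise ValueError(f"unknown intersect_policy: {policy!r}")


def fragment_lengths(
    fragments: Iterable[Fragment],
    contig: Optional[str] = None,
    start: Optional[int] = None,
    stop: Optional[int] = None,
    intersect_policy: str = "midpoint",
    quality_threshold: int = 30,
) -> List[int]:
    """Collect lengths of properly paired, sufficiently high-quality fragments."""
    lo = 0 if start is None else start
    hi = sys.maxsize if stop is None else stop
    lengths = []
    for frag in fragments:
        if not frag.proper_pair or frag.mapq < quality_threshold:
            continue  # assumed filter: proper pairs with MAPQ >= quality_threshold
        if contig is not None and frag.contig != contig:
            continue
        if not passes_policy(frag, lo, hi, intersect_policy):
            continue
        lengths.append(frag.stop - frag.start)
    return lengths


frags = [
    Fragment("chr1", 100, 266, 60, True),   # length 166, midpoint 183
    Fragment("chr1", 180, 350, 60, True),   # length 170, midpoint 265
    Fragment("chr1", 120, 300, 10, True),   # fails the quality threshold
    Fragment("chr2", 100, 250, 60, True),   # length 150, other contig
    Fragment("chr1", 150, 320, 60, False),  # not a proper pair
]
print(fragment_lengths(frags, "chr1", 200, 400))                          # [170]
print(fragment_lengths(frags, "chr1", 200, 400, intersect_policy="any"))  # [166, 170]
print(fragment_lengths(frags))                                            # [166, 170, 150]
```

With the default midpoint policy, the first fragment is excluded from chr1:200-400 because its midpoint (183) lies before the interval, even though the fragment overlaps it; the any policy keeps it.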
- finaletoolkit.frag.frag_length_bins(input_file: str | AlignmentFile, contig: str = None, start: int = None, stop: int = None, bin_size: int = None, output_file: str = None, contig_by_contig: bool = False, histogram: bool = False, intersect_policy: str = 'midpoint', quality_threshold: int = 30, verbose: bool | int = False) → Tuple[ndarray, ndarray]
Takes input_file, computes the lengths of its fragments, and returns two arrays containing the bins and the fragment counts per bin. Optionally prints the data to output_file as a tab-delimited table or a histogram.
Parameters
- input_file : str or AlignmentFile
- contig : str, optional
- start : int, optional
- stop : int, optional
- bin_size : int, optional
- output_file : str, optional
- contig_by_contig : bool, optional
- histogram : bool, optional
- intersect_policy : str, optional
Specifies what policy is used to include fragments in the given interval. Default is “midpoint”. Policies include:
  - midpoint: the average of end coordinates of a fragment lies in the interval.
  - any: any part of the fragment is in the interval.
- workers : int, optional
Returns
- bins : ndarray
- counts : ndarray
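The binning step that produces bins and counts can be sketched as below, assuming fixed-width bins labeled by their left edge and aligned to multiples of bin_size. FinaleToolkit's actual bin edges, default bin_size, and output column headers are not stated here and may differ; length_bins and format_table are hypothetical names, and the header row written by format_table is an illustrative choice.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def length_bins(lengths: Iterable[int], bin_size: int = 1) -> Tuple[List[int], List[int]]:
    """Bin fragment lengths into fixed-width bins (hypothetical helper).

    Returns (bins, counts), where bins[i] is the left edge of bin i and
    counts[i] is the number of lengths in [bins[i], bins[i] + bin_size).
    Empty bins between the shortest and longest length are kept as 0.
    """
    if bin_size < 1:
        raise ValueError("bin_size must be a positive integer")
    # Map each length to the left edge of its bin and tally.
    tally = Counter((length // bin_size) * bin_size for length in lengths)
    if not tally:
        return [], []
    bins = list(range(min(tally), max(tally) + 1, bin_size))
    counts = [tally[edge] for edge in bins]
    return bins, counts


def format_table(bins: List[int], counts: List[int]) -> str:
    """Render bins and counts as a tab-delimited table with a header row."""
    rows = ["bin_start\tcount"]
    rows += [f"{edge}\t{count}" for edge, count in zip(bins, counts)]
    return "\n".join(rows) + "\n"


bins, counts = length_bins([166, 167, 170, 195], bin_size=10)
print(format_table(bins, counts), end="")
# bin_start  count
# 160        2
# 170        1
# 180        0
# 190        1
```

Keeping empty bins (180 here) gives bins a regular spacing, so the two arrays can be plotted directly as a histogram without re-deriving the gaps.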