Genome Utilities#
- class finaletoolkit.genome.GenomeGaps(gaps_bed: Path | str = None)#
Reads telomere, centromere, and short_arm intervals from a bed file or generates these intervals from UCSC gap and centromere tracks for hg19 and hg38.
- classmethod b37()#
Creates a GenomeGaps for the Broad Institute GRCh37 reference genome, i.e., b37. Like UCSC hg19, this reference genome is based on GRCh37, but it differs from hg19 in a few ways, including the absence of the ‘chr’ prefix. We generate this GenomeGaps using an ad hoc method: we take the UCSC hg19 gap track and drop ‘chr’ from the chromosome names. Because there are other differences between hg19 and b37, this is not a perfect solution.
Returns#
- gaps : GenomeGaps
GenomeGaps for the b37 reference genome.
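The prefix-dropping step described above can be sketched in plain Python. This is only an illustration of the idea, not the library's implementation: the helper name `drop_chr_prefix` and the sample records are hypothetical, and the coordinates are made up.

```python
def drop_chr_prefix(bed_lines):
    """Return BED lines with a leading 'chr' removed from the contig name."""
    converted = []
    for line in bed_lines:
        fields = line.rstrip("\n").split("\t")
        # Only the first column (contig name) is rewritten.
        if fields[0].startswith("chr"):
            fields[0] = fields[0][len("chr"):]
        converted.append("\t".join(fields))
    return converted


# Illustrative hg19-style records (toy coordinates, not real gap data).
hg19_style = [
    "chr1\t0\t10000\ttelomere",
    "chrX\t500\t900\tcentromere",
]
b37_style = drop_chr_prefix(hg19_style)
print(b37_style)
```

Renaming contigs changes only the labels; any coordinate or sequence differences between the two references are left untouched, which is why the approach is described as imperfect.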
- get_arm(contig: str, start: int, stop: int) str #
Returns the chromosome arm the interval is in. If in the short arm of an acrocentric chromosome or intersects a centromere, returns an empty string.
Parameters#
- contig : str
Chromosome of interval.
- start : int
Start of interval.
- stop : int
End of interval.
Returns#
- arm : str
Arm that interval is in.
Raises#
- ValueError
Raised for invalid coordinates
- get_contig_gaps(contig: str) ContigGaps #
Creates a ContigGaps for the specified chromosome
Parameters#
- contig : str
Chromosome to make ContigGaps for
Returns#
- ContigGaps
Contains centromere and telomere intervals for chromosome
- classmethod hg38()#
Creates a GenomeGaps for the hg38 reference genome. This sequence uses chromosome names that start with ‘chr’ and is synonymous with the GRCh38 reference genome.
Returns#
- gaps : GenomeGaps
GenomeGaps for the hg38 reference genome.
- in_tcmere(contig: str, start: int, stop: int) bool #
Checks if specified interval is in a centromere or telomere
Parameters#
- contig : str
Chromosome name
- start : int
Start of interval
- stop : int
End of interval
Returns#
- in_telomere_or_centromere : bool
True if in a centromere or telomere
- overlaps_gap(contig: str, start: int, stop: int) bool #
Checks if specified interval overlaps a gap interval
Parameters#
- contig : str
Chromosome name
- start : int
Start of interval
- stop : int
End of interval
Returns#
- in_telomere_or_centromere : bool
True if the interval overlaps a gap interval
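As a rough illustration of what an overlap test between a query interval and a gap interval involves, here is a minimal sketch. It assumes 0-based, half-open coordinates as used in BED files, which this reference does not state explicitly, and `intervals_overlap` is a hypothetical helper rather than part of finaletoolkit.

```python
def intervals_overlap(start, stop, gap_start, gap_stop):
    """True if half-open [start, stop) shares at least one base with
    half-open [gap_start, gap_stop)."""
    return start < gap_stop and gap_start < stop


# Toy gap spanning positions 10..19 (half-open interval [10, 20)).
gap = (10, 20)
print(intervals_overlap(5, 15, *gap))   # partially overlaps the gap
print(intervals_overlap(0, 10, *gap))   # ends exactly where the gap begins
```

Under the half-open assumption, an interval that ends exactly at the gap's start does not overlap it; with closed coordinates the comparison operators would need to be inclusive.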
- class finaletoolkit.genome.ContigGaps(contig: str, centromere: Tuple[int, int], telomeres: Iterable[Tuple[int, int]], has_short_arm: bool = False)#
- get_arm(start: int, stop: int)#
Returns name of chromosome arm the interval is in. Returns “NOARM” if in a centromere, telomere, or short arm of an acrocentric chromosome.
Parameters#
- start : int
Start of interval.
- stop : int
End of interval.
Returns#
- str
Name of the chromosome arm.
Raises#
- ValueError
Raised if invalid coordinates are given.
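The behaviour described for `get_arm` can be approximated with a small standalone function. This is a sketch under several assumptions, not the library's code: the arm labels `"p"` and `"q"`, the half-open overlap test, the exact validity check, and the name `classify_arm` are all hypothetical. Only the `"NOARM"` return value and the `ValueError` come from the description above.

```python
from typing import Iterable, Tuple

Interval = Tuple[int, int]


def _overlaps(start: int, stop: int, other: Interval) -> bool:
    # Half-open overlap test (assumed coordinate convention).
    return start < other[1] and other[0] < stop


def classify_arm(
    start: int,
    stop: int,
    centromere: Interval,
    telomeres: Iterable[Interval] = (),
    has_short_arm: bool = False,
) -> str:
    """Return 'p', 'q', or 'NOARM' for the interval [start, stop)."""
    if start < 0 or stop <= start:
        raise ValueError(f"invalid coordinates: start={start}, stop={stop}")
    if _overlaps(start, stop, centromere):
        return "NOARM"
    if any(_overlaps(start, stop, t) for t in telomeres):
        return "NOARM"
    if stop <= centromere[0]:
        # Left of the centromere: the short (p) arm, which is reported
        # as NOARM on acrocentric chromosomes.
        return "NOARM" if has_short_arm else "p"
    return "q"


# Toy chromosome of length 200 with made-up gap coordinates.
centromere = (100, 120)
telomeres = [(0, 10), (190, 200)]
print(classify_arm(20, 50, centromere, telomeres))
print(classify_arm(130, 150, centromere, telomeres))
print(classify_arm(95, 105, centromere, telomeres))
```

The checks run in order of specificity: gap overlaps are ruled out first, so the remaining interval must lie entirely on one side of the centromere.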
- finaletoolkit.genome.ucsc_hg19_gap_bed(output_file: str)#
Creates BED4 of centromeres, telomeres, and short arms for the UCSC hg19 reference sequence.
Parameters#
- output_file : str
Output path
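For readers unfamiliar with the BED4 layout, the sketch below writes four tab-separated columns (contig, start, stop, name) per record. It is illustrative only: `write_bed4` is a hypothetical helper, the contig `chrT` and its coordinates are invented, and the labels in the name column are assumed from the interval types mentioned in this reference.

```python
import io


def write_bed4(records, handle):
    """Write (contig, start, stop, name) tuples as tab-separated BED4 lines."""
    for contig, start, stop, name in records:
        handle.write(f"{contig}\t{start}\t{stop}\t{name}\n")


# Invented records for a toy contig; not real hg19 gap data.
records = [
    ("chrT", 0, 10, "telomere"),
    ("chrT", 100, 120, "centromere"),
    ("chrT", 190, 200, "telomere"),
]

buffer = io.StringIO()  # stands in for a file opened at the output path
write_bed4(records, buffer)
print(buffer.getvalue(), end="")
```

Writing to a file-like handle rather than a path keeps the sketch easy to test; swapping `io.StringIO()` for `open(path, "w")` would write to disk instead.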