DELFI
- finaletoolkit.frag.delfi(input_file: str, chrom_sizes: str, bins_file: str, reference_file: str, blacklist_file: str | None = None, gap_file: str | GenomeGaps | None = None, output_file: str | None = None, gc_correct: bool = True, remove_nocov: bool = True, merge_bins: bool = True, window_size: int = 5000000, quality_threshold: int = 30, workers: int = 1, verbose: int | bool = False) -> DataFrame
Replicates the methodology of Cristiano et al. (2019).
- Parameters:
input_file (str) – Path string pointing to a BAM file containing paired-end (PE) fragment reads.
chrom_sizes (str) – Path string to a chrom.sizes file containing only autosomal chromosomes.
bins_file (str) – Path string to a BED file containing 100 kb bins for the reference genome of choice.
reference_file (str) – Path string to the .2bit file for the reference genome.
gap_file (str or GenomeGaps) – Specifies the locations of telomeres and centromeres for the reference genome. There are three options: (1) a path string to a BED4+ file where each interval is a centromere or telomere; a BED file can be used only if the fourth field of each entry corresponding to a telomere or centromere is labeled “telomere” or “centromere”, respectively; (2) a string naming the reference genome used; options are “b37”, “hg19”, “hg38”, and “GRCh38”; (3) a finaletoolkit.genome.GenomeGaps instance with gap information for the reference genome of choice.
blacklist_file (str, optional) – Path string to a BED file containing genome blacklist regions.
output_file (str, optional) – Path to output tsv.
gc_correct (bool) – Perform GC correction. Default is True.
remove_nocov (bool) – Remove two windows described by Cristiano et al. (2019) as low coverage. These windows might not apply to reference genomes other than hg19. Default is True.
merge_bins (bool) – Merge the 100 kb bins into 5 Mb bins. Default is True.
window_size (int) – Size (in bases) of the non-overlapping windows used to cover the genome. Default is 5000000.
quality_threshold (int, optional) – Minimum mapping quality (MAPQ) required for a read to be counted. Default is 30.
workers (int, optional) – Number of worker processes to use. Default is 1.
verbose (int or bool, optional) – Determines how many print statements and loading bars appear in stdout. Default is False.
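The BED4+ requirement for gap_file can be illustrated with a small reader. This is only a sketch of the file layout described above, not the parser finaletoolkit uses: the helper name read_gap_bed, the example coordinates, and the choice to skip entries with other labels are all assumptions made for illustration.

```python
import io

# Labels that the gap_file description requires in the fourth BED field.
VALID_GAP_TYPES = {"telomere", "centromere"}


def read_gap_bed(handle):
    """Parse BED4+ lines and keep intervals labeled telomere or centromere.

    Hypothetical helper for illustration; not part of finaletoolkit.
    """
    gaps = []
    for line in handle:
        line = line.strip()
        # Skip blank lines and common BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 4:
            raise ValueError(f"expected at least 4 fields: {line!r}")
        chrom, start, stop, name = fields[:4]
        # Assumption: entries with any other label are ignored.
        if name in VALID_GAP_TYPES:
            gaps.append((chrom, int(start), int(stop), name))
    return gaps


# Made-up intervals, used only to show the expected layout.
example = io.StringIO(
    "chr1\t0\t10000\ttelomere\n"
    "chr1\t121535434\t124535434\tcentromere\n"
    "chr1\t177417\t227417\tclone\n"
)
gaps = read_gap_bed(example)
```

Here gaps holds the telomere and centromere intervals only; the entry labeled "clone" is dropped.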
- Returns:
Results of the DELFI analysis, with column names corresponding to those generated by the original authors’ scripts.
- Return type:
pandas DataFrame
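A call might look like the following sketch, which uses only the parameters from the signature above. All file paths are hypothetical placeholders, and the run_delfi wrapper is an illustrative helper (not part of finaletoolkit) that returns None when the package or the input files are unavailable.

```python
from pathlib import Path

# Hypothetical locations; replace with real files for the chosen reference genome.
data_dir = Path("path/to/data")

delfi_kwargs = {
    "input_file": str(data_dir / "sample.bam"),
    "chrom_sizes": str(data_dir / "hg19.autosomes.chrom.sizes"),
    "bins_file": str(data_dir / "hg19.100kb.bed"),
    "reference_file": str(data_dir / "hg19.2bit"),
    "blacklist_file": str(data_dir / "hg19.blacklist.bed"),
    "gap_file": "hg19",  # one of the documented genome names
    "output_file": str(data_dir / "sample.delfi.tsv"),
    "gc_correct": True,
    "remove_nocov": True,
    "merge_bins": True,
    "window_size": 5_000_000,
    "quality_threshold": 30,
    "workers": 4,
    "verbose": False,
}


def run_delfi(kwargs):
    """Call finaletoolkit.frag.delfi if the package and inputs are available."""
    try:
        from finaletoolkit.frag import delfi
    except ImportError:
        return None  # finaletoolkit is not installed
    required = ("input_file", "chrom_sizes", "bins_file", "reference_file")
    if not all(Path(kwargs[name]).exists() for name in required):
        return None  # placeholder paths do not point at real files
    return delfi(**kwargs)


results = run_delfi(delfi_kwargs)
```

With real inputs, results is the pandas DataFrame described under Returns; with the placeholder paths it is None.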
- finaletoolkit.frag.delfi_gc_correct(windows: DataFrame, alpha: float = 0.75, it: int = 8, verbose: bool = False)
Helper function that takes window data and performs GC adjustment.
- finaletoolkit.frag.delfi_merge_bins(hundred_kb_bins: DataFrame, gc_corrected: bool = True, verbose: bool = False) -> DataFrame
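The idea of merging 100 kb bins into 5 Mb windows can be sketched in plain Python. This is a conceptual illustration only, not the implementation of delfi_merge_bins: the tuple layout, the "short" and "long" count names, and the grouping by fixed genomic offset are assumptions, and the library may group bins differently (for example, per chromosome arm).

```python
from collections import defaultdict


def merge_bins(bins, window_size=5_000_000):
    """Sum per-bin counts into non-overlapping windows of window_size bases.

    Each bin is (chrom, start, stop, short_count, long_count).
    Hypothetical helper for illustration; not part of finaletoolkit.
    """
    merged = defaultdict(lambda: {"short": 0, "long": 0})
    for chrom, start, stop, short, long_ in bins:
        # Assign the bin to the window that contains its start coordinate.
        window_start = (start // window_size) * window_size
        merged[(chrom, window_start)]["short"] += short
        merged[(chrom, window_start)]["long"] += long_
    return [
        (chrom, w_start, w_start + window_size, counts["short"], counts["long"])
        for (chrom, w_start), counts in sorted(merged.items())
    ]


# Sixty made-up 100 kb bins on chr1, each with 10 short and 20 long fragments.
bins = [("chr1", i * 100_000, (i + 1) * 100_000, 10, 20) for i in range(60)]
windows = merge_bins(bins)
```

The first window collects the 50 bins that start below 5 Mb and the second collects the remaining 10, so the counts per window are the sums over their member bins.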