DELFI
- finaletoolkit.frag.delfi(input_file: str, autosomes: str, bins_file: str, reference_file: str, blacklist_file: str = None, gap_file: Union[str, GenomeGaps] = None, output_file: str = None, gc_correct: bool = True, merge_bins: bool = True, window_size: int = 5000000, subsample_coverage: float = 2, quality_threshold: int = 30, workers: int = 1, preprocessing: bool = True, verbose: int | bool = False) → pandas.DataFrame
A function that replicates the methodology of Cristiano et al. (2019).
Parameters
- input_file: str
Path string pointing to a BAM file containing paired-end (PE) fragment reads.
- autosomes: str
Path string to a .genome file containing only autosomal chromosomes.
- bins_file: str
Path string to a BED file containing 100 kb bins for the reference genome of choice. Cristiano et al uses
- reference_file: str
Path string to a .2bit file.
- blacklist_file: str, optional
Path string to a BED file containing the genome blacklist.
- gap_file: str or GenomeGaps, optional
Path string to a BED4+ file where each interval is a centromere or telomere. A BED file can be used only if the fourth field of each entry is labeled “telomere” or “centromere”, according to whether the interval is a telomere or a centromere.
- output_file: str, optional
Path to output tsv.
- window_size: int, optional
Size of non-overlapping windows to cover genome. Default is 5 megabases.
- subsample_coverage: float, optional
The depth at which to subsample the input BAM file. Default is 2.
- workers: int, optional
Number of worker processes to use. Default is 1.
- preprocessing: bool, optional
Cristiano et al. (2019)
- verbose: int or bool, optional
Determines how many print statements and loading bars appear in stdout. Default is False.
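The parameters above can be assembled into a call as in the following minimal sketch. Every file path is a hypothetical placeholder, and the helpers `check_gap_labels` and `build_delfi_kwargs` are illustrative names that are not part of finaletoolkit. The label check assumes that every interval in the gap file must carry one of the two labels described for `gap_file`. The call itself is guarded so that the snippet does nothing when the input data is absent.

```python
from pathlib import Path

# Labels that the gap_file description expects in the fourth BED field.
GAP_LABELS = {"centromere", "telomere"}


def check_gap_labels(bed_lines):
    """Return 1-based numbers of BED lines lacking a valid gap label.

    Illustrative helper, not part of finaletoolkit. Blank, comment,
    and header lines are skipped.
    """
    bad = []
    for number, line in enumerate(bed_lines, start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith(("#", "track", "browser")):
            continue
        fields = stripped.split("\t")
        if len(fields) < 4 or fields[3] not in GAP_LABELS:
            bad.append(number)
    return bad


def build_delfi_kwargs(data_dir, workers=1):
    """Collect keyword arguments for delfi using hypothetical file names."""
    data_dir = Path(data_dir)
    return {
        "input_file": str(data_dir / "sample.bam"),
        "autosomes": str(data_dir / "autosomes.genome"),
        "bins_file": str(data_dir / "bins_100kb.bed"),
        "reference_file": str(data_dir / "reference.2bit"),
        "blacklist_file": str(data_dir / "blacklist.bed"),
        "gap_file": str(data_dir / "gaps.bed"),
        "output_file": str(data_dir / "delfi.tsv"),
        "window_size": 5_000_000,   # documented default, 5 megabases
        "subsample_coverage": 2,    # documented default
        "workers": workers,
        "verbose": False,
    }


if __name__ == "__main__":
    kwargs = build_delfi_kwargs("data")
    # Only call the library when the (hypothetical) input actually exists.
    if Path(kwargs["input_file"]).exists():
        import finaletoolkit.frag

        result = finaletoolkit.frag.delfi(**kwargs)
        print(result.head())
```

Keeping the arguments in a dictionary makes it easy to log or reuse the same configuration across samples, changing only `input_file` and `output_file`.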
- finaletoolkit.frag.delfi_gc_correct(windows: DataFrame, alpha: float = 0.75, it: int = 8, verbose: bool = False)
Helper function that takes window data and performs GC adjustment.
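To illustrate what a GC adjustment does in general, the sketch below uses a deliberately simple stratified-median correction. This is a swapped-in technique chosen for brevity and is not the method implemented by `delfi_gc_correct`, whose `alpha` and `it` parameters are not described here. The function name `gc_correct_by_strata` and its arguments are hypothetical.

```python
from statistics import median


def gc_correct_by_strata(coverage, gc, n_digits=2):
    """Center each GC stratum on the overall median coverage.

    coverage: per-window coverage values.
    gc: per-window GC fractions, in the same order as coverage.
    Windows whose GC fraction rounds to the same value form one stratum.
    """
    overall = median(coverage)

    # Group coverage values by rounded GC fraction.
    strata = {}
    for cov, frac in zip(coverage, gc):
        strata.setdefault(round(frac, n_digits), []).append(cov)
    stratum_median = {key: median(vals) for key, vals in strata.items()}

    # Subtract the stratum median, then restore the overall level.
    return [
        cov - stratum_median[round(frac, n_digits)] + overall
        for cov, frac in zip(coverage, gc)
    ]


corrected = gc_correct_by_strata([10, 12, 20, 22], [0.40, 0.40, 0.50, 0.50])
print(corrected)  # [15.0, 17.0, 15.0, 17.0]
```

After correction, the two GC strata share the same median, so the remaining variation between windows is no longer explained by GC content alone.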
- finaletoolkit.frag.delfi_merge_bins(hundred_kb_bins: DataFrame, gc_corrected: bool = True, add_chr: bool = False, verbose: bool = False)
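The argument name `hundred_kb_bins` and the 5 megabase `window_size` default of `delfi` suggest that 100 kb bins are aggregated into larger windows, 50 bins per window. The following is a simplified sketch of that idea in plain Python, not the library's implementation: it assumes bins are simple `(chrom, start, stop, count)` tuples and ignores chromosome arms, gaps, and blacklisted regions. The function name `merge_bins_sketch` is hypothetical.

```python
def merge_bins_sketch(bins, window_size=5_000_000):
    """Sum per-bin counts into fixed, non-overlapping windows per chromosome.

    bins: iterable of (chrom, start, stop, count) tuples, e.g. 100 kb bins.
    Returns a list of (chrom, window_start, window_stop, total) tuples in
    order of first appearance.
    """
    totals = {}
    for chrom, start, stop, count in bins:
        # Assign each bin to the window that contains its start coordinate.
        window_start = (start // window_size) * window_size
        key = (chrom, window_start)
        totals[key] = totals.get(key, 0) + count
    return [
        (chrom, w_start, w_start + window_size, total)
        for (chrom, w_start), total in totals.items()
    ]


# Sixty consecutive 100 kb bins on chr1, one fragment each.
example = [("chr1", i * 100_000, (i + 1) * 100_000, 1) for i in range(60)]
print(merge_bins_sketch(example))
# [('chr1', 0, 5000000, 50), ('chr1', 5000000, 10000000, 10)]
```

The second window holds only 10 bins because the example covers 6 megabases. How such partial windows are handled in practice is not described here.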