I/O Wrappers
Unified access to reference genomes and alignment/fragment files. These wrappers back every feature and can also be used directly.
- class finaletoolkit.io.ReferenceWrapper(reference_path, use_lock=True)
Wrap a 2bit- or FASTA-formatted reference genome for sequence queries.
The wrapper encapsulates the py2bit/pysam handle logic behind a common interface and may be used as a context manager.
- Parameters:
reference_path (str or Path) – Path to a .2bit/.tb2 or FASTA (.fa/.fasta/.fna, optionally .gz) reference file.
use_lock (bool, optional) – If True (default), guard sequence queries with a threading.Lock so a single instance may be shared across threads. Set False for per-thread/per-process instances to avoid lock overhead.
- Raises:
FileNotFoundError – If reference_path does not exist.
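The locking behaviour described for use_lock can be illustrated with a minimal sketch. The LockedReference class below is a hypothetical stand-in backed by a plain dict, not finaletoolkit's ReferenceWrapper, and the real implementation may differ; it only shows how a threading.Lock lets one instance be shared safely across worker threads.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class LockedReference:
    """Hypothetical stand-in; NOT finaletoolkit's ReferenceWrapper."""

    def __init__(self, sequences, use_lock=True):
        # A plain dict stands in for the py2bit/pysam file handle.
        self._sequences = sequences
        # Only allocate a lock when the instance will be shared across threads.
        self._lock = threading.Lock() if use_lock else None

    def _query(self, contig, start, stop):
        return self._sequences[contig][start:stop].upper()

    def sequence(self, contig, start, stop):
        if self._lock is None:
            # Per-thread/per-process instance: skip the lock overhead.
            return self._query(contig, start, stop)
        with self._lock:
            # Serialize access to the shared handle.
            return self._query(contig, start, stop)


shared = LockedReference({"chr1": "acgtacgtac"}, use_lock=True)
regions = [(0, 4), (2, 6), (4, 10)]
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(lambda r: shared.sequence("chr1", *r), regions))
```

With use_lock=False the same class skips the lock entirely, which is the trade-off the parameter description points at: cheaper queries, but each thread or process then needs its own instance.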
- sequence(contig, start=None, stop=None, fail_on_excess_range=True)
Return the upper-cased reference sequence for a region.
- Parameters:
contig (str) – Chromosome or contig name.
start (int, optional) – 0-based start position (defaults to 0).
stop (int, optional) – 0-based, exclusive end position (defaults to the contig length).
fail_on_excess_range (bool, optional) – If True (default), raise when the range exceeds the contig bounds; otherwise truncate the range to the contig.
- Returns:
The upper-cased sequence (empty string if the truncated range is empty).
- Return type:
str
- Raises:
ContigNotFoundError – If contig is absent from the reference.
OutOfBoundsError – If the range is out of bounds and fail_on_excess_range is True.
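The coordinate and bounds semantics documented for sequence() can be sketched on an in-memory genome. Everything below is an illustrative assumption rather than the library's code: the function operates on a plain dict, and the two exception classes are local stand-ins that only borrow the documented names (their base classes are guesses).

```python
class ContigNotFoundError(KeyError):
    """Local stand-in named after the documented exception."""


class OutOfBoundsError(IndexError):
    """Local stand-in named after the documented exception."""


def sequence(genome, contig, start=None, stop=None, fail_on_excess_range=True):
    """Return the upper-cased slice [start, stop) of genome[contig]."""
    if contig not in genome:
        raise ContigNotFoundError(contig)
    length = len(genome[contig])
    # Defaults: 0-based start of the contig, exclusive end at the contig length.
    start = 0 if start is None else start
    stop = length if stop is None else stop
    in_bounds = 0 <= start <= length and 0 <= stop <= length
    if not in_bounds:
        if fail_on_excess_range:
            raise OutOfBoundsError(f"{contig}:{start}-{stop} exceeds length {length}")
        # Otherwise truncate the requested range to the contig.
        start = min(max(start, 0), length)
        stop = min(max(stop, 0), length)
    if start >= stop:
        return ""  # an empty (truncated) range yields an empty string
    return genome[contig][start:stop].upper()


genome = {"chr1": "acgtnACGTN"}  # toy 10-base contig
whole = sequence(genome, "chr1")
window = sequence(genome, "chr1", 2, 6)
clipped = sequence(genome, "chr1", 8, 20, fail_on_excess_range=False)
empty = sequence(genome, "chr1", 15, 20, fail_on_excess_range=False)
```

Here window covers positions 2 through 5 because stop is exclusive, and clipped shows the truncating mode returning only the bases that exist.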
- class finaletoolkit.io.AlignmentWrapper(path, reference_file=None, threads=1, quality_threshold=30, read1_only=True)
A unified interface for reading alignment and fragment data.
Wraps BAM, CRAM, SAM, and tabix-indexed fragment files (frag.gz and BED6 bed.gz), exposing a consistent fetch() generator and contig table.
- Parameters:
path (str, Path, pysam.AlignmentFile, or pysam.TabixFile) – Path to the input file, or an already-open pysam handle.
reference_file (str or Path, optional) – Reference genome path (required for CRAM input).
threads (int, optional) – Decompression threads for BAM/CRAM (default 1).
quality_threshold (int, optional) – Minimum mapping quality for emitted fragments (default 30).
read1_only (bool, optional) – If True (default), only read1 is used from BAM/CRAM to avoid double-counting fragments. Has no effect on tabix files.
- Raises:
FileNotFoundError – If the file or a required index is missing.
UnsupportedFormatError – If the file extension is not recognized.
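To make the effect of quality_threshold and read1_only concrete, the sketch below filters a handful of toy records. The Read dataclass and the emit() generator are hypothetical names introduced for illustration; they are not part of finaletoolkit, and the wrapper's actual filtering code is not shown in this reference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    """Toy record; a hypothetical stand-in for an aligned read."""
    name: str
    mapq: int
    is_read1: bool


def emit(reads, quality_threshold=30, read1_only=True):
    """Yield reads that pass the mapping-quality and read1 filters."""
    for read in reads:
        if read.mapq < quality_threshold:
            continue  # below the minimum mapping quality
        if read1_only and not read.is_read1:
            continue  # skip the mate so each fragment is counted once
        yield read


reads = [
    Read("f1", 60, True),   # read1 of fragment f1
    Read("f1", 60, False),  # read2 of the same fragment
    Read("f2", 10, True),   # low mapping quality
    Read("f3", 30, True),   # exactly at the threshold
]
kept = [r.name for r in emit(reads)]
both_mates = [r.name for r in emit(reads, read1_only=False)]
```

With the defaults each fragment name appears once; turning read1_only off lets both mates of f1 through, which is the double-counting the parameter is meant to prevent.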