I/O Wrappers

Unified access to reference genomes and alignment/fragment files. These wrappers back every feature of the toolkit and can also be used directly.

class finaletoolkit.io.ReferenceWrapper(reference_path, use_lock=True)

Wrap a 2bit- or FASTA-formatted reference genome for sequence queries.

The wrapper encapsulates the py2bit/pysam handle logic behind a common interface and may be used as a context manager.
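As a rough usage sketch, the context-manager form might look like the following. The helpers gc_fraction and region_gc are hypothetical names introduced only for illustration; the only library calls used are the constructor and sequence() as documented in this section, and the assumption that leaving the with block closes the handle is not confirmed here.

```python
def gc_fraction(sequence: str) -> float:
    """Fraction of G/C bases in an upper-cased sequence (0.0 if empty)."""
    if not sequence:
        return 0.0
    gc = sum(1 for base in sequence if base in "GC")
    return gc / len(sequence)


def region_gc(reference_path, contig, start, stop):
    """Hypothetical helper: GC fraction of one region of a reference."""
    # Imported lazily so gc_fraction() stays usable without the package.
    from finaletoolkit.io import ReferenceWrapper

    # Assumption: exiting the with block closes the underlying handle.
    with ReferenceWrapper(reference_path) as ref:
        seq = ref.sequence(contig, start, stop)
    return gc_fraction(seq)
```

Because sequence() returns upper-cased text, the helper only needs to test for "G" and "C".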

Parameters:
  • reference_path (str or Path) – Path to a .2bit/.tb2 or FASTA (.fa/.fasta/.fna, optionally .gz) reference file.

  • use_lock (bool, optional) – If True (default), guard sequence queries with a threading.Lock so a single instance may be shared across threads. Set to False for per-thread/per-process instances to avoid lock overhead.

Raises:

FileNotFoundError – If reference_path does not exist.
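The use_lock idea can be illustrated with a small stand-in. This is not the library's implementation: LockedReference is a hypothetical in-memory class that only mimics the pattern of guarding each query with a threading.Lock so one instance can be shared by several worker threads.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class LockedReference:
    """Hypothetical stand-in: an in-memory 'reference' guarded by a lock."""

    def __init__(self, contigs, use_lock=True):
        self._contigs = contigs
        # With use_lock=False no lock is created and queries run unguarded.
        self._lock = threading.Lock() if use_lock else None

    def sequence(self, contig, start, stop):
        if self._lock is None:
            return self._contigs[contig][start:stop].upper()
        # Only one thread at a time may touch the shared "handle".
        with self._lock:
            return self._contigs[contig][start:stop].upper()


# One shared instance queried from three threads.
ref = LockedReference({"chr1": "acgtacgtacgt"})
regions = [("chr1", i, i + 4) for i in range(0, 12, 4)]
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(lambda region: ref.sequence(*region), regions))
```

When each thread or process builds its own instance, the lock protects nothing, which is the case the use_lock=False setting is described for.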

property chroms: Dict[str, int]

Mapping of chromosome name to length.

sequence(contig, start=None, stop=None, fail_on_excess_range=True)

Return the upper-cased reference sequence for a region.

Parameters:
  • contig (str) – Chromosome or contig name.

  • start (int, optional) – 0-based start position (defaults to 0).

  • stop (int, optional) – 0-based, exclusive end position (defaults to the contig length).

  • fail_on_excess_range (bool, optional) – If True (default), raise when the range exceeds the contig bounds; otherwise truncate the range to the contig.

Returns:

The upper-cased sequence (empty string if the truncated range is empty).

Return type:

str

Raises:

An exception if the requested range exceeds the contig bounds and fail_on_excess_range is True.

close()

Safely close the underlying file handle.

Return type:

None
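The range handling described for sequence() (defaults for start and stop, then either failing or truncating) can be sketched in plain Python. resolve_range is a hypothetical helper, and the use of ValueError is an assumption, since this section does not name the exception type.

```python
def resolve_range(contig_length, start=None, stop=None,
                  fail_on_excess_range=True):
    """Return a (start, stop) pair that lies within [0, contig_length]."""
    # Documented defaults: start of contig and contig length.
    start = 0 if start is None else start
    stop = contig_length if stop is None else stop

    if start < 0 or stop > contig_length:
        if fail_on_excess_range:
            # Assumption: the real exception type is not stated in the docs.
            raise ValueError(
                f"range {start}-{stop} exceeds contig length {contig_length}"
            )
        # Otherwise truncate the range to the contig.
        start = min(max(start, 0), contig_length)
        stop = max(min(stop, contig_length), start)
    return start, stop
```

A truncated range can end up empty, for example when start lies beyond the contig end; slicing with such a pair gives an empty string, matching the documented return value.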

class finaletoolkit.io.AlignmentWrapper(path, reference_file=None, threads=1, quality_threshold=30, read1_only=True)

A unified interface for reading alignment and fragment data.

Wraps BAM, CRAM, SAM, and tabix-indexed fragment files (frag.gz and BED6 bed.gz), exposing a consistent fetch() generator and contig table.

Parameters:
  • path (str, Path, pysam.AlignmentFile, or pysam.TabixFile) – Path to the input file, or an already-open pysam handle.

  • reference_file (str or Path, optional) – Reference genome path (required for CRAM input).

  • threads (int, optional) – Decompression threads for BAM/CRAM (default 1).

  • quality_threshold (int, optional) – Minimum mapping quality for emitted fragments (default 30).

  • read1_only (bool, optional) – If True (default), only read1 is used from BAM/CRAM to avoid double-counting fragments. Has no effect on tabix files.

Raises:

property chroms: Dict[str, int | None]

Mapping of contig name to length (None for tabix files).

property is_sam: bool

True if the source is a SAM/BAM/CRAM file.

fetch(contig=None, start=None, stop=None)

Yield Fragment records overlapping a region.

Parameters:
  • contig (str, optional) – Contig to restrict to (None for the whole file).

  • start (int, optional) – Start of the region, passed to the underlying index query.

  • stop (int, optional) – End of the region, passed to the underlying index query.

Yields:

Fragment – Standardized fragment records passing the quality filter.

Return type:

Generator[Fragment, None, None]

close()

Close the underlying handle if this wrapper owns it.

Return type:

None
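A possible way to consume fetch() is sketched below. length_counts and region_length_counts are hypothetical helpers; the wrapper is closed with try/finally because this section does not state that AlignmentWrapper supports the with statement.

```python
from collections import Counter


def length_counts(fragments):
    """Count fragment lengths for any records exposing start and stop."""
    return Counter(frag.stop - frag.start for frag in fragments)


def region_length_counts(path, contig, start, stop):
    """Hypothetical helper: fragment-length counts for one region."""
    # Imported lazily so length_counts() works without the package.
    from finaletoolkit.io import AlignmentWrapper

    reader = AlignmentWrapper(path, quality_threshold=30)
    try:
        # fetch() is a generator, so fragments are counted as they stream.
        return length_counts(reader.fetch(contig, start, stop))
    finally:
        reader.close()
```

Since fetch() yields the same record type for BAM/CRAM and tabix input, the counting helper does not need to know which format was opened.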

class finaletoolkit.io.Fragment(contig, start, stop, mapq, is_forward)

A zero-cost, standardized representation of a genomic fragment.

The same record type is produced whether the data came from a BAM/CRAM record or a tabix/BED line, so downstream code never branches on format.

Parameters:
  • contig (str) – Chromosome/contig name.

  • start (int) – 0-based, inclusive fragment start.

  • stop (int) – 0-based, exclusive fragment stop.

  • mapq (int) – Mapping quality.

  • is_forward (bool) – True if the fragment is on the + strand.
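To make the field layout concrete, here is a stand-in record with the same five fields, plus a parser for a BED6-style line. FragmentSketch, its length property, and from_bed6 are hypothetical; in particular, reading the BED score column as the mapping quality is an assumption about how such files are interpreted, not something stated here.

```python
from typing import NamedTuple


class FragmentSketch(NamedTuple):
    """Hypothetical mirror of the documented Fragment fields."""
    contig: str
    start: int       # 0-based, inclusive
    stop: int        # 0-based, exclusive
    mapq: int
    is_forward: bool

    @property
    def length(self) -> int:
        # Half-open coordinates make the length a plain difference.
        return self.stop - self.start


def from_bed6(line: str) -> FragmentSketch:
    """Parse one tab-separated BED6 line (score assumed to hold MAPQ)."""
    fields = line.rstrip("\n").split("\t")
    contig, start, stop, _name, score, strand = fields[:6]
    return FragmentSketch(contig, int(start), int(stop), int(score),
                          strand == "+")


frag = from_bed6("chr1\t100\t267\tread1\t60\t+\n")
```

With 0-based, half-open coordinates, the example fragment spanning 100 to 267 has length 167.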