Input Data#

FinaleToolkit is compatible with almost any paired-end sequence data.

SAM#

A sequence alignment map (SAM) file is a human-readable file format that stores the results of sequence alignment. It contains information about the alignment of each read to the reference genome, as well as information about the read itself.

BAM#

A binary alignment file (BAM) provides the same information as a SAM file, but in a binary format. This can be useful for saving space on disk, but is not human-readable.

FinaleToolkit requires that BAM files be BAI indexed. Therefore, you should have an associated .bam.bai file in the same directory of your input data.

CRAM#

A compressed read alignment map file is a compressed version of a SAM file. It is a binary file that is smaller than a BAM file, but still contains all of the same information.

FinaleToolkit requires that CRAM files be CRAI indexed. Therefore, you should have an associated .cram.crai file in the same directory of your input data.

Fragment File#

A fragment file (.frag.gz) file that is derived from information in a BAM file. A fragment file is a block-gzipped BED3+2 file (similar to a tab-separated value file) that contains the following columns (with one row entry for each fragment): chrom, start, stop, mapq, and strand.

Here, mapq is the mapping quality of the fragment, and strand is the strand of the fragment. The strand column can be either + or -.

FinaleToolkit requires that fragment files be Tabix indexed. Therefore, you should have an associated .frag.gz.tbi file in the same directory of your input data.

For your reference, here is an example fragment file:

#chrom    start    stop    mapq    strand
chr1    10000    10050    60    +
chr1    10050    10100    60    -
chr1    10100    10150    60    +
chr1    10150    10200    60    -
chr1    10200    10250    60    +
chr1    10250    10300    60    -
chr1    10300    10350    60    +
chr1    10350    10400    60    -
chr1    10400    10450    60    +
chr1    10450    10500    60    -

We encourage you to use our comprehensive database, FinaleDB, to access relevant fragment files. Learn more about FinaleDB here .