Puffin-D

Welcome! Puffin-D is a deep learning model that predicts basepair-resolution transcription initiation signals from 100 kbp of sequence context. The model was trained to predict transcription initiation signals in humans for five datasets: three from techniques that measure mostly mature transcripts by precisely capturing their 5' ends (two variants of CAGE, plus RAMPAGE) and two from techniques that measure only nascent transcripts (GRO-cap and PRO-cap). The model predictions correspond to basepair-resolution count profiles averaged across samples after applying the \(\log_{10}(x + 1)\) transformation, where \(x\) is the read count.

The model takes any 100 kbp genome sequence and outputs the five types of transcription initiation signals at basepair resolution for the entire 100 kbp region. It can be used to study the effects of mutations on transcription initiation and to analyze the sequence basis of any transcription start site. To use Puffin-D from the command line, visit our GitHub repository.

Puffin-D is the prediction-focused model of two complementary models that we developed for decoding the sequence basis of transcription initiation. The interpretation-focused model, Puffin, is an explainable sequence model that provides basepair-level and motif-level interpretation and is available interactively at the Puffin webserver. Puffin is completely free for any non-commercial and academic use; please contact us for commercial applications.


Prediction Target

Prediction Input

On the home page, sequence input is written in the simple SeqStr format and then submitted to our job queue. The model can make predictions for up to 10 input sequences at once. Every sequence should start with a name on a new line. To predict for any reference genome region, it is enough to specify the sequence coordinates and the name of the genome assembly (hg38 or hg19). Sequences carrying mutations can also be specified in SeqStr format. Note that for input sequences longer than 100 kbp, predictions are made only for the central 100 kbp region. Here we list the SeqStr patterns that we currently support:

Example 1:

<wildtype>[hg38]chr7:5480600-5580600 -
<mutant>[hg38]chr7:5480600-5580600 -, @chr7 5530626 TATA GCGC
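
For scripted submissions, entries like those in Example 1 can be assembled programmatically. The following is a minimal Python sketch that mirrors only the pattern shown in Example 1. The helper name seqstr_line and its parameters are hypothetical, and reading the two sequences after the position as "original bases" and "replacement bases" is an assumption that this page does not spell out.

```python
def seqstr_line(name, assembly, chrom, start, end, strand, mutations=()):
    """Format one SeqStr entry following the pattern of Example 1.

    `mutations` is an iterable of (chrom, position, ref, alt) tuples.
    Treating the last two fields as reference and replacement bases is
    an assumption based on the example, not a documented rule.
    """
    entry = f"<{name}>[{assembly}]{chrom}:{start}-{end} {strand}"
    for m_chrom, pos, ref, alt in mutations:
        # Each mutation is appended after a comma, as in Example 1.
        entry += f", @{m_chrom} {pos} {ref} {alt}"
    return entry


wildtype = seqstr_line("wildtype", "hg38", "chr7", 5480600, 5580600, "-")
mutant = seqstr_line(
    "mutant", "hg38", "chr7", 5480600, 5580600, "-",
    mutations=[("chr7", 5530626, "TATA", "GCGC")],
)
print(wildtype)
print(mutant)
```

The two printed lines reproduce the wildtype and mutant entries of Example 1.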

Example 2:

The input sequence is constructed from separate sequence chunks that are concatenated together:
[hg38]chr7:5530575-5630625 -,@chr7 5530575 C T,@chr7 5530576 GC A;TTAAccggGGNaa;[hg38]chrX:1000000-1000017 +;TTAA;
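
A concatenated input can end up longer than 100 kbp, in which case only the central 100 kbp region is predicted, as noted above. The Python sketch below illustrates which part of a longer sequence that would be. The function central_window is hypothetical, and the exact centering convention (rounding down when the excess length is odd) is an assumption; the server's own convention may differ by one base.

```python
WINDOW = 100_000  # model input length in bp, as stated on this page


def central_window(seq_len, window=WINDOW):
    """Return the 0-based, half-open (start, end) of the central window.

    Assumes the offset is rounded down when the excess length is odd.
    Inputs shorter than the window are not covered by this sketch.
    """
    if seq_len < window:
        raise ValueError("sequence is shorter than the prediction window")
    start = (seq_len - window) // 2
    return start, start + window


seq = "ACGT" * 25_010  # toy sequence of 100,040 bp
start, end = central_window(len(seq))
cropped = seq[start:end]
print(start, end, len(cropped))
```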

Prediction Output

Puffin-D predicts basepair-resolution transcription initiation signals using only genomic sequence as input and, more importantly, can be used to analyze the sequence basis of any transcription start site (TSS) at the motif and basepair levels. The predictions are visualized as a plot of the signals for each target: FANTOM_CAGE, ENCODE_CAGE, ENCODE_RAMPAGE, GRO_CAP, and PRO_CAP.

You may also download the numerical predictions in json format.
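
Because the predictions are on the \(\log_{10}(x + 1)\) scale, it can be convenient to convert downloaded values back to an approximate count scale. The Python sketch below shows the inverse transform on a plain list of numbers. The function to_counts is hypothetical, and how the values are pulled out of the downloaded file depends on the json layout, which is not described here.

```python
def to_counts(log_values):
    """Invert y = log10(x + 1), giving x = 10**y - 1 for each value."""
    return [10 ** y - 1 for y in log_values]


# Placeholder values standing in for one predicted signal track.
predicted = [0.0, 1.0, 2.0]
counts = to_counts(predicted)
print(counts)
```

Since the training profiles were averaged across samples, the back-transformed values are best read as approximate summaries rather than exact read counts for any single sample.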

Questions and feedback?

Thank you for using Puffin-D. If you have any questions or feedback, you can let us know at https://github.com/jzhoulab/puffin/discussions.