Small RNA-seq mapping

Create spikein directory

# output spike-in directory
spikein_dir="data/${dataset}/spikein"
# input spike-in sequences
spikein_fa="data/${dataset}/spikein.fa"
# create sub-directories
mkdir -p "$spikein_dir"
for sub_dir in fasta bed chrom_sizes transcript_table index/bowtie2;do
    mkdir -p "$spikein_dir/$sub_dir"
done
# create files
cp "$spikein_fa" "$spikein_dir/fasta/spikein.fa"
samtools faidx "$spikein_dir/fasta/spikein.fa"
cut -f1,2 "$spikein_dir/fasta/spikein.fa.fai" > "$spikein_dir/chrom_sizes/spikein"
{
    echo -e 'chrom\tstart\tend\tname\tscore\tstrand\tgene_id\ttranscript_id\tgene_name\ttranscript_name\tgene_type\ttranscript_type\tsource'
    awk 'BEGIN{OFS="\t";FS="\t"}{print $1,0,$2,$1,0,"+",$1,$1,$1,$1,"spikein","spikein","spikein"}' "$spikein_dir/fasta/spikein.fa.fai"
} > "$spikein_dir/transcript_table/spikein.txt"
bowtie2-build "$spikein_dir/fasta/spikein.fa" "$spikein_dir/index/bowtie2/spikein"

Update sequential mapping order

The default mapping order is set as rna_type variable in snakemake/default_config.yaml:

You can change the mapping order by add a rna_type variable in config/${dataset}.yaml. For example, add spike-in sequences as the first RNA type:

Add new reference sequence

If a new RNA type is added, you should also add a sequence file in FASTA format: ${genome_dir}/fasta/${rna_type}.fa. Then build a FASTA index (${genome_dir}/fasta/${rna_type}.fa.fai):

Then build a bowtie2 index (${genome_dir}/index/bowtie2/${rna_type}):

Quality control (before adapter removal)

Remove adapter

Start clean reads

Quality control (after adapter removal)

Mapping

Generate BigWig files

Call domains

Count matrix

Combine domains with small RNA

Last updated