Genome and Annotations
Annotation summary table
| Type | Number of genes | Source |
|---|---|---|
| miRNA | 1917 | miRBase hairpin (version 22) |
| piRNA | 23431 | piRNABank |
| lncRNA | 15778 | GENCODE V27 and Mitranscriptome |
| rRNA | 37 | NCBI RefSeq 109 |
| mRNA | 19836 | GENCODE V27 |
| snoRNA | 943 | GENCODE V27 |
| snRNA | 1900 | GENCODE V27 |
| srpRNA | 680 | GENCODE V27 |
| tRNA | 649 | GENCODE V27 |
| tucpRNA | 3734 | GENCODE V27 |
| Y_RNA | 756 | GENCODE V27 |
| circRNA | 140527 | circBase |
| repeats | - | UCSC Genome Browser (rmsk) |
| promoter | - | ChromHMM tracks from 9 cell lines, UCSC Genome Browser |
| enhancer | - | ChromHMM tracks from 9 cell lines, UCSC Genome Browser |
Genome and annotation files
| File | Description |
|---|---|
| fasta/genome.fa | genome sequence |
| fasta/circRNA.fa | junction sequences from circBase |
| fasta/rRNA.fa | rRNA sequences from NCBI RefSeq |
| fasta/miRNA.fa | miRNA hairpin (precursor) sequences from miRBase |
| fasta/piRNA.fa | piRNA sequences from piRNABank |
| fasta/${rna_type}.fa | longest isoform of each gene, extracted from GENCODE annotations |
| gtf_by_biotype/${rna_type}.gtf | separate GTF file for each RNA type |
| gtf/gencode.gtf | GENCODE GTF file |
| gtf/mitranscriptome.gtf | Mitranscriptome GTF file |
| gtf/long_RNA.gtf | GTF file of long RNA (GENCODE + Mitranscriptome - miRNA) |
| gtf/piRNABank.gtf | piRNA GTF file from piRNABank |
| gtf/gencode_tRNA.gtf | GTF file of tRNA from GENCODE |
| transcript_table/all.txt | table of transcript information (gene_id, transcript_id) |
| rsem_index/bowtie2/${rna_type} | RSEM index files for each RNA type (built from the longest transcripts) |
| rsem_index/bowtie2/${rna_type}.transcripts.fa | sequences for each RNA type (longest transcripts) |
| gtf_longest_transcript/${rna_type}.gtf | GTF files of the longest isoforms from GENCODE and Mitranscriptome |
| bed/*.bed | transcripts in BED12 format, extracted from the GTF files in `gtf/*.gtf` |
| index/bowtie2/${rna_type} | Bowtie2 index for transcripts |
| index/star/${rna_type} | STAR index for transcripts |
| long_index/star/ | STAR index including splice junctions of long RNA |
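Several files above (fasta/${rna_type}.fa, the RSEM indexes, gtf_longest_transcript/) are built from the longest isoform of each gene. The sketch below shows one way to pick that isoform from GTF exon records, assuming "longest" means the largest summed exon length and that ties keep the first transcript seen. It is an illustration, not the pipeline's own code; `longest_isoforms` and `parse_attributes` are hypothetical names.

```python
from collections import defaultdict


def parse_attributes(field):
    """Parse a GTF attribute column into a dict (e.g. gene_id, transcript_id)."""
    attrs = {}
    for item in field.strip().split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs


def longest_isoforms(gtf_lines):
    """Return {gene_id: transcript_id} for the isoform with the largest exon length.

    Assumption: isoform length = sum of its exon lengths; ties keep the first seen.
    """
    exon_length = defaultdict(int)  # transcript_id -> summed exon length
    gene_of = {}                    # transcript_id -> gene_id
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[2] != "exon":
            continue
        attrs = parse_attributes(fields[8])
        tid = attrs["transcript_id"]
        # GTF coordinates are 1-based and inclusive
        exon_length[tid] += int(fields[4]) - int(fields[3]) + 1
        gene_of[tid] = attrs["gene_id"]
    best = {}
    for tid, length in exon_length.items():
        gid = gene_of[tid]
        if gid not in best or length > exon_length[best[gid]]:
            best[gid] = tid
    return best
```

The selected transcript IDs could then be used to subset the GTF or extract sequences for each RNA type.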
Generate the genome and annotation files
Create genome directory
Chromosome ID conversion table
Column 1: UCSC chromosome ID
Column 2: RefSeq chromosome ID
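A minimal sketch of how such a two-column table might be applied, assuming it is whitespace-separated and that records from NCBI (named by RefSeq accession) are renamed to UCSC IDs. The function names are hypothetical, and the accessions in the usage note are illustrative.

```python
def load_chrom_map(lines):
    """Build {RefSeq ID (column 2): UCSC ID (column 1)} from the conversion table."""
    mapping = {}
    for line in lines:
        if not line.strip():
            continue
        ucsc_id, refseq_id = line.split()[:2]
        mapping[refseq_id] = ucsc_id
    return mapping


def convert_chrom(record_line, mapping):
    """Rename the first tab-separated column of a GTF/BED line.

    IDs missing from the table are left unchanged.
    """
    fields = record_line.rstrip("\n").split("\t")
    fields[0] = mapping.get(fields[0], fields[0])
    return "\t".join(fields)
```

For example, a table row `chr1  NC_000001.11` turns a record starting with `NC_000001.11` into one starting with `chr1`.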
Download gene annotation (NCBI)
Download chain files for CrossMap
Genome assembly (UCSC hg38)
ENCODE annotations
Mitranscriptome
Extract lncRNA and TUCP RNA to separate GTF files:
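One way to do this split is sketched below. It assumes each Mitranscriptome record carries a transcript-category attribute written as `tcat "lncrna"` or `tcat "tucp"`. Treat the attribute key and values as assumptions and check them against the downloaded GTF. `split_by_category` is a hypothetical helper.

```python
import re

# Assumption: Mitranscriptome records carry a category attribute such as
# tcat "lncrna" or tcat "tucp"; change the key if your GTF differs.
CATEGORY_PATTERN = re.compile(r'tcat "([^"]+)"')


def split_by_category(gtf_lines, categories=("lncrna", "tucp")):
    """Group GTF lines into {category: [lines]} for the requested categories."""
    groups = {cat: [] for cat in categories}
    for line in gtf_lines:
        match = CATEGORY_PATTERN.search(line)
        if match and match.group(1) in groups:
            groups[match.group(1)].append(line)
    return groups
```

Each list can then be written to its own GTF file.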
NONCODE
lncRNAs identified in HCC (Nature Communications 2017)
Merge lncRNA (GENCODE and Mitranscriptome)
piRBase (v1.0)
piRBase (v2.0)
Long RNA (GENCODE + Mitranscriptome - miRNA)
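The "GENCODE + Mitranscriptome - miRNA" combination can be sketched as concatenating both GTFs and dropping records whose GENCODE `gene_type` is `miRNA`. This is a simplified illustration under that assumption. Records without a `gene_type` attribute, such as Mitranscriptome ones, are kept. `long_rna_lines` is a hypothetical name.

```python
import re
from itertools import chain

GENE_TYPE = re.compile(r'gene_type "([^"]+)"')


def long_rna_lines(gencode_lines, mitranscriptome_lines):
    """Yield GENCODE + Mitranscriptome GTF records, skipping miRNA and comment lines."""
    for line in chain(gencode_lines, mitranscriptome_lines):
        if line.startswith("#"):
            continue
        match = GENE_TYPE.search(line)
        if match and match.group(1) == "miRNA":
            continue
        yield line
```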
gene_length/long_RNA
Tab-delimited text file
First row: header
Column 1 (gene): gene_id
Column 2 (mean): mean length of isoforms
Column 3 (median): median length of isoforms
Column 4 (longest_isoform): length of the longest isoform
Column 5 (merged): merged length of isoforms
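The four length columns can be computed from each gene's exon coordinates as sketched below. It assumes 1-based inclusive exon intervals and interprets "merged length" as the number of bases covered by the union of all isoforms' exons. This is a hypothetical helper, not the pipeline's own implementation.

```python
from statistics import mean, median


def merged_length(intervals):
    """Total bases covered by the union of 1-based inclusive intervals."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end + 1:
            # Gap before this interval: close the current block
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping or abutting: extend the current block
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def gene_length_row(gene_id, isoforms):
    """isoforms: {transcript_id: [(exon_start, exon_end), ...]} for one gene."""
    lengths = [sum(e - s + 1 for s, e in exons) for exons in isoforms.values()]
    all_exons = [exon for exons in isoforms.values() for exon in exons]
    return {
        "gene": gene_id,
        "mean": mean(lengths),
        "median": median(lengths),
        "longest_isoform": max(lengths),
        "merged": merged_length(all_exons),
    }
```

For a gene with isoforms of 200 bp and 100 bp, this gives mean and median 150 and longest_isoform 200. The merged value counts shared exon bases only once.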
piRNABank (NCBI36)
miRBase (Version 22)
Spike-in
UniVec
Intron
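Introns are usually derived as the gaps between consecutive exons of a transcript. A minimal sketch for a single transcript, assuming 1-based inclusive exon coordinates on one chromosome. `introns_from_exons` is a hypothetical name.

```python
def introns_from_exons(exons):
    """Return intron intervals (1-based, inclusive) between consecutive exons."""
    ordered = sorted(exons)
    introns = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start > prev_end + 1:  # skip abutting or overlapping exons
            introns.append((prev_end + 1, next_start - 1))
    return introns
```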
Promoter/enhancer from ChromHMM (hg19)
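A sketch of pulling promoter and enhancer intervals out of a ChromHMM BED file. It assumes the state name sits in column 4 and contains the word "Promoter" or "Enhancer" (for example `1_Active_Promoter`, `4_Strong_Enhancer`). Check this against the downloaded tracks. Because these tracks are on hg19, the resulting intervals would still need conversion to hg38, e.g. with the CrossMap chain files. The function names are hypothetical.

```python
def chromhmm_class(state_name):
    """Map a ChromHMM state label to 'promoter', 'enhancer', or None."""
    label = state_name.lower()
    if "promoter" in label:
        return "promoter"
    if "enhancer" in label:
        return "enhancer"
    return None


def split_chromhmm_bed(bed_lines):
    """Collect (chrom, start, end) intervals for promoter and enhancer states."""
    out = {"promoter": [], "enhancer": []}
    for line in bed_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:  # skips track/browser lines and malformed rows
            continue
        cls = chromhmm_class(fields[3])
        if cls is not None:
            out[cls].append((fields[0], int(fields[1]), int(fields[2])))
    return out
```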
Repeats
UCSC Genome Browser -> Tools -> Table Browser
assembly: GRCh38/hg38
group: repeats
track: RepeatMasker
table: rmsk
Download to: genome/hg38/source/rmsk.bed.gz
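As a quick sanity check on the download, the records can be tallied per repeat name. This sketch assumes the Table Browser BED export puts the repeat name in column 4. The function names are hypothetical.

```python
import gzip
from collections import Counter


def count_repeat_names(lines):
    """Count BED records per name (column 4), skipping header lines."""
    counts = Counter()
    for line in lines:
        if line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 4:
            counts[fields[3]] += 1
    return counts


def count_repeat_names_in_file(path):
    """Open a gzipped BED file such as rmsk.bed.gz and count repeat names."""
    with gzip.open(path, "rt") as handle:
        return count_repeat_names(handle)
```

Usage: `count_repeat_names_in_file("genome/hg38/source/rmsk.bed.gz").most_common(10)`.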
circRNA database (circBase)
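fasta/circRNA.fa holds junction sequences. The idea behind a back-splice junction sequence is that the circular molecule joins the 3' end of its linear sequence to the 5' end. A read spanning the junction therefore matches the last bases followed by the first bases. The sketch below builds such a sequence. The `flank` length (for example, about one read length) is an assumption, and the helper is hypothetical.

```python
def back_splice_junction(circ_seq, flank):
    """Return the sequence spanning a circRNA's back-splice junction.

    The last `flank` bases (3' end) are joined to the first `flank` bases (5' end).
    """
    if not 0 < flank <= len(circ_seq):
        raise ValueError("flank must be between 1 and len(circ_seq)")
    return circ_seq[-flank:] + circ_seq[:flank]
```

For example, `back_splice_junction("AACCGGTT", 3)` joins `GTT` and `AAC` into `GTTAAC`.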
Create pseudo-genome for IGV
Merge transcript table
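A minimal sketch of merging per-source transcript tables into one table such as transcript_table/all.txt. It assumes each input is tab-separated with a header row containing `gene_id` and `transcript_id`, and that the first table listing a transcript wins. `merge_transcript_tables` is a hypothetical name.

```python
import csv


def merge_transcript_tables(tables):
    """Merge tab-separated transcript tables on transcript_id.

    `tables` is a sequence of iterables of lines (e.g. open file handles), each
    starting with a header row. The first table that lists a transcript_id wins;
    later duplicates are dropped. Rows are returned as dicts in first-seen order.
    """
    merged = {}
    for lines in tables:
        for row in csv.DictReader(lines, delimiter="\t"):
            merged.setdefault(row["transcript_id"], row)
    return list(merged.values())
```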