Frequently Asked Questions
How to install and set up the environment
We have packaged everything into the exSEEK Docker image.
How to use exSEEK
exSEEK is an integrative tool for exRNA processing and feature selection. We use Snakemake to run jobs in parallel and wrap the Snakemake pipeline in a single command.
Details of the preparation steps are described here. Basically, you should complete the following steps before running the command:
Install exSEEK and its requirements
Prepare the genome and annotation
Prepare input files in the right file paths
Set up the configuration
Then you can run the command, specifying the module you want to run and the dataset you provide.
What is Snakemake
The Snakemake workflow management system is a tool to create reproducible and scalable data analyses. We have hidden the details of Snakemake, so you only need to run a single command. However, you can customize some of the code if you are familiar with Snakemake.
How to set configurations in config file
There are many parameters to be specified. You should make a new copy of a config file in the config directory. For example, you can make a copy of scirep.yaml and rename it to config/${dataset}.yaml.
Other parameters are defined in snakemake/default_config.yaml. You may also change these parameters.
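The copy step described above can be sketched in Python as follows. The function name `make_dataset_config` and its default arguments are illustrative and not part of exSEEK; only `scirep.yaml` and the `config/${dataset}.yaml` naming come from the text.

```python
import shutil
from pathlib import Path


def make_dataset_config(dataset: str, template: str = "scirep.yaml",
                        config_dir: str = "config") -> Path:
    """Copy an existing config file to <config_dir>/<dataset>.yaml.

    Hypothetical helper: the name and defaults are assumptions for illustration.
    """
    src = Path(config_dir) / template
    dst = Path(config_dir) / f"{dataset}.yaml"
    if dst.exists():
        # Refuse to overwrite a config that may already contain your edits.
        raise FileExistsError(f"{dst} already exists")
    shutil.copyfile(src, dst)
    return dst
```

Calling `make_dataset_config("my_dataset")` from the directory that contains `config/` would create `config/my_dataset.yaml`, which you can then edit. Refusing to overwrite an existing file is a deliberate choice here, so that a second call cannot silently discard changes.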
How to generate report
After running some modules (e.g., mapping, normalization and evaluation), you can open the Jupyter notebook files in the notebooks directory. The only thing to do is to fill in the dataset name and sequencing type.
For example:
Then you can get plots of your mapping, processing and feature selection details in one Jupyter notebook.
Note: the notebook is based on the exSEEK output layout. If you process your data on your own without exSEEK and only need the notebook to generate plots, you should change the file paths in the notebook code so that the plots can be generated.
When bugs appear:
The quickest way is to create a new issue.
If you want us to add more functions to exSEEK, please create a new issue.
Why did some jobs occasionally fail with no extra error message except 'CalledProcessError'?
The most likely cause is insufficient available memory. You can confirm the problem by running the Linux command dmesg and examining the most recent messages. A message like "Out of memory: Kill process ** or sacrifice child" clearly indicates that a memory problem occurred. Some jobs (e.g. mapping using STAR, samtools sort, bedtools sort) require a large amount of memory, especially when the number of input reads is large. Try to reduce the number of parallel jobs with the -j option, or set a memory limit in config/cluster.yaml for a particular job if you run the jobs on a computer cluster.
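To check for such messages programmatically, the sketch below filters kernel log text for out-of-memory reports. The helper names `find_oom_lines` and `read_dmesg` are illustrative and not part of exSEEK, and the pattern simply looks for the phrase quoted above.

```python
import re
import subprocess
from typing import List

# Matches kernel messages such as "Out of memory: Kill process 1234 (STAR) ...".
OOM_PATTERN = re.compile(r"out of memory", re.IGNORECASE)


def find_oom_lines(log_text: str) -> List[str]:
    """Return the lines of a kernel log that report an out-of-memory kill."""
    return [line for line in log_text.splitlines() if OOM_PATTERN.search(line)]


def read_dmesg() -> str:
    """Return the output of `dmesg`; some systems restrict this to privileged users."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `find_oom_lines(read_dmesg())[-5:]` would give the five most recent matching lines, if any. An empty list suggests the failure had a different cause.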
How to rerun downstream steps from a specific step in a pipeline
Sometimes we need to rerun a pipeline from a specific step, usually after changing the configuration file. Snakemake is not aware of changes in the configuration file, so we need to rerun the pipeline ourselves. The --forcerun option in Snakemake allows rerunning a step and all steps that depend on the output files of that step. For example, to rerun the count_matrix step, just run:
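A minimal sketch of such an invocation, assuming Snakemake is called directly rather than through the exSEEK wrapper: the helper below assembles the command line. The Snakefile path, config file name and job count are hypothetical placeholders; `--snakefile`, `--configfile`, `--forcerun` and `-j` are standard Snakemake options.

```python
import shlex
from typing import List


def forcerun_command(rule: str, snakefile: str, configfile: str, jobs: int = 1) -> List[str]:
    """Build a Snakemake command that reruns `rule` and every step downstream of it."""
    return [
        "snakemake",
        "--snakefile", snakefile,    # hypothetical path, adjust to your setup
        "--configfile", configfile,  # e.g. the per-dataset config file
        "--forcerun", rule,          # rerun this rule and everything that depends on it
        "-j", str(jobs),             # maximum number of parallel jobs
    ]


cmd = forcerun_command("count_matrix", "path/to/Snakefile", "config/my_dataset.yaml", jobs=4)
print(shlex.join(cmd))
```

Appending `-n` (dry run) to the command lets you inspect which jobs Snakemake would rerun before anything is executed.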