Here we will go through each step from the raw signals file (slow5) to generating plots.
For this we will use r9.4_450bps.nucleotide
DNA dataset. The dataset is at test/data/raw/pipelines/pipeline_0/r9.4_450bps.nucleotide
.
Note that the script must be executed inside the virtual environment where squigualiser is installed.
The following variables must be changed before running the bash script.
Variable name | value set for this pipeline | Description |
---|---|---|
SQUIGULATOR | depends on user computer | path to squigulator (minimum version v0.1.0) |
SAMTOOLS | depends on user computer | path to samtools (minimum version v1.13) |
MINIMAP2 | depends on user computer | path to minimap2 (minimum version v2.24) |
F5C | depends on user computer | path to f5c (minimum version v1.3) |
BGZIP | depends on user computer | path to bgzip (minimum version v1.13) |
TABIX | depends on user computer | path to tabix (minimum version v1.13) |
GUPPY | depends on user computer | path to dir containing guppy executables (minimum version v6.3.7) |
BUTTERY_EEL_ENV_PATH | depends on user computer | path to python env where buttery-eel is installed (minimum version v0.2.2) |
REFERENCE | depends on user computer | path to the reference genome file (please use a human genome if you use the test datasets) |
MODEL_TO_USE | dna_r9.4.1_450bps_fast_prom.cfg | this has to be set to the name of the model used to basecall |
CHUNK_SIZE | 500 | (optional) This is used in basecalling to limit GPU memory. |
SIMULATING_PROFILE | dna-r9-prom | signal profile used by squigulator (-x argument) |
RE_REFORM_K | 3 | kmer_length value used in the second run of reform . For new data this value can only be set after running calculate_offsets step (see pipeline steps) More details can be found at profiles |
RE_REFORM_M | 2 | sig_move_offset value used in the second run of reform . For new data this value can only be set after running calculate_offsets step (see pipeline steps) More details can be found at profiles |
PROFILE_TO_DETERMINE_BASE_SHIFT | kmer_model_dna_r9.4.1_450bps_6_mer | squigualiser uses this value to determine the appropriate base_shift . More details can be found at profiles |
SIGNAL_FILE | reads.blow5 | SLOW5/BLOW5 signal file |
READ_ID | 8e1a33c4-af69-471c-a115-6428c8bf63df | An arbitrary read_id from the SIGNAL_FILE to perform calculate_offsets_read_id step (see pipeline steps) |
READ_REGION | ${READ_ID}:1-500 | The region to simulate and plot in the selected read. |
REF_REGION | chr16:46,389,459-46,390,388 | The region in the reference to simulate and plot. This region cannot be determined without running minimap2 step. User can load the minimap2 aligned sam file to IGV and select a region. |
SIM_REGION | sim_ref:1-500 | (optional) The region to plot in simulated read and the reference. |
SIG_SCALE | znorm | (optional) squigualiser scales the signal. More information at commands |
Each step listed in the bash script run.sh
is discussed below.
Step | Description |
---|---|
basecall_eel | Buttery-eel is used to basecall and create moves.sam. Sam file is then converted to bam and fastq for later steps. |
basecall_fast5 | if Buttery-eel is giving errors use guppy directly to do the basecalling. |
reform_move_table | run reform with -k 1 and -m 0 . Optionally, to skip steps 4, 5, and 6 provide the correct –profile argument. For this pipeline lets not skip the steps. |
calculate_offsets | use reform output to determine best k and m values. |
calculate_offsets_read_id | Cretae a pdf file with the distributions to visualise the separation (useful for the user to further confirm if the determined k and m values are optimal). |
re_reform | Run reform again using the determined k and m values. |
simulate_read_signal | Simulate a read using squigulator . |
plot_sim_and_real_signal | Plot both simulated and real signals in two separate tracks. Two pileup commands (one for plotting simulated and one for real signal) were added to a file. Note --profile argument is provided for the simulated signal plot. The profile name depends on what kmer model was used to simulate the signal. For the real signal plot --profile argument is not necessary as we have ran reform again with new k and m values to make the base shift 0. Then the file is given as the input to plot_tracks command. |
minimap2_align | Align the basecalled reads to the reference. |
realign | Run realign tool to re-align cigar mapping to moves. |
simulate_ref_signal | simulate part of the reference region using squigulator . |
plot_realign_and_sim | Plot both simulated reference and real signals mapped to the reference (using realign) in two separate tracks. Two pileup commands (one for plotting simulated and one for real signals) were added to a file. Note --profile argument is provided for the simulated signal plot. The profile name depends on what kmer model was used to simulate the signal. For the real signal plot --profile argument is not necessary as we have ran reform again with new k and m values to make the base shift 0. Then the file is given as the input to plot_tracks command. |
f5c_eventalign | Peform f5c eventalign to get the eventalignment file. |
plot_eventalign_and_sim | Plot both simulated reference and real signals mapped to the reference in two separate tracks. Two pileup commands (one for plotting simulated and one for real signals) were added to a file. Note --profile argument is provided for the simulated signal plot. The profile name depends on what kmer model was used to simulate the signal. In this case, also for the real signal plot --profile argument is necessary as we f5c used a kmer model. Since the same kmer model was used in squigulator and f5c , profile name the same. Then the file is given as the input to plot_tracks command. |
plot_eventalign_forward_reverse | So far all the plots only used forward mapped reads as it is the default. However, in this step forward and reverse mapped reads are plottted using the f5c eventalignment output. Note that --plot_reverse argument is provided in the second command. As usual, profile name is also provided. |
plot_eventalign_forward_reverse | Similar to step plot_eventalign_forward_reverse but this time using realign data instead of f5c eventalignment. Hence, profile name is not required. |
You can use this basic pipeline script as a start and do the necessary changes to support your analysis.
A similar pipeline is available for dna_r10.4.1_e8.2_400bps
data at test/data/raw/pipelines/pipeline_0/dna_r10.4.1_e8.2_400bps
.
A similar pipeline is available for rna_r9.4.1_70bps
data at test/data/raw/pipelines/pipeline_0/rna_r9.4.1_70bps
.