squigualiser

Pipeline basic

Here we will go through each step from the raw signals file (slow5) to generating plots.

For this we will use r9.4_450bps.nucleotide DNA dataset. The dataset is at test/data/raw/pipelines/pipeline_0/r9.4_450bps.nucleotide. Note that the script must be executed inside the virtual environment where squigualiser is installed.

Pipeline variables

The following variables must be changed before running the bash script.

Variable name	value set for this pipeline	Description
SQUIGULATOR	depends on user computer	path to squigulator (minimum version v0.1.0)
SAMTOOLS	depends on user computer	path to samtools (minimum version v1.13)
MINIMAP2	depends on user computer	path to minimap2 (minimum version v2.24)
F5C	depends on user computer	path to f5c (minimum version v1.3)
BGZIP	depends on user computer	path to bgzip (minimum version v1.13)
TABIX	depends on user computer	path to tabix (minimum version v1.13)
GUPPY	depends on user computer	path to dir containing guppy executables (minimum version v6.3.7)
BUTTERY_EEL_ENV_PATH	depends on user computer	path to python env where buttery-eel is installed (minimum version v0.2.2)
REFERENCE	depends on user computer	path to the reference genome file (please use a human genome if you use the test datasets)
MODEL_TO_USE	dna_r9.4.1_450bps_fast_prom.cfg	this has to be set to the name of the model used to basecall
CHUNK_SIZE	500	(optional) This is used in basecalling to limit GPU memory.
SIMULATING_PROFILE	dna-r9-prom	signal profile used by squigulator (-x argument)
RE_REFORM_K	3	`kmer_length` value used in the second run of `reform`. For new data this value can only be set after running `calculate_offsets` step (see pipeline steps) More details can be found at profiles
RE_REFORM_M	2	`sig_move_offset` value used in the second run of `reform`. For new data this value can only be set after running `calculate_offsets` step (see pipeline steps) More details can be found at profiles
PROFILE_TO_DETERMINE_BASE_SHIFT	kmer_model_dna_r9.4.1_450bps_6_mer	`squigualiser` uses this value to determine the appropriate `base_shift`. More details can be found at profiles
SIGNAL_FILE	reads.blow5	SLOW5/BLOW5 signal file
READ_ID	8e1a33c4-af69-471c-a115-6428c8bf63df	An arbitrary read_id from the SIGNAL_FILE to perform `calculate_offsets_read_id` step (see pipeline steps)
READ_REGION	${READ_ID}:1-500	The region to simulate and plot in the selected read.
REF_REGION	chr16:46,389,459-46,390,388	The region in the reference to simulate and plot. This region cannot be determined without running `minimap2` step. User can load the minimap2 aligned sam file to IGV and select a region.
SIM_REGION	sim_ref:1-500	(optional) The region to plot in simulated read and the reference.
SIG_SCALE	znorm	(optional) `squigualiser` scales the signal. More information at commands

Pipeline steps

Each step listed in the bash script run.sh is discussed below.

Step	Description
basecall_eel	`Buttery-eel` is used to basecall and create moves.sam. Sam file is then converted to bam and fastq for later steps.
basecall_fast5	if `Buttery-eel` is giving errors use guppy directly to do the basecalling.
reform_move_table	run `reform` with `-k 1` and `-m 0`. Optionally, to skip steps 4, 5, and 6 provide the correct –profile argument. For this pipeline lets not skip the steps.
calculate_offsets	use `reform` output to determine best k and m values.
calculate_offsets_read_id	Cretae a pdf file with the distributions to visualise the separation (useful for the user to further confirm if the determined k and m values are optimal).
re_reform	Run `reform` again using the determined k and m values.
simulate_read_signal	Simulate a read using `squigulator`.
plot_sim_and_real_signal	Plot both simulated and real signals in two separate tracks. Two pileup commands (one for plotting simulated and one for real signal) were added to a file. Note `--profile` argument is provided for the simulated signal plot. The profile name depends on what kmer model was used to simulate the signal. For the real signal plot `--profile` argument is not necessary as we have ran reform again with new k and m values to make the base shift 0. Then the file is given as the input to plot_tracks command.
minimap2_align	Align the basecalled reads to the reference.
realign	Run `realign` tool to re-align cigar mapping to moves.
simulate_ref_signal	simulate part of the reference region using `squigulator`.
plot_realign_and_sim	Plot both simulated reference and real signals mapped to the reference (using realign) in two separate tracks. Two pileup commands (one for plotting simulated and one for real signals) were added to a file. Note `--profile` argument is provided for the simulated signal plot. The profile name depends on what kmer model was used to simulate the signal. For the real signal plot `--profile` argument is not necessary as we have ran reform again with new k and m values to make the base shift 0. Then the file is given as the input to plot_tracks command.
f5c_eventalign	Peform `f5c eventalign` to get the eventalignment file.
plot_eventalign_and_sim	Plot both simulated reference and real signals mapped to the reference in two separate tracks. Two pileup commands (one for plotting simulated and one for real signals) were added to a file. Note `--profile` argument is provided for the simulated signal plot. The profile name depends on what kmer model was used to simulate the signal. In this case, also for the real signal plot `--profile` argument is necessary as we `f5c` used a kmer model. Since the same kmer model was used in `squigulator` and `f5c`, profile name the same. Then the file is given as the input to plot_tracks command.
plot_eventalign_forward_reverse	So far all the plots only used forward mapped reads as it is the default. However, in this step forward and reverse mapped reads are plottted using the f5c eventalignment output. Note that `--plot_reverse` argument is provided in the second command. As usual, profile name is also provided.
plot_eventalign_forward_reverse	Similar to step `plot_eventalign_forward_reverse` but this time using `realign` data instead of `f5c` eventalignment. Hence, profile name is not required.

You can use this basic pipeline script as a start and do the necessary changes to support your analysis.

A similar pipeline is available for dna_r10.4.1_e8.2_400bps data at test/data/raw/pipelines/pipeline_0/dna_r10.4.1_e8.2_400bps. A similar pipeline is available for rna_r9.4.1_70bps data at test/data/raw/pipelines/pipeline_0/rna_r9.4.1_70bps.

This site is open source. Improve this page.