1 Background Introduction
High-throughput short-read sequencing has well-known limitations in complex genomic regions, such as highly repetitive and high-GC sequences. Because the reads are short, they cannot span long repeats or low-complexity regions, and they are of limited use for detecting large structural variants (SVs). Long-read sequencing, which has developed rapidly in recent years, addresses many of these problems. It uses modern optics, polymers, nanotechnology and other means to distinguish the signals of individual bases, so as to read sequence information directly.
2 Whole Genome Sequencing Technology
The CycloneSEQ™ whole-genome resequencing product uses the CycloneSEQ™ platform for long-read sequencing, which can effectively detect large structural variations such as insertions, deletions, inversions, and duplications.
2.1 CycloneSEQ™ Sequencing
Figure 1 shows the CycloneSEQ™ library construction and sequencing process. Libraries can be constructed with the library construction kits provided for the CycloneSEQ™ platform. The specific steps are as follows:
1 Size-Selection;
2 Library construction:
1) DNA Damage Repair, End Repair, A-Tailing and clean-up;
2) Adapter Ligation;
3) Magnetic Bead Purification and Qubit Quantification;
3 Priming and loading the flow cell;
4 On-board sequencing of the library.
The sequencing process on CycloneSEQ™ is shown in Figure 2:
CycloneSEQ™ is a sequencing technology based on the principle of nanopore sensors. It uses pore proteins spanning a nanoscale biomimetic membrane as the core sensor. During sequencing, a voltage is applied across the biomimetic membrane. Double-stranded DNA/RNA bound to a motor protein is captured by the pore protein and unwound by the motor protein. Driven by the electric field, the nucleic acid passes through the nanopore as a continuous single strand at a speed controlled by the motor protein. As the nucleic acid passes through the pore, the current changes, and different base sequences cause different current changes. A base-calling algorithm built on a machine learning model infers the sequence of the nucleic acid molecule passing through the pore from these current changes. In this way, real-time and accurate sequencing of single-stranded nucleic acid molecules can be achieved.
3 Overview of Bioinformatics Analysis
Figure 3 shows the bioinformatics pipeline for filtering Cyclone sequencing data. For the downloaded Cyclone long-read sequencing data, Porechop[1] was first used to remove adapter sequences, then seqtk[2] was used to filter out sequences shorter than 100 bp, and NanoPlot[3] was used to compute statistics on the filtered data. The software and parameters involved in each step are described as follows:
3.1 Data Filtering
Starting from the reads that passed the filter condition (fastq_pass), Porechop[1] was used to remove adapter sequences, seqtk[2] was used to filter out sequences shorter than 100 bp, and finally NanoPlot[3] was used to compute statistics on the result. The command lines are as follows:
porechop -i fastq_pass.raw.fastq.gz -t 1 -o fastq_pass.fastq.gz
seqtk seq -L 100 fastq_pass.fastq.gz | gzip -c - > input.fastq.gz
NanoPlot --color red --N50 --fastq input.fastq.gz --outdir ./
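To make the length cutoff and the N50 statistic concrete, the following is a minimal Python sketch of what the seqtk and NanoPlot steps compute conceptually. It is an illustration only, not the code of either tool: the function names (read_fastq, filter_by_length, n50) and the toy reads are assumptions introduced here, and the sketch assumes the simple four-lines-per-record FASTQ layout.

```python
import io


def read_fastq(handle):
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of input
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def filter_by_length(records, min_len=100):
    """Keep reads with at least min_len bases (the cutoff used with seqtk seq -L 100)."""
    return [rec for rec in records if len(rec[1]) >= min_len]


def n50(lengths):
    """Return the read length L such that reads of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # no reads


def make_record(name, length):
    """Build a toy FASTQ record of the given length (hypothetical test data)."""
    return "@{}\n{}\n+\n{}\n".format(name, "A" * length, "I" * length)


# Four toy reads of 50, 150, 300 and 400 bases.
toy_fastq = "".join(
    make_record("read{}".format(i), n) for i, n in enumerate([50, 150, 300, 400], 1)
)

kept = filter_by_length(read_fastq(io.StringIO(toy_fastq)), min_len=100)
lengths = [len(seq) for _, seq, _ in kept]
print("reads kept:", len(kept))
print("N50 of kept reads:", n50(lengths))
```

For real data the same functions could be applied to a handle opened with gzip.open(path, "rt"); the tools named above remain the reference for the actual pipeline.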


