Introduction
The PacBio Sequel system: Our facility is equipped with the PacBio Sequel, a third generation sequencer that uses Single Molecule Real-Time (SMRT) sequencing technology. The Sequel was launched in September 2015 by Pacific Biosciences in California, United States. Its signature feature is long read lengths. This capability assists genome annotation and de novo assembly, especially in regions with repetitive elements and structural variations, and helps finish draft genomes by filling in gaps and minimizing alignment errors, both of which are challenging for second generation sequencers. Long read lengths also enable sequencing of full length transcripts and increased resolution of heterozygosity in diploid genomes via haplotype phasing. Sequencing can be performed on samples with low DNA content, with the option of multiplexing to save both time and reagent cost.
Molecular features of the PacBio Sequel system: On a molecular level, a single DNA polymerase-DNA template complex is immobilized at the bottom of a Zero Mode Waveguide (ZMW). Nucleotides labeled with distinct fluorescent tags are introduced into the ZMW and incorporated by the DNA polymerase. Fluorescent signals are captured only at the bottom of the ZMW, within a depth of 20 to 30 nm and a detection volume of 20 zeptoliters, and upon nucleotide incorporation, the phospho-linked fluorescent tag is cleaved. These features improve the signal to noise ratio. Each SMRT cell (i.e. sequencing chamber) contains one million ZMWs, allowing many sequencing reactions to run at once. There is no need for amplification, and thus amplification bias is minimized. Sequencing is based on circular consensus reads, in which the same template is read multiple times, for high and uniform coverage.
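The idea behind circular consensus reads can be illustrated with a minimal sketch: when the same template is read several times, random errors in individual passes are outvoted at each position. The code below is a toy illustration only. It assumes the passes are already aligned and of equal length and uses a simple majority vote, whereas the actual consensus software aligns the passes and uses a statistical model. The function name and the example sequences are hypothetical.

```python
from collections import Counter


def consensus(subreads):
    """Majority-vote consensus over equal-length, pre-aligned subreads.

    Simplifying assumption: no insertions or deletions, so every
    subread has the same length and columns line up directly.
    """
    if not subreads:
        raise ValueError("need at least one subread")
    length = len(subreads[0])
    if any(len(s) != length for s in subreads):
        raise ValueError("this sketch assumes pre-aligned, equal-length subreads")
    # For each column, keep the most frequently observed base.
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*subreads)
    )


# Hypothetical passes over the same template; three contain one error each.
passes = [
    "ACGTACGTAC",
    "ACGTTCGTAC",  # error at position 4
    "ACGAACGTAC",  # error at position 3
    "ACGTACGTAC",
    "CCGTACGTAC",  # error at position 0
]
ccs_read = consensus(passes)
```

Because the errors fall at different positions in different passes, each one is outvoted by the correct base in the remaining passes.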
DNA Input Quality Control: Our facility also includes other equipment to ensure the quality of DNA prior to sequencing. These include: Covaris M220 focused-ultrasonicator, Agilent 2100 Bioanalyzer, Thermo Fisher Scientific Qubit 3.0 Fluorometer and NanoDrop 2000c, Sage Science BluePippin system, Applied Biosystems StepOnePlus Real-Time PCR system and Veriti Thermal Cycler, Next Advance Bullet Blender, and Hoefer and Baygene gel electrophoresis systems.
Applications:
I. Complex Populations
Metagenomics is an emerging field that makes it possible to sequence unculturable microbes and discover new microbial species. The long read capability of Sequel allows sequencing of the full length 16S rRNA gene or other functional genes for enhanced phylogenetic and taxonomic classification of bacteria. Microbiomes can be obtained from environmental samples, including water and soil, and from biological samples, including skin, lung aspirates and feces. The microbiome can be compared between healthy and diseased states. Whole genome shotgun metagenomics enables reconstruction and annotation of genomes in a complex population. Another widely explored area is determining the profile of antibiotic resistance genes in various niches.
Genomic studies of viral populations, including HIV, HBV and influenza viruses, have explored their evolution and diversification. Identification of novel mutations allows for investigation of genetic drift, and identification of drug resistant variants allows for more efficient vaccine design targeting evolving strains.
Sequencing of somatic variations in patients unresponsive to cancer therapy can guide future drug development. The full length BCR-ABL1 fusion gene can be sequenced in a single read for patients with chronic myelogenous leukemia. Human leukocyte antigen (HLA) typing of full length HLA genes can identify cancer associated somatic mutations.
II. Whole Genome Sequencing
Long reads circumvent the assembly problem posed by repetitive elements and structural variants. When a repetitive element is present in multiple copies, assembly of short reads tends to collapse the copies, so the number of elements is underestimated. Structural variants include inversions, translocations, segmental duplications, tandem repeats, chromosomal rearrangements, large insertions and deletions, or a complex combination of these events. Structural variants are typically at least 50 bp long and comprise up to 13% of the human genome. They tend to reside in repetitive elements that are missed by short read technologies. Long reads are thus capable of ensuring accurate genome annotation, de novo assembly and finishing of draft genomes by filling in gaps and minimizing alignment errors, which are challenging for second generation sequencers.
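A minimal sketch of why repeat copies collapse with short reads: if every read is shorter than the repeat array, two loci that differ only in copy number produce exactly the same set of read sequences, whereas reads long enough to span the array and its flanks distinguish them. This is a toy model under stated assumptions: error-free reads are treated as all substrings of a fixed length, and coverage depth and paired-end information are ignored. The flanking sequences, repeat unit and function names are hypothetical.

```python
def kmers(sequence, k):
    """Return the set of all substrings of length k (idealized reads)."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def genome(copies):
    """Hypothetical locus: unique flanks around a tandem repeat array."""
    left, unit, right = "GGGGCCCCTG", "ACGTTGCA", "TTAACCGGTT"
    return left + unit * copies + right


three_copies, five_copies = genome(3), genome(5)

# Reads shorter than the repeat array: both loci yield the same fragments,
# so the sequence content alone cannot reveal the copy number.
short_reads_identical = kmers(three_copies, 6) == kmers(five_copies, 6)

# Reads long enough to span the 3-copy array plus flanks tell the loci apart.
long_reads_identical = kmers(three_copies, 30) == kmers(five_copies, 30)
```

Here `short_reads_identical` is true and `long_reads_identical` is false, which mirrors the collapse described above.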
Applications of whole genome sequencing can be separated into categories of microbes, humans, plants and animals.
1) Microbes: Identifying genes that may yield potential natural drug products is an area of active research called genome mining. Mutations that confer bacterial resistance to certain types of antibiotics can also be identified. Sequencing the genomes of microbes supports epidemiological control of spreading infections. A number of studies have used long reads to facilitate the assembly of large DNA viruses (e.g. baculoviruses, polydnaviruses). Sequencing whole genomes also allows for comparative genomics.
2) Humans: The human reference genome contains gaps and missing sequences. Long read sequencing resolves structural variation and haplotype phasing. There is better resolution of heterozygosity in diploid genomes.
3) Plants: DNA sequencing can reveal genes associated with drought tolerance, photosynthetic capability, cold tolerance and disease resistance in crops. This permits genetic engineering to optimize crop yield in different environments, which is especially important given the increasing human population and limited land resources. Sequencing enables molecular breeding of medicinal plants through artificial cultivation in phytomedicine, such as Traditional Chinese Medicine, and development of transgenic medicinal strains to improve clinical efficacy. The use of genomes to establish the phylogenetic relationship of medicinal plants is called pharmaphylogenomics. Exploring the chloroplast genome allows for optimization of photosynthesis and robust growth of medicinal plants.
4) Animals: Long sequencing reads allow scientists to investigate the reversal of Y chromosomes to autosomes despite the high repeat content in Y chromosomes. Long reads can complete draft genomes, enable chromosome level genome assembly and generate high quality reference genomes for complex mammalian genomes.
III. Targeted Sequencing
Targeted sequencing reduces the cost and time associated with whole genome sequencing. Targeted areas could be the whole exome, a specific organelle (e.g. mitochondria, chloroplast) or specific gene(s). The whole human exome represents only about 1% of the human genome. Mitochondrial DNA is repetitive, and long reads resolve the resulting assembly problem. Targeted sequencing can be achieved by probe capturing or PCR amplification of the target gene(s). The probe capturing method can capture exomes, exomes plus untranslated regions, exomes plus 5’ and 3’ flanking regions, as well as certain groups of genes, such as disease-specific genes. The groups of genes to be captured can be custom designed as well. The PCR amplification method requires primers that anneal to the specific gene(s) of interest. Targeted sequencing has been applied in the biomedical field and in animal and microbial research.
1) Biomedical: HLA typing is a common technique, and sequencing full length HLA genes provides better insight into immune related conditions (e.g. immune responses to infection, transplant rejection, autoimmune diseases, adverse drug reaction and cancer development). HLA typing is used for preliminary screening of potential organ donors and of individuals with potential susceptibility to adverse drug responses, and for risk prediction for complex diseases. The Sequel can generate full length alleles of the majority of the HLA class I and II genes. It also allows complete phasing of HLA genes to resolve phasing ambiguities from short reads. HLA typing for adverse drug reaction prior to drug prescription is popular in the area of pharmacogenomics. Sequencing the FcγR region of the immune receptor can reveal its genetic variation, which leads to variable control of humoral and innate immune responses in different individuals.
2) Animals: Sequencing the repeat rich telomeres allows investigation of DNA damage responses, cellular senescence and apoptosis. Sequences of the animal major histocompatibility complex (MHC) elucidate its evolutionary history.
3) Microbes: Via the PCR amplification method, the full length 16S rRNA gene and other functional genes can reveal the phylogenetic and taxonomic classification of bacteria. It is also possible to amplify only the repetitive regions by PCR in order to finish draft genomes. In multi-locus sequencing, multiple targets are amplified by PCR and then sequenced. Virus integration sites in chromosomes upon virus infection can also be elucidated. Some microbes in crops produce insecticidal proteins. Investigating and engineering such proteins to enhance their specificity and potency can increase agricultural productivity.
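The PCR amplification method described in this section can be sketched in silico: a forward primer and a reverse primer delimit the target, and the amplicon is the stretch of template between and including the two primer sites. This is a minimal sketch that assumes exact primer matches on a single template orientation, with no mismatch tolerance or melting temperature checks. The function names, template and primer sequences are hypothetical and are not real 16S rRNA sequences.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def in_silico_pcr(template, forward_primer, reverse_primer):
    """Return the amplicon bounded by both primers, or None if either is absent.

    Simplifying assumption: primers must match the template exactly.
    """
    start = template.find(forward_primer)
    if start == -1:
        return None
    # The reverse primer anneals to the opposite strand, so search for its
    # reverse complement downstream of the forward primer site.
    reverse_site = reverse_complement(reverse_primer)
    end = template.find(reverse_site, start + len(forward_primer))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


# Hypothetical template and primers for illustration only.
template = "TTTTAGAGTTTGATCCGGCTAACGTACGTAAGGAGGTGATCCAGGGG"
amplicon = in_silico_pcr(template, "AGAGTTTGAT", "TGGATCACCT")
```

In this example the amplicon is the template without its four leading and four trailing bases. A primer pair that does not match the template returns `None`.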
IV. Iso-Seq (RNA sequencing)
The long read capability of Sequel allows sequencing of full length continuous cDNA molecules. This facilitates a more complete understanding of complex eukaryotic transcriptomes and avoids the misassembly and incomplete capture of isoform diversity that can occur with short reads. Applications have been implemented in different organisms, including humans, animals, plants, viruses and fungi.
1) Humans: Full length RNA sequencing enables partitioning of the transcript isoform reads into the parental alleles (aka haplotype phasing), determining alternatively spliced or novel transcripts in diseased states, and detecting fusion genes in their isoforms.
2) Animals: Full length sequences of circular RNAs, found mostly in the brain, can be determined, in addition to other transcript isoforms.
3) Plants: Transcriptomes of wheat, grain and rice are common areas of study.
4) Virus: Novel transcripts have been identified in pseudorabies virus.
5) Fungi: Polycistronic transcripts can be identified in a single sequence.
Booking:
Please visit our University Research Facility Management System (URFMS) website for registration, training request and booking arrangement.
Contact Information:
Location: ZS1104, Block Z, The Hong Kong Polytechnic University
Equipment-in-charge: Dr. Sirius TSE (Office: U514)
Email: sirius.tse@polyu.edu.hk
Telephone: 3400 8878