The Evolution of Rice Genome Sequencing Technologies

Advances in DNA sequencing technologies have transformed genomic research. First-generation platforms pioneered nucleic acid decoding with very high accuracy, yet remained constrained by low throughput and prohibitive costs. Second-generation systems then changed this landscape through massively parallel processing, enabling large-scale germplasm characterization and molecular marker discovery. Third-generation technologies have since overcome critical bottlenecks by achieving single-molecule resolution. This review examines the technical evolution of rice (Oryza sativa) genome sequencing from three perspectives: technological iterations, practical applications of research outcomes, and future directions.

The Significance of Rice Genome Research

The rice genus (Oryza) encompasses Asian rice, African rice, and 20 wild rice species. Wild rice, the wild relative of cultivated species such as Asian rice, is a vital germplasm resource for modern rice breeding. As the global population grows rapidly, developing high-quality rice varieties becomes increasingly urgent, yet research on wild rice and exploration of its resources remain relatively limited. As sequencing technologies advance, high-quality genome assemblies and pan-genomes offer powerful tools for exploring genetic variation in full. Because cultivated rice originated from wild rice through a series of domestication processes, pan-genomes can also reveal the genomic evolution and domestication patterns of rice species and help address challenges in traditional rice breeding.

At about 430 Mb, the rice genome is the smallest among the cereal crops. Rice is also easy to manipulate genetically and is collinear with other cereals, which has made it a model plant for genetic and genomic research. Draft genome sequences of the indica and japonica subspecies were published in 2002, and the finished japonica genome sequence followed. These resources help not only in exploring the functions of rice genes but also in studying larger and more complex cereal genomes. Progress in rice genome sequencing therefore contributes to food security worldwide.

Progression of rice cultivation towards Green Super Rice (Zhang et al., 2024)

Technological Evolution: From Single Reference Genome to Pan-Genome

The genomic revolution has undergone three transformative waves since the Human Genome Project's inception in the 1990s. This evolutionary trajectory began with foundational Sanger sequencing methodologies utilizing electrophoretic separation, progressed through Illumina's massively parallel synthesis paradigm and China's breakthrough achievements in high-throughput platforms, culminating in contemporary long-read sequencing architectures. The economic implications are staggering: genome decoding costs have plummeted from billions to double-digit dollar figures. Such exponential advancements now empower researchers to achieve chromosomal-level assemblies across diverse taxa - a feat once confined to model organisms.

Limitations of Traditional Sequencing

The International Rice Genome Sequencing Project (IRGSP), launched in 1998, published a map-based finished sequence of the japonica cultivar Nipponbare in 2005. This achievement not only boosted rice breeding research but also laid the groundwork for sequencing the genomes of other grains. However, traditional sequencing methods have limitations: low throughput, high costs, insufficient resolution of complex genomic regions, and inadequate coverage of genetic diversity. The integration of modern multi-omics technologies is gradually overcoming these challenges. It provides a more comprehensive genomic information base for rice genetic improvement and brings precision-designed, sustainable agricultural breeding closer.

First-Generation Sequencing

The dawn of Oryza genomics emerged in 2002 through an international consortium led by American, Japanese, and Chinese scientists, marking a watershed moment in cereal crop sequencing. This pioneering endeavor primarily utilized Sanger sequencing - the dideoxy chain termination technique - whose molecular mechanism depends on ddNTP-mediated chain termination during DNA polymerization. Despite its labor-intensive nature and prohibitive costs, this foundational method delivered unparalleled accuracy that proved indispensable for critical breakthroughs, such as the precise mapping of rice blast resistance genes that revolutionized phytopathology research.
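
The chain-termination principle can be illustrated with a toy model. The sketch below is not a laboratory protocol: it assumes that a ddNTP can terminate synthesis at every position of the new strand, so that one fragment exists for each possible length, and it uses a hypothetical 10-base sequence. Ordering the fragments by size, as electrophoresis does, and reading each terminal base recovers the sequence.

```python
# Toy model of dideoxy chain termination (illustrative only).

def terminated_fragments(strand: str) -> list[tuple[int, str]]:
    """Return (length, terminal_base) for every ddNTP-terminated fragment.

    Assumption: termination can occur at each position of the newly
    synthesized strand, so one fragment exists per length 1..len(strand).
    """
    return [(i + 1, base) for i, base in enumerate(strand)]


def read_from_gel(fragments: list[tuple[int, str]]) -> str:
    """Rebuild the sequence by ordering fragments from shortest to longest,
    mimicking separation by size, and reading each terminal base."""
    return "".join(base for _, base in sorted(fragments))


if __name__ == "__main__":
    synthesized = "ATGGCTTACG"  # hypothetical newly synthesized strand
    frags = terminated_fragments(synthesized)
    frags.reverse()  # fragments are unordered before size separation
    print(read_from_gel(frags))
```

The model ignores signal decay, mobility shifts, and uneven termination, which are among the factors that limit real Sanger read lengths.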

Complementing this approach, Maxam-Gilbert chemical degradation served as the alternative pillar of first-generation sequencing. This methodology employed nucleotide-specific cleavage followed by electrophoretic separation, demonstrating particular efficacy in handling large DNA fragments. While operationally complex compared to Sanger's enzymatic process, its capacity to resolve intricate gene architectures and pinpoint mutations established unique value in functional genomics investigations.

Although first-generation platforms achieved read lengths of up to about 1 kb with base-level precision, their technical constraints, including low throughput and an inability to decipher complex genomic regions, set the stage for the next phase of sequencing. Second-generation sequencing addressed these limitations through massively parallel processing, fundamentally transforming the economics and scalability of genome analysis.

Typical RePS assembly, with 93-11 (indica) contigs aligned to finished BAC sequences from GLA (indica) (Yu et al., 2002)

Second-Generation Sequencing

The launch of the Roche 454 sequencing platform in 2005 marked the advent of second-generation sequencing technology and a milestone in the evolution of sequencing technologies.

The Roche 454 workflow begins by shearing genomic DNA or BAC clones into fragments of 300-800 base pairs and ligating adaptors containing specific sequences to their 3' and 5' ends. All DNA fragments are then amplified in parallel by emulsion PCR (emPCR), and the amplified products are loaded onto PicoTiterPlates for sequencing. The system provides three distinct bioinformatics tools to meet the data analysis requirements of diverse applications.

In one investigation of genetic diversity among rice germplasm resources, the Roche 454 sequencing platform was used to sequence multiple rice varieties. By analyzing the sequencing data, researchers identified numerous single nucleotide polymorphism (SNP) loci, which constitute valuable molecular marker resources for rice genetic breeding research.

The 454 technology addressed the limitations of first-generation sequencing described above. It retained the high accuracy characteristic of first-generation sequencing while achieving high throughput (average throughput up to 0.5 Gb), low cost, and rapid turnaround time. However, because 454 reads must be assembled, Sanger sequencing remains more suitable for sequences with structural repeats or repetitive regions; 454 sequencing is ill-suited to satellite DNA or telomeric DNA, for example. Its sequencing-by-synthesis approach also gives it extremely high sensitivity.

During the rapid development of gene chips, Illumina's Solexa sequencing technology emerged and was quickly adopted for sequencing plants, animals, and microorganisms. Like 454 sequencing, it employs a sequencing-by-synthesis approach. Unlike gene chips, it requires neither probe synthesis nor prior knowledge of the species' genome sequence, which streamlines workflows. During preparation, DNA molecules are anchored to specific primers on an integrated chip, where they bind specifically and yield "monoclonal" DNA clusters. Consequently, it achieves high throughput (average throughput up to 30 Gb), high sensitivity, and high accuracy [5]. With significantly higher throughput than 454 sequencing, Solexa became the backbone of second-generation sequencing technologies.
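
To put the two throughput figures in context, mean sequencing depth can be estimated as total bases sequenced divided by genome size. The sketch below assumes that the quoted throughputs are gigabases per run (an interpretation, not something the text states) and uses the 430 Mb rice genome size given earlier.

```python
GENOME_SIZE_BP = 430e6  # rice genome size quoted earlier in this article


def expected_depth(yield_bp: float, genome_size_bp: float = GENOME_SIZE_BP) -> float:
    """Mean sequencing depth = total bases sequenced / genome size."""
    return yield_bp / genome_size_bp


# Assumed per-run yields in bases (0.5 Gb and 30 Gb).
platform_yield_bp = {"454": 0.5e9, "Solexa": 30e9}

for name, yield_bp in platform_yield_bp.items():
    print(f"{name}: ~{expected_depth(yield_bp):.1f}x")
# prints "454: ~1.2x" and "Solexa: ~69.8x"
```

Under these assumptions, a single 454 run covers the rice genome roughly once on average, while a single Solexa run provides enough depth for routine resequencing.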

SOLiD (Sequencing by Oligonucleotide Ligation and Detection) sequencing, developed by ABI, takes a fundamentally different approach from the two technologies above. Its core distinction is that it reads bases through ligation rather than polymerase-driven synthesis. Target sequences are determined through multiple rounds of sequencing using fluorescent octamer probes hybridized to the DNA. By eliminating the "synthesis" step, this method reduces mismatches during base-pairing, thereby enhancing sequencing accuracy.

Second-generation sequencing technology, with its mature framework, high throughput, high sensitivity, high resolution, and low cost, has become widely accessible to laboratories and has gained broad adoption. As a result, it is now extensively applied to RNA-Seq workflows and research.

Third-Generation Sequencing

The genomic revolution has entered a transformative phase with the operational maturation of long-read sequencing platforms. Industry-leading systems like PacBio's Single Molecule Real-Time (SMRT) technology and Oxford Nanopore's protein pore arrays exemplify this technological progression, though ongoing refinement remains imperative. SMRT sequencing leverages zero-mode waveguide (ZMW) nanostructures to generate reads exceeding 64 kb - orders of magnitude beyond short-read platforms. Despite suboptimal base-calling accuracy (≈85%) necessitating hybrid sequencing approaches and advanced computational correction, these limitations are counterbalanced by unprecedented genomic context capture. Concurrently, visionary initiatives like Visige's "Three Ones" paradigm (1 genome/day at $1,000 cost) foreshadow imminent breakthroughs in sequencing economics, anticipating near-future platforms with enhanced speed, precision, and cost-efficiency. Collectively, these iterative technological waves have revolutionized genomic investigations since 2010, catalyzing landmark discoveries across eukaryotes.
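
The value of computational correction can be illustrated with a simple probability model. The sketch below is a deliberate simplification and does not describe any specific correction tool: it assumes that each read calls a base correctly with probability 0.85 (the raw accuracy quoted above), that reads err independently, and that the consensus is correct whenever a strict majority of reads is correct.

```python
from math import comb


def consensus_accuracy(per_read_accuracy: float, depth: int) -> float:
    """Probability that a strict majority of `depth` independent reads
    call a base correctly (simplified right/wrong model)."""
    needed = depth // 2 + 1  # smallest number of correct reads forming a majority
    return sum(
        comb(depth, k)
        * per_read_accuracy**k
        * (1 - per_read_accuracy) ** (depth - k)
        for k in range(needed, depth + 1)
    )


for depth in (1, 5, 15, 31):
    print(depth, round(consensus_accuracy(0.85, depth), 6))
```

Under these assumptions, consensus accuracy rises quickly with depth, which is why noisy long reads can still yield accurate assemblies. Real errors are not fully independent, so the model is optimistic.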

In Oryza genomics, long-read sequencing's capacity to resolve architectural complexity has proven transformative. By enabling precise characterization of repetitive elements and structural polymorphisms (e.g., gene presence-absence variations, PAVs), these platforms overcome critical limitations of short-read mapping artifacts, establishing new standards for variation detection fidelity. A landmark investigation sequencing 111 Oryza accessions (105 cultivars, 6 wild relatives) exemplifies this paradigm shift. The resultant species-level pan-genome expanded known genomic diversity by 879 Mb through novel sequence integration, notably wild rice-derived Gypsy retrotransposons, while annotating 19,319 previously undocumented coding sequences across 2,132 new gene families. Crucially, this resource filled reference genome gaps for nine key rice lineages, including five telomere-to-telomere assemblies.

Comparative analysis revealed short-read platforms' propensity for false-positive PAV calls in repeat-rich regions (DNA transposons), whereas long-read data demonstrated superior performance in resolving LINE and Helitron elements. The study's integration of 14,471 trait-associated PAVs with agronomic phenotypes uncovered functional correlations - for instance, the LOC_Os01g27930 retrotransposon gene's presence/absence status inversely modulates grain dimensional ratios. This methodological framework not only deciphers rice genomic architecture but also establishes an actionable blueprint for precision breeding through structural variation harnessing.
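
One way to test whether a PAV is associated with a trait is to compare the phenotype between accessions that carry the gene and those that lack it. The sketch below uses a permutation test on the difference in group means. It is not the method of the cited study, and the grain length-to-width ratios are invented for illustration.

```python
import random
from statistics import mean


def pav_effect(present: list[float], absent: list[float],
               n_perm: int = 2000, seed: int = 0) -> tuple[float, float]:
    """Return (difference in group means, two-sided permutation p-value)."""
    observed = mean(present) - mean(absent)
    pooled = present + absent  # new list, so the inputs are not modified
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = mean(pooled[:len(present)]) - mean(pooled[len(present):])
        if abs(diff) >= abs(observed):
            hits += 1
    # Add-one correction keeps the estimated p-value above zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical grain length/width ratios for accessions with and without a gene.
with_gene = [2.1, 2.0, 2.2, 2.1, 1.9, 2.0]
without_gene = [2.9, 3.0, 2.8, 3.1, 2.9, 3.0]
effect, p_value = pav_effect(with_gene, without_gene)
print(f"mean difference = {effect:.2f}, permutation p = {p_value:.4f}")
```

A negative difference here would mean that the gene's presence goes with a lower ratio. A real analysis would also need to correct for population structure and for testing thousands of PAVs.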

Genomic features of the rice pan-genome derived from 111 rice accessions (Zhang et al., 2022)

The Core Achievement of Rice Genome Sequencing

Rice functional genomics research, as the name implies, aims to determine the biological functions of rice genes based on known genome sequences. Its key task is to combine genomic data with novel experimental approaches for large-scale, genome-wide studies of gene function. This work seeks to uncover the gene expression regulation mechanisms governing rice growth, development, and environmental responses.

Gene Functional Annotation

A groundbreaking Oryza pan-genomic architecture was established through integrative analysis of PacBio long-read sequencing and transcriptomic profiles from 33 phylogenetically diverse accessions spanning Asian and African ecotypes. This graph-based framework, the first of its kind in cereals, unveiled 66,636 coding sequences with 69.4% (46,262 genes) demonstrating presence-absence polymorphisms evolutionarily linked to domestication processes. Mechanistic dissection revealed 38.34% of cataloged loci (25,549 genes) exhibit dosage-dependent expression modulation, exemplified by OsMADS18 copy number amplification enhancing transcriptional output. Spatial genomic analysis identified 140 selection hotspots co-localizing with stress-responsive quantitative trait nucleotides, including known blast resistance determinants.
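
The distinction between genes shared by all accessions and genes showing presence-absence polymorphism can be expressed as a simple classification over a presence/absence matrix. The sketch below is a minimal illustration with hypothetical gene and accession names. It does not reproduce the study's graph-based pipeline; only the two counts in the last lines come from the text above.

```python
def classify_genes(pav: dict[str, dict[str, bool]]) -> dict[str, str]:
    """Label a gene 'core' if it is present in every accession,
    otherwise 'dispensable' (it shows presence-absence variation)."""
    return {
        gene: "core" if all(calls.values()) else "dispensable"
        for gene, calls in pav.items()
    }


# Hypothetical presence/absence calls for three genes in three accessions.
pav_matrix = {
    "geneA": {"acc1": True, "acc2": True, "acc3": True},
    "geneB": {"acc1": True, "acc2": False, "acc3": True},
    "geneC": {"acc1": False, "acc2": False, "acc3": True},
}
labels = classify_genes(pav_matrix)
dispensable_share = sum(v == "dispensable" for v in labels.values()) / len(labels)
print(labels, f"{dispensable_share:.1%}")

# Consistency check on the figures quoted above: 46,262 of 66,636 genes.
print(f"{46262 / 66636:.1%}")  # prints 69.4%
```

Real pipelines usually tolerate some missing calls, for example by treating genes present in nearly all accessions as core, so the strict all-accessions rule used here is an assumption.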

The platform's non-linear representation overcomes reference bias in multi-copy gene regulation studies, validated through transgenic manipulation of candidate loci influencing yield-related phenotypes. Complementing these discoveries, the researchers launched the RiceRC ecosystem - an open-access portal integrating genome visualization tools with functional annotation modules, creating an unprecedented resource for precision breeding and evolutionary genomics exploration.

Impacts of SVs on genes have shaped environmental adaptation and domestication (Qin et al., 2021)

The Application of Molecular Markers in Rice Breeding

The shift from phenotype-dependent conventional breeding to genotype-driven precision agriculture has reshaped crop improvement frameworks. Traditional methods that rely on visual trait selection and low-density molecular assays face efficiency constraints, whereas next-generation sequencing (NGS) platforms have transformed functional genomics through high-resolution mapping strategies including MutMap, bulk segregant analysis (BSA), and genome-wide association studies (GWAS). These methods have facilitated the discovery of agronomically critical loci such as the blast resistance determinant Pi-ta and the grain size modulator GS3. Complementing these advances, SNP array-based analyses of large MAGIC populations have refined the accuracy of quantitative trait dissection and improved genomic prediction, shortening varietal development timelines.
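
MutMap and sequencing-based BSA commonly summarize each variant site with a SNP-index, the fraction of reads carrying the non-reference allele. When two bulks are compared, the difference between their indices highlights candidate regions. The sketch below shows the arithmetic only; the site names and read counts are hypothetical.

```python
def snp_index(alt_reads: int, total_reads: int) -> float:
    """Fraction of reads at a site that carry the non-reference allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def delta_snp_index(bulk_a: tuple[int, int], bulk_b: tuple[int, int]) -> float:
    """SNP-index of the first bulk minus that of the second bulk.

    Each bulk is given as (alt_reads, total_reads) at the same site.
    """
    return snp_index(*bulk_a) - snp_index(*bulk_b)


# Hypothetical read counts at three sites: (alt, total) for each bulk.
sites = {
    "chr3:1200345": ((38, 40), (4, 40)),
    "chr3:1250112": ((21, 42), (20, 40)),
    "chr7:880230": ((19, 38), (22, 44)),
}
for site, (bulk_a, bulk_b) in sites.items():
    print(site, round(delta_snp_index(bulk_a, bulk_b), 2))
```

In this invented data only the first site shows a large difference between bulks. In practice, indices are averaged in sliding windows and compared against confidence intervals that account for read depth and bulk size.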

A breakthrough in temperate japonica genotyping emerged through innovative KASP marker development. Systematic screening of 331 candidate loci identified 48 genome-wide distributed polymorphisms demonstrating exceptional discriminative capacity across Oryza subspecies. The optimized panel achieves 90% differentiation accuracy in japonica accessions and perfect discrimination (100%) for indica cultivars and hybrids. Chromosomal specificity analysis revealed chromosome 10 markers as key indica-japonica diagnostic probes, whereas loci on chromosomes 3, 6, and 11 correlate strongly with hybrid heterosis patterns. This cost-efficient platform supersedes legacy SSR systems through simplified detection workflows and reduced sequencing dependency, resolving longstanding technical limitations in plant variety protection. Validated across 518 germplasm entries, the assay demonstrates dual utility in guiding intersubspecific hybrid breeding and optimizing yield enhancement pipelines, establishing a robust molecular framework for intellectual property management and targeted genetic improvement in Oryza species.
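
The notion of a diagnostic marker can be made concrete: a marker separates two groups cleanly when each group is fixed for a single allele and the groups carry different alleles. The sketch below screens markers with that rule. The marker names, accessions, and genotype calls are hypothetical, and the rule is an illustrative assumption rather than the selection procedure of the cited study.

```python
def diagnostic_markers(genotypes: dict[str, dict[str, str]],
                       groups: dict[str, str]) -> list[str]:
    """Return markers at which every group is fixed for one allele
    and no two groups share that allele."""
    found = []
    for marker, calls in genotypes.items():
        alleles_by_group: dict[str, set[str]] = {}
        for accession, allele in calls.items():
            alleles_by_group.setdefault(groups[accession], set()).add(allele)
        fixed = all(len(alleles) == 1 for alleles in alleles_by_group.values())
        all_alleles = set().union(*alleles_by_group.values())
        distinct = len(all_alleles) == len(alleles_by_group)
        if len(alleles_by_group) > 1 and fixed and distinct:
            found.append(marker)
    return found


# Hypothetical group labels and genotype calls at three markers.
groups = {"acc1": "indica", "acc2": "indica", "acc3": "japonica", "acc4": "japonica"}
genotypes = {
    "marker_chr10_a": {"acc1": "A", "acc2": "A", "acc3": "G", "acc4": "G"},
    "marker_chr03_b": {"acc1": "C", "acc2": "T", "acc3": "T", "acc4": "T"},
    "marker_chr06_c": {"acc1": "G", "acc2": "G", "acc3": "G", "acc4": "G"},
}
print(diagnostic_markers(genotypes, groups))
```

With real data the rule would be relaxed to allow for missing calls and rare alleles, for instance by requiring a large allele-frequency difference between groups instead of complete fixation.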

Distribution of 1,225 polymorphic KASP markers developed with Korean temperate japonica rice varieties (Cheon et al., 2020)

The Development Direction of Rice Genome Sequencing

Pan-Omic Integration Framework: Orchestrate multi-dimensional molecular strata spanning genomic architecture, transcriptional dynamics, protein interactomes, and metabolic flux to construct predictive models of Oryza functional networks. This systems biology paradigm deciphers genotype-phenotype relationships at unprecedented resolution, illuminating mechanistic foundations for trait engineering and pathway optimization.

Precision Breeding Based on Multi-omics: Mine gene combinations and molecular markers linked to key agronomic traits using multi-omics data. Combine these with gene editing and molecular breeding techniques to improve rice varieties precisely. This approach enhances breeding efficiency and accuracy, fostering the development of new rice varieties that are higher-yielding, better in quality, and more stress-resistant.

Sequencing Technology Innovation and Optimization: Continuously develop and refine genome-sequencing technologies. Boost sequencing output, accuracy, and read length while reducing costs to efficiently obtain high-quality rice genome data. Explore new sequencing platforms like single-molecule and nanopore sequencing to meet diverse research needs.

Database Development and Sharing: Create a more comprehensive, organized, and accessible rice genome database. Integrate various data resources from across the globe, including rice genome-sequencing data and gene-function annotations. Establish standardized mechanisms for data storage and sharing to provide researchers with easy-to-use data-acquisition and analysis services. This promotes collaboration and communication in rice genomics research.

Conclusion

The evolution of rice genome sequencing technologies has revolutionized traditional breeding paradigms, providing genetic-level solutions to challenges such as pests, diseases, climate change, and resource constraints. Early first-generation sequencing technologies completed draft genomes of japonica and indica rice, but were limited by high costs, low throughput, and insufficient resolution for complex genomic structures. Second-generation sequencing advanced molecular marker development and genetic map construction through high-throughput and cost-effective advantages, significantly accelerating the discovery of key trait-related genes. Third-generation sequencing (e.g., PacBio SMRT, Oxford Nanopore), leveraging long-read capabilities, enabled telomere-to-telomere (T2T) genome assembly and pan-genome construction. These advancements mark the transition of rice breeding from an experience-driven approach to a new era of genomic design.

References

Zhang J, Che J, Ouyang Y. "Engineering rice genomes towards green super rice" Current Opinion in Plant Biology. 2024, 82:102664 https://doi.org/10.1016/j.pbi.2024.102664
Yu J, et al. "A Draft Sequence of the Rice Genome (Oryza sativa L. ssp. indica)" Science. 2002, 296:79-92 https://doi.org/10.1126/science.1068037
Zhang F, Xue H, Dong X, et al. "Long-read sequencing of 111 rice genomes reveals significantly larger pan-genomes" Genome Research. 2022, 32(5):853-863 https://doi.org/10.1101/gr.276015.121
Qin P, Lu H, Du H, et al. "Pan-genome analysis of 33 genetically diverse rice accessions reveals hidden genomic variations" Cell. 2021, 184(13) https://doi.org/10.1016/j.cell.2021.04.046
Cheon K S, Jeong Y M, Oh H, et al. "Development of 454 New Kompetitive Allele-Specific PCR (KASP) Markers for Temperate japonica Rice Varieties" Plants. 2020, 9(11):1531 https://doi.org/10.3390/plants9111531
