Researchers have developed an artificial intelligence (AI)-based tool that enables more complete and accurate genome assembly from long-read sequencing data, according to a study published in Nature.
The tool, called HERRO (haplotype-aware error correction), was designed to address one of the major challenges in long-read sequencing: distinguishing true genetic variation from sequencing errors. By improving read accuracy before assembly, HERRO allows researchers to generate high-quality genome maps using a single long-read sequencing workflow.
Complete genome assembly is important because it enables analysis of genomic regions that are difficult to study using conventional approaches, including repetitive DNA sequences, centromeres, telomeres, and complex regions of the sex chromosomes. These regions can contain clinically relevant genetic variation that may be missed in incomplete or fragmented genome assemblies.
HERRO uses a deep learning model to correct sequencing errors while preserving genuine differences between inherited chromosome copies. This is particularly important for human genomes, which contain two similar but non-identical sets of chromosomes. Overcorrection can erase biologically important variation, so HERRO was designed to retain these differences when it corrects reads, allowing them to carry through to the assembled genome.
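To make the distinction between errors and genuine variation concrete, the following is a minimal toy sketch. It is not HERRO's deep learning model; it is a simple frequency heuristic used here only for illustration. The function name `classify_column` and the `min_support` threshold are hypothetical. The idea is that a base seen in many overlapping reads at one position is more likely to be a real difference between chromosome copies, whereas a base seen in only one read is more likely an error.

```python
from collections import Counter

def classify_column(bases, min_support=0.25):
    """Toy heuristic (not HERRO's method): label each allele at one position.

    bases: the base that each overlapping read shows at this position.
    Alleles seen in at least `min_support` of the reads are kept as likely
    haplotype differences; rarer alleles are flagged as likely errors.
    """
    counts = Counter(bases)
    total = len(bases)
    return {
        allele: ("keep" if n / total >= min_support else "likely_error")
        for allele, n in counts.items()
    }

# Ten reads cover one site: A and G are both well supported, T appears once.
column = list("AAAAAGGGGT")
labels = classify_column(column)
print(labels)
```

A corrector that simply replaced every minority base with the majority base would turn G into A here and erase one chromosome copy's variant, which is the kind of overcorrection described above.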
The researchers evaluated HERRO using multiple human and nonhuman genome datasets. Corrected sequencing reads showed substantially lower error rates than uncorrected reads, with improvements observed across mismatches, insertions, and deletions. When combined with existing assembly software, the corrected reads produced more contiguous genome assemblies and enabled reconstruction of complete human chromosomes from end to end, including the challenging X and Y chromosomes.
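As an illustration of how error rates can be broken down into mismatches, insertions, and deletions, here is a short sketch that counts each error type from a gapped pairwise alignment of a read against a reference. The helper names `count_errors` and `error_rate` and the example sequences are assumptions made for illustration; the study's own evaluation pipeline is not described in this article.

```python
def count_errors(ref_aln, read_aln):
    """Count matches, mismatches, insertions and deletions.

    ref_aln and read_aln are the two rows of a pairwise alignment,
    equal in length, with '-' marking a gap.
    """
    if len(ref_aln) != len(read_aln):
        raise ValueError("aligned sequences must have the same length")
    counts = {"match": 0, "mismatch": 0, "insertion": 0, "deletion": 0}
    for r, q in zip(ref_aln, read_aln):
        if r == "-" and q == "-":
            continue  # empty column, nothing to count
        if r == "-":
            counts["insertion"] += 1  # extra base in the read
        elif q == "-":
            counts["deletion"] += 1   # base missing from the read
        elif r != q:
            counts["mismatch"] += 1
        else:
            counts["match"] += 1
    return counts

def error_rate(counts):
    """Errors divided by the number of counted alignment columns."""
    aligned = sum(counts.values())
    errors = aligned - counts["match"]
    return errors / aligned if aligned else 0.0

# Hypothetical example alignment (reference on top, read below).
counts = count_errors("ACGT-ACGTA", "ACCTGAC-TA")
print(counts, error_rate(counts))
```

Running the same count on reads before and after correction gives a per-category comparison of the kind the article summarizes.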
The approach was also tested in zebrafish, fruit flies, and thale cress, where similar improvements in assembly quality were observed.
The findings highlight a potential route toward more accessible complete genome assembly. Higher-quality genome maps can improve the detection of structural variants and other forms of genetic variation associated with inherited disorders, cancer, and other diseases. The authors noted that more complete assemblies may also support precision medicine research by providing a more detailed view of genomic regions that are difficult to resolve using standard sequencing approaches.
A further advantage is the potential to simplify genome assembly workflows. Current approaches often require multiple sequencing technologies and complex laboratory processes. HERRO enabled high-quality assemblies from a single long-read sequencing platform, which could reduce workflow complexity, lower DNA input requirements, and support broader use of complete genome assembly in research and clinical genomics.
The authors noted that some highly repetitive genomic regions remain challenging and that the method requires substantial computational resources.
