Researchers João Tiago Paulo (INESC TEC/UMinho), Vinicius Vielmo Cogo and Alysson Neves Bessani (both from ULisboa) have developed a technique that could make genome-sequencing studies faster and cheaper while using 75% less storage space. The study was published by IEEE.
They combined a new similarity-based data deduplication technique, which exploits similarities and patterns found in human genome sequencing files, with an encoding of the changes needed to retrieve that data. The novelty of this approach is that it replaces the complete description of the sequenced genome data with small pointers that describe only the changes needed to recover the original data, thus reducing the required space and storage cost.
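To illustrate the general idea behind similarity-based deduplication combined with change (delta) encoding, here is a minimal Python sketch. It is not the researchers' implementation: the names `DedupStore`, `DeltaRecord` and `max_edits` are hypothetical, reads are assumed to be equal-length strings of bases, only base substitutions are encoded, and similar reads are found by an exhaustive scan rather than by the kind of index a real system would need.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DeltaRecord:
    # Hypothetical record: a pointer to a similar, already-stored read
    # plus the substitutions needed to rebuild the original read.
    base_id: int
    edits: tuple[tuple[int, str], ...]  # (position, replacement base)


class DedupStore:
    """Toy store: keeps one copy of each dissimilar read and encodes
    similar reads as (pointer, substitutions). Assumption-laden sketch."""

    def __init__(self, max_edits: int = 4) -> None:
        self.bases: list[str] = []    # reads stored in full
        self.max_edits = max_edits    # hypothetical similarity threshold

    @staticmethod
    def _diff(base: str, read: str) -> list[tuple[int, str]]:
        # Positions where the read differs from the base (Hamming-style).
        return [(i, c) for i, (b, c) in enumerate(zip(base, read)) if b != c]

    def add(self, read: str) -> DeltaRecord:
        best: tuple[int, list[tuple[int, str]]] | None = None
        # Exhaustive search for the most similar stored read of equal length.
        for idx, base in enumerate(self.bases):
            if len(base) != len(read):
                continue
            edits = self._diff(base, read)
            if len(edits) <= self.max_edits and (best is None or len(edits) < len(best[1])):
                best = (idx, edits)
        if best is None:
            # No similar read: store this one in full as a new base.
            self.bases.append(read)
            return DeltaRecord(len(self.bases) - 1, ())
        # Similar read found: keep only a pointer and the changes.
        return DeltaRecord(best[0], tuple(best[1]))

    def restore(self, rec: DeltaRecord) -> str:
        # Rebuild the original read by applying the substitutions to its base.
        chars = list(self.bases[rec.base_id])
        for pos, c in rec.edits:
            chars[pos] = c
        return "".join(chars)
```

A read that closely resembles one already stored is kept only as a pointer (`base_id`) plus its substitutions, while a read with no close match becomes a new base. For example, after storing `ACGTACGTAC`, adding `ACGTTCGTAC` yields `DeltaRecord(base_id=0, edits=((4, 'T'),))`, and `restore` recovers the original string from it.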
Hospitals and biobanks can thus save on data storage while, at the same time, researchers can read the data more quickly. These institutions store and distribute millions of biological samples to researchers around the world and are also under pressure to store the sequenced genomic data from those samples, so these savings would have a significant impact on their day-to-day operations.
Infrastructures that already apply generic compression algorithms to this data would benefit from an additional reduction of about 22% in storage space and cost, and researchers could access the data up to five times faster. In the near future, the researchers intend to release the solution as open source, improve the results through more in-depth studies of the patterns, and adapt their findings to the genome sequencing of other species.
The article can be accessed at: www.di.fc.ul.pt/~bessani/publications/tc20-genodedup.pdf