Tags: CPU consumption, Genome Sequence Compression, Industrial-Oriented, Next Generation Sequencing, Scientific-Application
Abstract:
The output of next-generation sequencing (NGS) technology is growing faster than Moore's law predicts for computational capacity. Genome sequence compression is the only way to keep pace with NGS throughput. Specialized compression tools are necessary for the storage, transport, and medical analysis of genomic sequences, and they are designed for either frequent access or long-term storage. To our knowledge, no published comparative study addresses these specific needs of researchers. This work presents a comparative analysis of eight state-of-the-art genome compression tools, chosen from the available options because they are practical to deploy. Of the eight, four suit frequent access because of their shorter decompression time. The remaining four achieve superior compression ratios, making them suitable for long-term preservation. We also report an additional metric, percentage CPU consumption. Compressors with fast processing and low memory consumption are classed as industrial-oriented. Compressors with high compression ratios are suited to scientific applications, despite their greater demand for computational resources. Every available compressor involves a space-time trade-off. In future work, we aim to design a compression tool that is ultra-fast, requires few processing resources, and achieves a competitive compression ratio.
A Study of Genome Compression Algorithms for Industrial Versus Scientific Applications Focusing Sequences in Raw and FASTA/Q Formats