Multi-stream word-based compression algorithm for compressed text search

dc.contributor.author: Ozturk, Emir
dc.contributor.author: Mesut, Altan
dc.contributor.author: Diri, Banu
dc.date.accessioned: 2019-06-26T07:12:47Z
dc.date.available: 2019-06-26T07:12:47Z
dc.date.issued: 2018
dc.department: Faculties, Faculty of Engineering, Department of Computer Engineering [en_US]
dc.description: Ozturk, Emir (Trakya author); Mesut, Altan (Trakya author) [en_US]
dc.description.abstract: In this article, we present a novel word-based lossless compression algorithm for text files using a semi-static model. We call this method the multi-stream word-based compression algorithm (MWCA) because it stores the compressed forms of the words in three separate streams, chosen according to word frequency in the text, and stores two dictionaries and a bit vector as side information. In our experiments, MWCA achieves a compression ratio of 3.23 bpc on average and 2.88 bpc for files larger than 50 MB; if a variable-length encoder such as Huffman coding is applied after MWCA, these ratios drop to 2.65 and 2.44 bpc, respectively. MWCA supports exact word matching without decompression, and its multi-stream approach reduces search time compared with single-stream algorithms. Additionally, the multi-stream structure reduces network load, because only the necessary streams need to be requested from the database. Given its fast compressed search and multi-stream structure, we believe that MWCA is a good solution, especially for storing and searching big text data. [en_US]
dc.identifier.citation: Öztürk, E., Mesut, A., & Diri, B. (2018). Multi-Stream Word-Based Compression Algorithm for Compressed Text Search. Arabian Journal for Science and Engineering, 43(12), 8209-8221. [en_US]
dc.identifier.doi: 10.1007/s13369-018-3378-9 [en_US]
dc.identifier.endpage: 8221 [en_US]
dc.identifier.issue: 12 [en_US]
dc.identifier.scopus: 2-s2.0-85056276197 [en_US]
dc.identifier.scopusquality: N/A [en_US]
dc.identifier.startpage: 8209 [en_US]
dc.identifier.uri: https://doi.org/10.1007/s13369-018-3378-9
dc.identifier.uri: https://hdl.handle.net/20.500.14551/4234
dc.identifier.volume: 43 [en_US]
dc.identifier.wos: WOS:000449936300097 [en_US]
dc.identifier.wosquality: Q3 [en_US]
dc.indekslendigikaynak: Web of Science [en_US]
dc.indekslendigikaynak: Scopus [en_US]
dc.language.iso: en [en_US]
dc.publisher: Springer Heidelberg [en_US]
dc.relation.ispartof: Arabian Journal for Science and Engineering [en_US]
dc.relation.publicationcategory: Article - International Peer-Reviewed Journal - Institutional Faculty Member [en_US]
dc.rights: info:eu-repo/semantics/closedAccess [en_US]
dc.snmz: 20240608_ID_Q [en_US]
dc.subject: Data Compression [en_US]
dc.subject: Text Compression [en_US]
dc.subject: Dictionary-Based Compression [en_US]
dc.subject: Compressed Matching [en_US]
dc.subject: MWCA [en_US]
dc.subject: Natural-Language Text [en_US]
dc.subject: Strings [en_US]
dc.title: Multi-stream word-based compression algorithm for compressed text search [en_US]
dc.type: Article [en_US]
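
The compression ratios in the abstract are given in bits per character (bpc). As a reading aid, the short sketch below converts those figures into a fraction of the original file size. It assumes the uncompressed text uses 8 bits per character, which the record does not state; the function name `bpc_to_fraction` and the dictionary labels are illustrative and not taken from the article.

```python
def bpc_to_fraction(bpc: float, bits_per_char: int = 8) -> float:
    """Return compressed size as a fraction of the original size.

    Assumption: the original text is stored with `bits_per_char` bits
    per character (8 by default); the record does not confirm this.
    """
    return bpc / bits_per_char


# Figures quoted in the abstract (bits per character).
reported_bpc = {
    "MWCA, average": 3.23,
    "MWCA, files > 50 MB": 2.88,
    "MWCA + Huffman, average": 2.65,
    "MWCA + Huffman, files > 50 MB": 2.44,
}

for label, bpc in reported_bpc.items():
    fraction = bpc_to_fraction(bpc)
    print(f"{label}: {bpc:.2f} bpc -> {fraction:.1%} of original size")
```

Under that 8-bit assumption, a lower bpc value means a smaller compressed file, so the Huffman-coded figures correspond to the strongest compression reported.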
