Yazar "Bulus, Halil Nusret" seçeneğine göre listele
Listeleniyor 1 - 2 / 2
Sayfa Başına Sonuç
Sıralama seçenekleri
Öğe Automatically Discovering Relevant Images From Web Pages(IEEE-Inst Electrical Electronics Engineers Inc, 2020) Uzun, Erdinc; Ozhan, Erkan; Agun, Hayri Volkan; Yerlikaya, Tarik; Bulus, Halil NusretWeb pages contain irrelevant images along with relevant images. The classification of these images is an error-prone process due to the number of design variations of web pages. Using multiple web pages provides additional features that improve the performance of relevant image extraction. Traditional studies use the features extracted from a single web page. However, in this study, we enhance the performance of relevant image extraction by employing the features extracted from different web pages consisting of standard news, galleries, video pages, and link pages. The dataset obtained from these web pages contains 100 different web pages for each 200 online news websites from 58 different countries. For discovering relevant images, the most straightforward approach extracts the largest image on the web page. This approach achieves a 0.451 F-Measure score as a baseline. Then, we apply several machine learning methods using features in this dataset to find the most suitable machine learning method. The best f-Measure score is 0.822 using the AdaBoost classifier. Some of these features have been utilized in previous web data extraction studies. To the best of our knowledge, 15 new features are proposed for the first time in this study for discovering the relevant images. We compare the performance of the AdaBoost classifier on different feature sets. The proposed features improve the f-Measure by 35 percent. Besides, using only the cache feature, which is the most prominent feature, corresponds to 7 percent of this improvement.Öğe A new word-based compression model allowing compressed pattern matching(Tubitak Scientific & Technological Research Council Turkey, 2017) Bulus, Halil Nusret; Carus, Aydin; Mesut, AltanIn this study a new semistatic data compression model that has a fast coding process and that allows compressed pattern matching is introduced. The name of the proposed model is chosen as tagged word-based compression algorithm (TWBCA) since it has a word-based coding and word-based compressed matching algorithm. The model has two phases. In the first phase a dictionary is constructed by adding a phrase, paying attention to word boundaries, and in the second phase compression is done by using codewords of phrases in this dictionary. The first byte of the codeword determines whether the word is compressed or not. By paying attention to this rule, the CPM process can be conducted as word based. In addition, the proposed method makes it possible to also search for the group of consecutively compressed words. Any of the previous pattern matching algorithms can be chosen to use in compressed pattern matching as a black box. The duration of the CPM process is always less than the duration of the same process on the texts coded by Gzip tool. While matching longer patterns, compressed pattern matching takes more time on the texts coded by compress and end-tagged dense code (ETDC). However, searching shorter patterns takes less time on texts coded by our approach than the texts compressed with compress. Besides this, the compression ratio of our algorithm has a better performance against ETDC only on a file that has been written in Turkish. The compression performance of TWBCA is stable and does not vary over 6% on different text files.