A novel algorithm for extracting the user reviews from web pages

dc.authorid: Uzun, Erdinç/0000-0003-4351-2244
dc.authorid: TUFEKCI, PINAR/0000-0003-4842-2635
dc.authorwosid: Tufekçi, Pinar/ABA-5121-2020
dc.authorwosid: Uzun, Erdinç/AAG-5529-2019
dc.authorwosid: Uçar, Erdem/G-6929-2014
dc.contributor.author: Ucar, Erdem
dc.contributor.author: Uzun, Erdinc
dc.contributor.author: Tufekci, Pinar
dc.date.accessioned: 2024-06-12T11:00:08Z
dc.date.available: 2024-06-12T11:00:08Z
dc.date.issued: 2017
dc.department: Trakya Üniversitesi [en_US]
dc.description.abstract: Extracting user reviews from websites such as forums, blogs, newspapers, and commerce and travel sites is crucial for text processing applications (e.g. sentiment analysis, trend detection/monitoring and recommendation systems) that need structured data. Traditional algorithms consist of three processes: Document Object Model (DOM) tree creation, extraction of features from this tree, and machine learning. However, these algorithms increase the time complexity of the extraction process. This study proposes a novel algorithm that involves two complementary stages. The first stage determines which HTML tags correspond to the review layout for a web domain by using the DOM tree, its features, and decision tree learning. The second stage extracts the review layout from web pages in that web domain using the tags found in the first stage. The second stage is more time-efficient, being approximately 21 times faster than the first stage. Moreover, it achieves a relatively high accuracy of 96.67% in our experiments on review block extraction. [en_US]
dc.identifier.doi: 10.1177/0165551516666446
dc.identifier.endpage: 712 [en_US]
dc.identifier.issn: 0165-5515
dc.identifier.issn: 1741-6485
dc.identifier.issue: 5 [en_US]
dc.identifier.scopus: 2-s2.0-85029475331 [en_US]
dc.identifier.scopusquality: Q1 [en_US]
dc.identifier.startpage: 696 [en_US]
dc.identifier.uri: https://doi.org/10.1177/0165551516666446
dc.identifier.uri: https://hdl.handle.net/20.500.14551/20706
dc.identifier.volume: 43 [en_US]
dc.identifier.wos: WOS:000415348100008 [en_US]
dc.identifier.wosquality: Q2 [en_US]
dc.indekslendigikaynak: Web of Science [en_US]
dc.indekslendigikaynak: Scopus [en_US]
dc.language.iso: en [en_US]
dc.publisher: Sage Publications Ltd [en_US]
dc.relation.ispartof: Journal of Information Science [en_US]
dc.relation.publicationcategory: Article - International Refereed Journal - Institutional Faculty Member [en_US]
dc.rights: info:eu-repo/semantics/closedAccess [en_US]
dc.subject: Efficient Extraction [en_US]
dc.subject: Web Data Extraction [en_US]
dc.subject: Web User Reviews [en_US]
dc.title: A novel algorithm for extracting the user reviews from web pages [en_US]
dc.type: Article [en_US]
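
The abstract describes a second stage that extracts review blocks using HTML tags already identified for a web domain. The sketch below illustrates only that general idea and is not the authors' implementation: it assumes the first stage has already produced a (tag, class) pair, and it uses Python's standard html.parser rather than a DOM tree. The names ReviewBlockExtractor and extract_reviews, and the example pair ("div", "review"), are hypothetical. It also assumes well-nested markup, so pages with implicitly closed tags would need a more tolerant parser.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ReviewBlockExtractor(HTMLParser):
    """Collects the text of every element matching a known (tag, class) pair."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag              # assumed output of an earlier learning stage
        self.css_class = css_class  # assumed output of an earlier learning stage
        self.depth = 0              # nesting depth inside the current review block
        self.buffer = []            # text pieces of the current block
        self.reviews = []           # finished review texts

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            # Already inside a review block: track nested elements.
            self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self.depth = 1
            self.buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self.depth == 0:
            return
        self.depth -= 1
        if self.depth == 0:
            # Block closed: join the pieces and normalise whitespace.
            text = "".join(self.buffer)
            self.reviews.append(" ".join(text.split()))

    def handle_data(self, data):
        if self.depth > 0:
            self.buffer.append(data)


def extract_reviews(html, tag, css_class):
    """Return the text of each block matching the given tag and class."""
    parser = ReviewBlockExtractor(tag, css_class)
    parser.feed(html)
    parser.close()
    return parser.reviews


page = ('<div class="nav">Home</div>'
        '<div class="review item"><p>Great <b>phone</b>.</p><br></div>'
        '<div class="review"><span>Too   slow</span></div>')
print(extract_reviews(page, "div", "review"))
```

Because the matching pair is fixed in advance, each page is handled in a single pass with no feature extraction or classification, which is the kind of saving the abstract attributes to the second stage.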
