Scholarworks Repository

A fast HTML web page change detection approach based on hashing and reducing the number of similarity computations


dc.contributor.author Artail H.
dc.contributor.author Fawaz K.
dc.date 2008
dc.date.accessioned 2017-09-07T07:07:20Z
dc.date.available 2017-09-07T07:07:20Z
dc.date.issued 2008
dc.identifier 10.1016/j.datak.2008.04.003
dc.identifier.uri http://hdl.handle.net/10938/11343
dc.description.abstract This paper describes a fast HTML web page change detection approach that saves computation time by limiting the similarity computations between two versions of a web page to nodes having the same HTML tag type, and by hashing the web page in order to provide direct access to node information. This efficient approach is suitable as a client application and for implementing server applications that could serve the needs of users in monitoring modifications to HTML web pages made over time, and that allow for reporting and visualizing changes and trends in order to gain insight about the significance and types of such changes. The detection of changes across two versions of a page is accomplished by performing similarity computations after transforming the web page into an XML-like structure in which a node corresponds to an open-close HTML tag. Performance and detection reliability results were obtained, and showed speed improvements when compared to the results of a previous approach. © 2008 Elsevier B.V. All rights reserved.
dc.format.extent Pages: 326-337
dc.language English
dc.publisher Elsevier B.V., Amsterdam
dc.relation.ispartof Publication Name: Data and Knowledge Engineering; Publication Year: 2008; Volume: 66; Issue: 2; Pages: 326-337
dc.source Scopus
dc.title A fast HTML web page change detection approach based on hashing and reducing the number of similarity computations
dc.type Article
dc.contributor.affiliation Artail, H., Department of Electrical and Computer Engineering, American University of Beirut, P.O. Box 11-0236, Riad El-Solh, Beirut 1107 2020, Lebanon
dc.contributor.affiliation Fawaz, K., Department of Electrical and Computer Engineering, American University of Beirut, P.O. Box 11-0236, Riad El-Solh, Beirut 1107 2020, Lebanon
dc.contributor.authorAddress Artail, H.; Department of Electrical and Computer Engineering, American University of Beirut, P.O. Box 11-0236, Riad El-Solh, Beirut 1107 2020, Lebanon; email: hartail@aub.edu.lb
dc.contributor.authorCorporate University: American University of Beirut; Faculty: Faculty of Engineering and Architecture; Department: Electrical and Computer Engineering
dc.contributor.authorDepartment Electrical and Computer Engineering
dc.contributor.authorEmail hartail@aub.edu.lb; kmf04@aub.edu.lb
dc.contributor.authorFaculty Faculty of Engineering and Architecture
dc.contributor.authorInitials Artail, H
dc.contributor.authorInitials Fawaz, K
dc.contributor.authorReprintAddress Artail, H. (reprint author), American University of Beirut, Department of Electrical and Computer Engineering, P.O. Box 11-0236, Beirut 1107 2020, Lebanon
dc.contributor.authorUniversity American University of Beirut
dc.description.cited Allan J., 1998, Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, DOI 10.1145/290941.290954
dc.description.cited [Anonymous], OPEN DIRECTORY PROJE
dc.description.cited [Anonymous], RSS 2 0 SPECIFICATIO
dc.description.cited BREWINGTON BE, 2000, P WWW2000 MARCH
dc.description.cited CHAKRAVARTHY S, 2002, 2 INT WORKSH WEB DYN
dc.description.cited Chawathe S., 1996, ACM SIGMOD INT C MAN, P493
dc.description.cited Chawathe S., 1997, ACM SIGMOD RECORD, P26, DOI 10.1145/253262.253266
dc.description.cited Cho J, 2000, SIGMOD RECORD, V29, P117
dc.description.cited Cobena G, 2002, PROC INT CONF DATA, P41, DOI 10.1109/ICDE.2002.994696
dc.description.cited Fetterly D, 2004, SOFTWARE PRACT EXPER, V34, P213, DOI 10.1002/spe.577
dc.description.cited Flesca S, 2003, DATA KNOWL ENG, V46, P203, DOI 10.1016/S0169-023X(02)00210-0
dc.description.cited Jacob J, 2005, DATA KNOWL ENG, V52, P209, DOI 10.1016/j.datak.2004.05.006
dc.description.cited KAIZHONG Z, 1995, 6 ANN S COMB PATT MA, P395
dc.description.cited Khoury I, 2007, IEEE T KNOWL DATA EN, V19, P599, DOI 10.1109/TKDE.2007.1014
dc.description.cited Kuhn H., 2005, NAV RES LOG, V2, P7
dc.description.cited Levering R., 2006, P 2006 ACM S DOC ENG, P198, DOI 10.1145/1166160.1166213
dc.description.cited Lim SJ, 2001, PROC INT CONF DATA, P303
dc.description.cited Ling Liu, 2002, World Wide Web, V5
dc.description.cited LIU L, 2000, 9 INT C INF KNOWL MA, P512
dc.description.cited Matloff N., 2005, ACM Transactions on Modeling and Computer Simulation, V15, DOI 10.1145/1103323.1103326
dc.description.cited *OP TECHN, COP TRACK PROD
dc.description.cited WANG Y, 2003, ICDE, P519
dc.description.cited Woodruff A, 1996, COMPUT NETWORKS ISDN, V28, P963, DOI 10.1016/0169-7552(96)00064-5
dc.description.cited NETMIND
dc.description.cited HTMLDIFF
dc.description.cited WEBCQ PRODUCT
dc.description.cited HTML TIDY
dc.description.cited RSS FEED STAT
dc.description.cited WEBSITE WATCHER PROD
dc.description.citedCount 8
dc.description.citedTotWOSCount 4
dc.description.citedWOSCount 4
dc.format.extentCount 12
dc.identifier.coden DKENE
dc.identifier.scopusID 45249103304
dc.publisher.address PO BOX 211, 1000 AE AMSTERDAM, NETHERLANDS
dc.relation.ispartOfISOAbbr Data Knowl. Eng.
dc.relation.ispartOfIssue 2
dc.relation.ispartofPubTitle Data and Knowledge Engineering
dc.relation.ispartofPubTitleAbbr Data Knowl Eng
dc.relation.ispartOfVolume 66
dc.source.ID WOS:000258448400007
dc.type.publication Journal
dc.subject.otherAuthKeyword Change monitoring
dc.subject.otherAuthKeyword HTML
dc.subject.otherAuthKeyword Similarity computation
dc.subject.otherAuthKeyword Tree similarity
dc.subject.otherAuthKeyword Web page change detection
dc.subject.otherIndex Calculations
dc.subject.otherIndex HTML
dc.subject.otherIndex Information management
dc.subject.otherIndex Markup languages
dc.subject.otherIndex AND detection
dc.subject.otherIndex change detection
dc.subject.otherIndex client applications
dc.subject.otherIndex Computation time
dc.subject.otherIndex detection approach
dc.subject.otherIndex Direct access
dc.subject.otherIndex Elsevier (CO)
dc.subject.otherIndex gain insight
dc.subject.otherIndex In order
dc.subject.otherIndex Server applications
dc.subject.otherIndex web pages
dc.subject.otherIndex World Wide Web
dc.subject.otherKeywordPlus EFFICIENT
dc.subject.otherKeywordPlus ALGORITHM
dc.subject.otherWOS Computer Science, Artificial Intelligence
dc.subject.otherWOS Computer Science, Information Systems
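The abstract's central idea, restricting similarity computations to node pairs that share the same HTML tag and hashing the page so that node information can be reached directly, can be illustrated with the minimal Python sketch below. It is not the authors' implementation: the per-tag dictionary index, the TreeBuilder parser built on the standard library's html.parser, the difflib.SequenceMatcher text similarity, and the hypothetical detect_changes function are all assumptions, because this record does not describe the paper's actual hash keys or similarity measure. As in the abstract's XML-like structure, each node corresponds to one open-close tag pair.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from html.parser import HTMLParser

# Void elements never have a closing tag, so they cannot form an open-close node.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


@dataclass
class Node:
    tag: str
    text: str = ""                       # text appearing directly inside this element
    children: list = field(default_factory=list)


class TreeBuilder(HTMLParser):
    """Build an XML-like tree where each node is one open-close HTML tag pair."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        node = Node(tag)
        self.stack[-1].children.append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Close back to the nearest matching open tag; stray close tags are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        chunk = data.strip()
        if chunk:
            node = self.stack[-1]
            node.text = f"{node.text} {chunk}".strip()


def walk(node):
    """Yield all descendants of node in document (pre-order) order."""
    for child in node.children:
        yield child
        yield from walk(child)


def index_by_tag(html):
    """Hash every node into a dict keyed by tag name (assumed index layout)."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    index = defaultdict(list)
    for node in walk(builder.root):
        index[node.tag].append(node)
    return index


def detect_changes(old_html, new_html):
    """Report old-version nodes with no identical same-tag node in the new version.

    Returns (changes, comparisons): each change is (tag, old_text, best_similarity),
    and comparisons counts how many similarity computations were performed.
    """
    old_idx, new_idx = index_by_tag(old_html), index_by_tag(new_html)
    changes, comparisons = [], 0
    for tag, old_nodes in old_idx.items():
        candidates = new_idx.get(tag, [])   # only nodes sharing this tag are compared
        for node in old_nodes:
            best = 0.0
            for cand in candidates:
                comparisons += 1
                best = max(best, SequenceMatcher(None, node.text, cand.text).ratio())
            if best < 1.0:                    # no identical same-tag node remains
                changes.append((tag, node.text, round(best, 3)))
    return changes, comparisons


if __name__ == "__main__":
    old = "<html><body><h1>News</h1><p>Price is 10 dollars</p><p>Contact us</p></body></html>"
    new = "<html><body><h1>News</h1><p>Price is 12 dollars</p><p>Contact us</p></body></html>"
    print(detect_changes(old, new))
```

For the demo pages, only the edited paragraph is reported (best similarity about 0.947), after 7 same-tag comparisons; comparing each of the five nodes in one version against all five in the other would take 25. The sketch reports only nodes of the old version that were modified or removed; detecting insertions would need the symmetric pass over the new version.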


Files in this item


There are no files associated with this item.

