Website Content Extraction Using Web Structure Analysis

Daraham, Nor Hayati (2005) Website Content Extraction Using Web Structure Analysis. [Final Year Project] (Unpublished)

Abstract

The Web presents itself as the largest data repository ever available in the history of
humankind. Major efforts have been made to provide efficient access to relevant
information within this huge repository of data. Although several techniques have been
developed to address the problem of Web data extraction, their use is still not widespread,
mostly because of the need for high human intervention and the low quality of the extraction
results. This project presents a domain-oriented approach to Web data extraction and
discusses its application to extracting news from Web sites. It will use the abstraction
method to identify important sections in a web document. Relevant information will be taken
into account and highlighted in order to produce a focused web content output. The
facts and data for the project were gathered from various sources such as the
Internet and books. The methodology used is the Waterfall Model, which involves several
phases: Planning, Analysis, Design and Implementation. The result of this
project is a demonstration and review of web content extraction and how it is currently
being developed, with the goal of offering greater usability and ease of use to web
users.

Item Type: Final Year Project
Subjects: Z Bibliography. Library Science. Information Resources > ZA Information resources
Departments / MOR / COE: Sciences and Information Technology > Computer and Information Sciences
Date Deposited: 04 Oct 2013 15:27
Last Modified: 25 Jan 2017 09:46
URI: http://utpedia.utp.edu.my/id/eprint/8149
