Website Content Extraction Using Web Structure Analysis

Daraham, Nor Hayati (2005) Website Content Extraction Using Web Structure Analysis. Universiti Teknologi Petronas. (Unpublished)

The Web presents itself as the largest data repository ever available in the history of humankind. Major efforts have been made to provide efficient access to relevant information within this huge repository of data. Although several techniques have been developed for the problem of Web data extraction, they are still not widely used, mostly because they need a high degree of human intervention and produce low-quality extraction results. This project presents a domain-oriented approach to Web data extraction and discusses its application to extracting news from Web sites. It uses the abstraction method to identify the important sections in a Web document. Relevant information is taken into account and highlighted in order to produce a focused Web content output. The facts and data for the project were gathered from various sources, such as the Internet and books. The methodology used is the Waterfall Model, which involves several phases: Planning, Analysis, Design and Implementation. The result of this project is a display and review of Web content extraction and of how it is currently being developed, with the goal of making the Web more usable and easier for its users.
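As an illustration of what identifying the important section of a Web document can look like in practice, here is a minimal sketch in Python. It is not the method used in the project, whose details are not given in this abstract. It swaps in a simple link-density heuristic: text is collected per block element, and the block with the most non-link text is treated as the main content. The names `BlockScorer`, `extract_main_text`, `BLOCK_TAGS` and the sample page are all hypothetical, and the sketch assumes well-nested markup.

```python
from html.parser import HTMLParser

# Assumption: these tags are treated as candidate content blocks.
BLOCK_TAGS = {"div", "td", "article", "section"}
SKIP_TAGS = {"script", "style"}


class BlockScorer(HTMLParser):
    """Collects text per block element and counts how much of it is link text."""

    def __init__(self):
        super().__init__()
        self.stack = []    # currently open blocks, innermost last
        self.blocks = []   # blocks that have been closed
        self.in_link = 0   # nesting depth inside <a> elements
        self.skip = 0      # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip += 1
        elif tag == "a":
            self.in_link += 1
        elif tag in BLOCK_TAGS:
            self.stack.append({"text": [], "link_chars": 0})

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip:
            self.skip -= 1
        elif tag == "a" and self.in_link:
            self.in_link -= 1
        elif tag in BLOCK_TAGS and self.stack:
            self.blocks.append(self.stack.pop())

    def handle_data(self, data):
        text = " ".join(data.split())
        if not text or self.skip or not self.stack:
            return
        block = self.stack[-1]  # credit text to the innermost open block only
        block["text"].append(text)
        if self.in_link:
            block["link_chars"] += len(text)


def extract_main_text(html):
    """Return the text of the block with the most non-link characters."""
    parser = BlockScorer()
    parser.feed(html)
    parser.close()
    best, best_score = None, 0
    for block in parser.blocks:
        total = sum(len(piece) for piece in block["text"])
        score = total - block["link_chars"]  # navigation blocks score near zero
        if score > best_score:
            best, best_score = block, score
    return " ".join(best["text"]) if best else ""


# Hypothetical news page: a navigation bar, a story, and a footer.
SAMPLE_PAGE = """
<html><body>
<div id="nav"><a href="/">Home</a> <a href="/news">News</a></div>
<div id="story"><p>The council approved the new budget on Monday
after a long debate.</p><p>Read the <a href="/doc">full report</a>.</p></div>
<div id="footer">Copyright 2005</div>
</body></html>
"""

if __name__ == "__main__":
    print(extract_main_text(SAMPLE_PAGE))
```

In this sketch the navigation bar scores zero because all of its text sits inside links, so the story block is selected. A domain-oriented extractor for news sites would likely add domain-specific cues on top of such a structural score.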

Item Type: Final Year Project
Academic Subject : Academic Department - Information Communication Technology
Subject: Z Bibliography. Library Science. Information Resources > ZA Information resources
Divisions: Sciences and Information Technology > Computer and Information Sciences
Date Deposited: 04 Oct 2013 15:27
Last Modified: 25 Jan 2017 09:46
URI: http://utpedia.utp.edu.my/id/eprint/8149
