Text Summarization System with Bayesian Theorem on Oil & Gas Drilling Topic

Kurniawan, Iwan (2007) Text Summarization System with Bayesian Theorem on Oil & Gas Drilling Topic. [Final Year Project] (Unpublished)

Abstract

Text summarization is the process of identifying the important sentences or words in
an article, which are then extracted and combined to generate a summary. Numerous
algorithms exist to address the need for text summarization, including
Support Vector Machines, k-nearest neighbor classifiers, and decision trees.
In this project, an algorithm based on Bayes' theorem is studied and tested through the
implementation of a text summarizer. The algorithm extracts the important
points from a lengthy document by assigning each word in the document
a probability of being included in the summary. The initial probabilities
are taken from a corpus containing summaries written by experts.
As the application is used, it learns and keeps track of the
probability of each keyword so that it can predict the chance of that keyword
being included in future summaries.
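
The word-level idea described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the project's implementation: the function names (tokenize, split_sentences, train, word_probability, summarize), the Laplace smoothing, the averaging of word probabilities into a sentence score, and the keep_ratio parameter are all assumptions introduced here. The estimate used is P(in summary | word) = P(word | in summary) P(in summary) / P(word), which reduces to a ratio of counts when every term is estimated from the same training pairs.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def train(pairs):
    """Count word occurrences over (document, expert_summary) pairs.

    in_doc[w]     = number of documents that contain w
    in_summary[w] = number of those documents whose expert summary also contains w
    """
    in_doc, in_summary = Counter(), Counter()
    for document, summary in pairs:
        doc_words = set(tokenize(document))
        summary_words = set(tokenize(summary))
        in_doc.update(doc_words)
        in_summary.update(doc_words & summary_words)
    return in_doc, in_summary


def word_probability(word, in_doc, in_summary, smoothing=1.0):
    """Estimate P(in summary | word); the smoothing term is an assumption."""
    return (in_summary[word] + smoothing) / (in_doc[word] + 2.0 * smoothing)


def summarize(text, in_doc, in_summary, keep_ratio=0.3):
    """Keep the highest-scoring sentences, returned in their original order."""
    sentences = split_sentences(text)
    if not sentences:
        return []

    def score(sentence):
        words = tokenize(sentence)
        if not words:
            return 0.0
        total = sum(word_probability(w, in_doc, in_summary) for w in words)
        return total / len(words)

    n_keep = max(1, round(len(sentences) * keep_ratio))
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:n_keep])]
```

Because train returns plain counters, the counts from newly processed documents can be added to the existing ones, which is one simple way to model the "learn and keep track" behaviour mentioned above.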
The objectives of this project are to survey the current state of text
summarization research, to study the statistical approach to automatic text summary
generation, and then to build a simple text summarization tool that takes
existing research into account.
Since the application targets a specific domain, oil and gas drilling, a
ready-made corpus for that domain is not easy to find. The articles collected are from
journals, news and other information sources related to the
topic. The application is evaluated against an existing summarizer that is
already on the market, the Word Auto Summarizer. Human-made summaries
are used as the ideal or reference summaries in evaluating the performance of both the Text
Summarization system and the Word Auto Summarizer. Current results show that the
Text Summarization system performs better than the Word Auto Summarizer at
compression rates of 60% and 70% (2/3 of the articles' length) by 11.31% and 10.80%
respectively. The optimum value for overall performance is 85.82%.
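
The abstract does not state which measure was used to compare system output with the human-made reference summaries. As a hedged illustration only, one common way to score an extractive summary against a reference is sentence-level precision, recall and F1. The function name overlap_scores and the exact-match comparison of lowercased sentences are assumptions made for this sketch.

```python
def overlap_scores(system_sentences, reference_sentences):
    """Return (precision, recall, f1) of sentence overlap with a reference.

    Sentences are compared after stripping whitespace and lowercasing;
    this exact-match rule is an assumption of the sketch.
    """
    system = {s.strip().lower() for s in system_sentences}
    reference = {s.strip().lower() for s in reference_sentences}
    matched = len(system & reference)

    precision = matched / len(system) if system else 0.0
    recall = matched / len(reference) if reference else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Scoring two summarizers against the same reference summaries with a function like this, at the same compression rate, gives directly comparable numbers.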

Item Type: Final Year Project
Subjects: Z Bibliography. Library Science. Information Resources > ZA Information resources
Departments / MOR / COE: Sciences and Information Technology > Computer and Information Sciences
Date Deposited: 24 Oct 2013 09:14
Last Modified: 25 Jan 2017 09:45
URI: http://utpedia.utp.edu.my/id/eprint/9553
