Indonesian Enhanced Bracewell's Text Classification Method For Indonesian News Documents

KUSUMAAGAMA FUDDOLY, AINI RACHMANIA (2014) Indonesian Enhanced Bracewell's Text Classification Method For Indonesian News Documents. Masters thesis, Universiti Teknologi Petronas.

Abstract

Text classification is a popular research field in computer science. It deals with assigning labels to groups of similar textual documents. However, very few approaches focus on the unique characteristics of news corpora, and even fewer on Indonesian news documents. Moreover, only a few aim at categorizing documents and identifying topics. The aim of this study is to solve the problems in text classification for online news: the large volume of data, sparsely distributed articles, classification of unseen data, and the limitations of text classification approaches for Indonesian news documents. Category classification is done using a likelihood calculation, whereas topic identification employs a cosine similarity calculation. Two sets of data were used in the experiments: a training corpus and a testing corpus. The training corpus consists of 900 documents and serves as the learning material for the classifier. The testing corpus covers 455 documents and is utilised to measure the accuracy of the classifier. Classification was conducted offline and online using an Indonesian online news dataset from 2011 to 2012. The enhanced method produces good results, with an accuracy of up to 93.84% for category classification and 95.64% for topic identification. In terms of computational time, the results show that the proposed classifier performs best at n = 20, with an average computational time of 2.81 seconds. Compared with human evaluation, the integrated method performs better by 13%. An in-depth study was also conducted to investigate the human annotators' responses to the experimental process. The findings highlight that the enhanced method has an advantage over manual classification and is suitable for Indonesian news classification.
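
The cosine similarity calculation mentioned in the abstract for topic identification can be illustrated with a minimal sketch. This is not the thesis implementation: the whitespace tokenisation, the raw term-frequency weighting, the function names `cosine_similarity` and `identify_topic`, and the sample topic keywords are all assumptions made for illustration only.

```python
import math
from collections import Counter
from typing import Mapping


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-frequency vectors."""
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)


def identify_topic(document: str, topics: Mapping[str, str]) -> str:
    """Return the topic whose keyword text is most similar to the document.

    `topics` maps a hypothetical topic name to a string of representative
    terms; both sides are tokenised by lowercasing and splitting on spaces.
    """
    doc_vec = Counter(document.lower().split())
    return max(
        topics,
        key=lambda name: cosine_similarity(
            doc_vec, Counter(topics[name].lower().split())
        ),
    )


# Hypothetical topic profiles, not taken from the thesis dataset.
sample_topics = {
    "economy": "bank rupiah inflasi ekonomi",
    "sports": "sepak bola liga gol",
}
best = identify_topic("inflasi rupiah naik", sample_topics)
```

Here `best` is the name of the topic profile sharing the most weighted terms with the input text. A real system would likely add stemming, stop-word removal, and a weighting scheme such as TF-IDF, none of which are shown here.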

Item Type: Thesis (Masters)
Subjects: T Technology > T Technology (General)
Departments / MOR / COE: Sciences and Information Technology > Computer and Information Sciences
Depositing User: Mr Ahmad Suhairi Mohamed Lazim
Date Deposited: 10 Jun 2019 13:34
Last Modified: 10 Jun 2019 13:34
URI: http://utpedia.utp.edu.my/id/eprint/15129
