Summarizing Text Articles with Dirichlet Distribution

Mohamed, Noor Zalifah (2011) Summarizing Text Articles with Dirichlet Distribution. Universiti Teknologi Petronas. (Unpublished)

Latent Dirichlet Allocation (LDA) is based on the hypothesis that a person writing a document has topics in mind. Writing about a topic then means picking words, each with a certain probability, from the pool of words of that topic. A document can therefore be represented as a mixture of topics. LDA is a generative probabilistic model for a corpus of discrete data, such as the words in a set of documents. LDA models the words in a document under the "bag-of-words" assumption, which ignores the order of the words. Under this "exchangeability" assumption, the words are independent and identically distributed conditioned on some parameters. This conditional independence allows us to build a hierarchical Bayesian model for a corpus of documents and words. The objective of this project is to develop a text summarization system based on the LDA method. The system is used to determine the accuracy of the method by comparing the summary it produces with an existing summary written by a human.
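
The generative story described in the abstract can be illustrated with a minimal sketch. It is not the project's implementation: the topic tables in TOPICS, the prior ALPHA, and the function names are hypothetical and chosen only for illustration. A per-document topic mixture is drawn from a Dirichlet distribution, and each word is then produced by first picking a topic from that mixture and then picking a word from that topic's pool.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one sample from Dirichlet(alpha) by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topics, alpha, length, rng):
    """Generate a bag of words by following the LDA generative story."""
    # Per-document topic mixture (the "topics in mind" of the writer).
    theta = sample_dirichlet(alpha, rng)
    topic_ids = list(range(len(topics)))
    words = []
    for _ in range(length):
        # Pick a topic according to the document's mixture.
        z = rng.choices(topic_ids, weights=theta)[0]
        # Pick a word from that topic's pool with its topic-specific probability.
        vocab = list(topics[z].keys())
        probs = list(topics[z].values())
        words.append(rng.choices(vocab, weights=probs)[0])
    return theta, words


# Hypothetical toy topics: each maps a word to its probability within the topic.
TOPICS = [
    {"goal": 0.5, "match": 0.3, "team": 0.2},
    {"vote": 0.4, "party": 0.4, "law": 0.2},
]
# Hypothetical symmetric Dirichlet prior over the two topics.
ALPHA = [0.5, 0.5]

rng = random.Random(0)
theta, words = generate_document(TOPICS, ALPHA, length=8, rng=rng)
print("topic mixture:", theta)
print("bag of words:", words)
```

Because of the bag-of-words assumption, only the counts of the generated words matter, not their order. A summarization system built on LDA works in the opposite direction: it infers the topic mixture and topic word pools from observed documents rather than sampling from known ones.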

Item Type: Final Year Project
Academic Subject: Academic Department - Information Communication Technology
Subject: T Technology > T Technology (General)
Divisions: Sciences and Information Technology
Date Deposited: 09 Oct 2013 11:07
Last Modified: 25 Jan 2017 09:41
URI: http://utpedia.utp.edu.my/id/eprint/8730
