Mohamed, Noor Zalifah (2011) Summarizing Text Articles with Dirichlet Distribution. [Final Year Project] (Unpublished)
Abstract
The Latent Dirichlet Allocation (LDA) is based on the hypothesis that a person
writing a document has topics in mind. To write about a topic then means to pick a
word with a certain probability from the pool of words of that topic. A document can
then be represented as a mixture of various topics. LDA is a generative probabilistic
model for a corpus of discrete data, such as the words in a set of documents. LDA
models the words in the documents under the "bag-of-words" assumption, which
ignores the order of the words in the documents. Following this
"exchangeability", the words are independent and identically distributed
when conditioned on some parameters. This conditional
independence allows us to build a hierarchical Bayesian model for a corpus of
documents and words. The objective is to develop a text summarization system based
on the Latent Dirichlet Allocation (LDA) method. The system would be used to
determine the accuracy level of the method. This is done by comparing the result
produced by the text summarization system with an existing summary that is
produced by a human.
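
The generative story described in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the project's implementation: the function names `sample_dirichlet` and `generate_document`, the six-word vocabulary, the two topic-word distributions, and the Dirichlet parameter `alpha` are all assumptions made up for the example. Each document draws its own topic mixture from a Dirichlet distribution, then every word is produced by first picking a topic from that mixture and then picking a word from the chosen topic's pool.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw a probability vector from Dirichlet(alpha) by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topic_word, alpha, n_words, rng):
    """Sample one bag-of-words document from the LDA generative process.

    topic_word: list of K rows, each a probability distribution over the vocabulary.
    alpha: list of K positive Dirichlet parameters for the topic mixture.
    """
    # The document's mixture over topics.
    theta = sample_dirichlet(alpha, rng)
    topic_ids = range(len(topic_word))
    word_ids = range(len(topic_word[0]))
    topics, words = [], []
    for _ in range(n_words):
        # Pick a topic for this word position, then a word from that topic.
        z = rng.choices(topic_ids, weights=theta)[0]
        w = rng.choices(word_ids, weights=topic_word[z])[0]
        topics.append(z)
        words.append(w)
    return theta, topics, words


# Hypothetical toy data: topic 0 is about sport, topic 1 about politics.
vocab = ["goal", "match", "team", "vote", "party", "election"]
topic_word = [
    [0.4, 0.3, 0.3, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.4, 0.3, 0.3],
]
alpha = [0.5, 0.5]

rng = random.Random(0)
theta, topics, words = generate_document(topic_word, alpha, n_words=10, rng=rng)
print("topic mixture:", theta)
print("words:", [vocab[w] for w in words])
```

Because the words are drawn independently given the topic mixture, shuffling the generated list changes nothing about its probability, which is the exchangeability property the abstract refers to. Fitting LDA to real articles runs this process in reverse, inferring the topic mixtures and topic-word distributions from the observed words.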
Item Type: | Final Year Project |
---|---|
Subjects: | T Technology > T Technology (General) |
Departments / MOR / COE: | Sciences and Information Technology |
Date Deposited: | 09 Oct 2013 11:07 |
Last Modified: | 25 Jan 2017 09:41 |
URI: | http://utpedia.utp.edu.my/id/eprint/8730 |