Summarizing Text Articles with Dirichlet Distribution

Mohamed, Noor Zalifah (2011) Summarizing Text Articles with Dirichlet Distribution. [Final Year Project] (Unpublished)

Abstract

The Latent Dirichlet Allocation (LDA) is based on the hypothesis that a person
writing a document has topics in mind. To write about a topic then means to pick a
word with a certain probability from the pool of words of that topic. A document can
then be represented as a mixture of various topics. LDA is a generative probabilistic
model for a corpus of discrete data, such as the words in a set of documents. LDA
models the words in the documents under the "bag-of-words" assumption, which
ignores the order of the words in the documents. Under this
"exchangeability" assumption, the words are independent and
identically distributed conditioned on some parameters. This conditional
independence allows us to build a hierarchical Bayesian model for a corpus of
documents and words. The objective is to develop a text summarization system based
on the Latent Dirichlet Allocation (LDA) method. The system would be used to
determine the accuracy level of the method. This is done by comparing the result
produced by the text summarization system with an existing summary that is
produced by a human.
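
The generative story described in the abstract can be illustrated with a
minimal sketch. It is not the project's implementation: the vocabulary, the
topic-word probabilities, the Dirichlet parameter values and the function
names are all hypothetical, chosen only to show how a document arises as a
mixture of topics under the bag-of-words assumption. Fitting LDA to real text
runs in the opposite direction (inferring topics from observed words) and is
not shown here.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topic_word_probs, vocabulary, alpha, num_words, rng):
    """Generate one bag-of-words document following the LDA generative story."""
    # Per-document topic mixture, drawn from a Dirichlet prior.
    theta = sample_dirichlet(alpha, rng)
    topics = range(len(topic_word_probs))
    words = []
    for _ in range(num_words):
        # Pick a topic for this word according to the document's mixture.
        z = rng.choices(topics, weights=theta)[0]
        # Pick a word from that topic's pool of words.
        w = rng.choices(vocabulary, weights=topic_word_probs[z])[0]
        words.append(w)
    return theta, words


# Hypothetical toy setup: 2 topics over a 4-word vocabulary.
vocabulary = ["goal", "match", "vote", "election"]
topic_word_probs = [
    [0.5, 0.5, 0.0, 0.0],  # an assumed "sports" topic
    [0.0, 0.0, 0.5, 0.5],  # an assumed "politics" topic
]

rng = random.Random(0)  # fixed seed so repeated runs give the same draw
theta, words = generate_document(topic_word_probs, vocabulary, [0.5, 0.5], 10, rng)
print("topic mixture:", theta)
print("words:", words)
```

Because each word is drawn independently once the topic mixture is fixed,
shuffling the generated words leaves their probability unchanged, which is
the exchangeability property the abstract refers to.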

Item Type: Final Year Project
Subjects: T Technology > T Technology (General)
Departments / MOR / COE: Sciences and Information Technology
Date Deposited: 09 Oct 2013 11:07
Last Modified: 25 Jan 2017 09:41
URI: http://utpedia.utp.edu.my/id/eprint/8730
