
ADD. (c2002)

LAUR Repository


dc.contributor.author Varjabedian, Ralph A.
dc.date.accessioned 2011-05-05T12:39:59Z
dc.date.available 2011-05-05T12:39:59Z
dc.date.copyright 2002 en_US
dc.date.issued 2011-05-05
dc.date.submitted 2002-07
dc.identifier.uri http://hdl.handle.net/10725/433
dc.description Bibliography: leaves 91-92. en_US
dc.description.abstract Data mining is a relatively new term, introduced in the 1990s. Data mining is the process of extracting useful information from huge amounts of data; it is sometimes called data discovery or knowledge discovery in databases [6]. What counts as useful information depends on the goal of the data mining effort in the first place. Useful information can be used to increase revenue and cut costs; it can also serve research purposes. Advances in hardware and software in the late 1990s made data centralization possible; the centralized data store is called a "data warehouse," and the process of building it "data warehousing." With data centralization came a very important issue: the quality of the centralized data, since centralization involves joining multiple data sources. The data given as input to the data mining process should be of high quality so that the results of data mining are accurate and reliable. Before data can be mined to extract useful information, it goes through a process called data cleansing. This process is as old as data itself; however, the term was only introduced in the 1990s. Data cleansing involves several steps and processes, each comprising one or more algorithms. One step of high importance is duplicate data detection; it became more important as hardware advances allowed data warehouses to hold more and more data. In this work, a tool based on the K-way sorting algorithm is implemented and used for duplicate data detection. The tool has many features for data cleansing. It also supports multiple languages, in particular Arabic, which no other tool offers. en_US
dc.language.iso en en_US
dc.subject Data mining en_US
dc.subject Data warehousing en_US
dc.subject Decision support systems en_US
dc.subject Arabic language -- Data processing en_US
dc.title ADD. (c2002) en_US
dc.type Thesis en_US
dc.title.subtitle Arabic Duplicate Detector: a duplicate detection data cleansing algorithm for very large Arabic data warehouses en_US
dc.term.submitted Summer I en_US
dc.author.degree MS in Computer Science en_US
dc.author.school Arts and Sciences en_US
dc.author.idnumber 199609260 en_US
dc.author.commembers Dr. Nashaat Mansour
dc.author.commembers Dr. May Hamdan
dc.author.woa RA en_US
dc.description.physdesc 1 bound copy: 92 leaves; ill.; 30 cm. available at RNL. en_US
dc.author.division Computer Science en_US
dc.author.advisor Dr. Ramzi Haraty
dc.identifier.doi https://doi.org/10.26756/th.2002.7 en_US
dc.publisher.institution Lebanese American University en_US
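The abstract describes duplicate detection via a K-way sorting algorithm. The thesis itself details that algorithm; as an illustration only, the sketch below shows the general sort-and-compare idea behind such duplicate detectors: normalize each record into a sorting key, sort, then compare adjacent records. The normalization, function names, and similarity threshold are assumptions for this sketch, not taken from the thesis.

```python
# Illustrative sketch of sort-based duplicate detection (not the
# thesis's actual K-way sorting implementation): records are sorted
# on a normalized key, then adjacent records are compared for
# similarity, so near-duplicates land next to each other.
from difflib import SequenceMatcher


def sort_key(record: str) -> str:
    """Normalize a record into a sorting key (assumed normalization:
    lowercase, whitespace removed)."""
    return "".join(record.lower().split())


def find_duplicates(records: list[str], threshold: float = 0.9) -> list[tuple[str, str]]:
    """Sort records by key, then flag adjacent pairs whose key
    similarity meets the (assumed) threshold as likely duplicates."""
    ordered = sorted(records, key=sort_key)
    pairs = []
    for a, b in zip(ordered, ordered[1:]):
        if SequenceMatcher(None, sort_key(a), sort_key(b)).ratio() >= threshold:
            pairs.append((a, b))
    return pairs


dupes = find_duplicates(["Beirut, Lebanon", "beirut  lebanon", "Tripoli"])
```

Sorting first is what makes the approach scale to very large warehouses: only neighboring records need pairwise comparison, rather than all record pairs.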

