ADD. (c2002)

Varjabedian, Ralph A.

dc.contributor.author	Varjabedian, Ralph A.
dc.date.accessioned	2011-05-05T12:39:59Z
dc.date.available	2011-05-05T12:39:59Z
dc.date.copyright	2002	en_US
dc.date.issued	2011-05-05
dc.date.submitted	2002-07
dc.identifier.uri	http://hdl.handle.net/10725/433
dc.description	Bibliography: leaves 91-92.	en_US
dc.description.abstract	Data mining is a relatively new term, introduced in the 1990s. It is the process of extracting useful information from huge amounts of data and is sometimes called data discovery or knowledge discovery in databases [6]. What counts as useful information depends on the goal the data mining is meant to serve: it can be used to increase revenue, to cut costs, or for research. Advances in hardware and software in the late 1990s made it possible to centralize data. This centralization is called data warehousing, and the central store is called a data warehouse. Centralization raised a very important issue, the quality of the centralized data, because it involves joining multiple data sources. The data fed into the data mining process must be of high quality for the results to be accurate and reliable, so before data can be mined for useful information, it goes through a process called data cleansing. Data cleansing is as old as data itself, although the term was only introduced in the 1990s. It involves several steps and processes, which use one or more algorithms. One especially important step is duplicate data detection, which became even more important as hardware advances allowed data warehouses to hold more and more data. In this work, a tool based on the K-way sorting algorithm is implemented and used for duplicate data detection. The tool offers many data cleansing features and supports multiple languages, in particular Arabic, which no other tool supports.	en_US
dc.language.iso	en	en_US
dc.subject	Data mining	en_US
dc.subject	Data warehousing	en_US
dc.subject	Decision support systems	en_US
dc.subject	Arabic language -- Data processing	en_US
dc.title	ADD. (c2002)	en_US
dc.type	Thesis	en_US
dc.title.subtitle	Arabic Duplicate Detector : a duplicate detection data cleansing algorithm for very large Arabic data warehouses	en_US
dc.term.submitted	Summer I	en_US
dc.author.degree	MS in Computer Science	en_US
dc.author.school	Arts and Sciences	en_US
dc.author.idnumber	199609260	en_US
dc.author.commembers	Dr. Nashaat Mansour
dc.author.commembers	Dr. May Hamdan
dc.author.woa	RA	en_US
dc.description.physdesc	1 bound copy: 92 leaves; ill.; 30 cm. Available at RNL.	en_US
dc.author.division	Computer Science	en_US
dc.author.advisor	Dr. Ramzi Haraty
dc.identifier.doi	https://doi.org/10.26756/th.2002.7	en_US
dc.publisher.institution	Lebanese American University	en_US