A new approach to record clustering for large databases. (c1997)

Makhoulian, Raffi H.

URI: http://hdl.handle.net/10725/151

DOI: https://doi.org/10.26756/th.1997.1

Date: 2010-12-13

Abstract:

This work introduces a new approach to record clustering: a hybrid algorithm that clusters records based on threshold values and on the query patterns made to a particular database. We study the space density of a file and how it affects retrieval time before and after clustering, using the Hamming distance of a file as the measure of space density. The objective of the algorithm is to minimize the Hamming distance of the file while giving weight to the most frequent queries. Simulation experiments showed that restructuring a file yields a large reduction in response time. Factors that affect clustering and response time, such as block size, threshold value, and the percentage of records satisfying a given set of queries, are also studied. Random statistical theory and graph theory are used to substantiate the experimental results. As a further means of predicting performance, regression analysis is employed and its predictions are compared with the experimental figures.
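The abstract does not define the algorithm in detail, so the following is only a minimal illustrative sketch of the general idea, not the thesis's method. It assumes (hypothetically) that each record is encoded as a bit vector whose i-th bit marks whether the record satisfies query i, that each bit is weighted by that query's frequency, and that a file's Hamming distance is the sum of weighted distances between consecutively stored records. The names `weighted_hamming`, `file_distance`, and `greedy_cluster`, and the greedy nearest-neighbour strategy with a threshold for starting a new cluster, are all assumptions chosen for illustration.

```python
from typing import List, Sequence

def weighted_hamming(a: Sequence[int], b: Sequence[int], weights: Sequence[float]) -> float:
    """Hamming distance between two bit vectors; each differing bit counts with its query weight."""
    return sum(w for x, y, w in zip(a, b, weights) if x != y)

def file_distance(records: List[List[int]], weights: Sequence[float]) -> float:
    """Hypothetical space-density measure: total distance between consecutively stored records."""
    return sum(
        weighted_hamming(records[i], records[i + 1], weights)
        for i in range(len(records) - 1)
    )

def greedy_cluster(records: List[List[int]], weights: Sequence[float],
                   threshold: float) -> List[List[int]]:
    """Greedy nearest-neighbour ordering (an assumed strategy).

    Starting from the first record, repeatedly place the closest unplaced record
    next; if that record is farther than `threshold`, close the current cluster
    and start a new one. Returns clusters as lists of original record indices.
    """
    remaining = list(range(len(records)))
    clusters: List[List[int]] = []
    current = [remaining.pop(0)]
    while remaining:
        last = records[current[-1]]
        best = min(remaining, key=lambda j: weighted_hamming(last, records[j], weights))
        if weighted_hamming(last, records[best], weights) > threshold:
            clusters.append(current)  # too far: begin a new cluster
            current = []
        current.append(best)
        remaining.remove(best)
    clusters.append(current)
    return clusters

if __name__ == "__main__":
    # Four records over three queries; query 0 is asked three times as often.
    records = [[1, 0, 0], [0, 1, 1], [1, 0, 1], [0, 1, 0]]
    weights = [3, 1, 1]
    print("before:", file_distance(records, weights))
    clusters = greedy_cluster(records, weights, threshold=1)
    order = [i for cluster in clusters for i in cluster]
    print("clusters:", clusters)
    print("after:", file_distance([records[i] for i in order], weights))
```

Running it prints a file distance of 14 in the original order, clusters `[[0, 2], [1, 3]]`, and a distance of 6 after restructuring. Weighting bits by query frequency means that records which differ only on rarely asked queries are still placed together.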