Abstract
The correlation matrix is a fundamental statistic that is used in many fields. For example, GroupLens, a collaborative filtering system, uses the correlation between users for predictive purposes. Since the correlation is a natural similarity measure between users, the correlation matrix may be used as the Gram matrix in kernel methods. However, the estimated correlation matrix sometimes has a serious defect: although the true correlation matrix is positive semidefinite, the estimated one may fail to be positive semidefinite when not all ratings are observed. To obtain a positive semidefinite correlation matrix, the nearest correlation matrix problem has recently been studied in the fields of numerical analysis and optimization. However, statistical properties are not explicitly used in such studies. In our approach, we assume an approximate model for the estimated correlation coefficients. Using this model, an estimate is obtained as the optimal point of an optimization problem formulated with information on the variances of the estimated correlation coefficients. The problem is solved as a convex quadratic semidefinite program. A penalized likelihood approach is also examined. The MovieLens data set is used to test our approach.
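The defect described above can be reproduced on a toy example. The following is a minimal sketch, not the method of this paper: it assumes that each user-user correlation is the Pearson correlation computed only over the items both users have rated, and the function name `pairwise_correlation` and the ratings are hypothetical, chosen purely for illustration.

```python
import numpy as np


def pairwise_correlation(ratings):
    """Pearson correlation between rows (users), using only co-rated items.

    Missing ratings are encoded as NaN (an assumption of this sketch).
    """
    n_users = ratings.shape[0]
    corr = np.eye(n_users)
    for i in range(n_users):
        for j in range(i + 1, n_users):
            # Items rated by both user i and user j.
            both = ~np.isnan(ratings[i]) & ~np.isnan(ratings[j])
            r = np.corrcoef(ratings[i, both], ratings[j, both])[0, 1]
            corr[i, j] = corr[j, i] = r
    return corr


nan = np.nan
# Hypothetical ratings: 3 users (rows) by 9 items (columns).
# Each pair of users shares a different set of three items.
ratings = np.array([
    [1, 2, 3, 1, 2, 3, nan, nan, nan],
    [1, 2, 3, nan, nan, nan, 1, 2, 3],
    [nan, nan, nan, 1, 2, 3, 3, 2, 1],
])

corr = pairwise_correlation(ratings)
# eigvalsh is appropriate because corr is symmetric by construction.
min_eig = np.linalg.eigvalsh(corr).min()

print(corr)
print("smallest eigenvalue:", min_eig)
```

Here users 1 and 2 agree perfectly, users 1 and 3 agree perfectly, yet users 2 and 3 disagree perfectly on the items they share. No complete data set could produce these three correlations simultaneously, so the estimated matrix has a negative eigenvalue and cannot serve directly as a Gram matrix.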