Impute missing values with the average of other attributes in the instance.
Impute missing values with the average of other attributes in the instance. Assume the attributes of the dataset are of same kind, e.g. microarray gene expression data, the missing values can be estimated as the average of non-missing attributes in the same instance. Note that this is not the average of same attribute across different instances.
the data set with missing values.
Missing value imputation by K-Means clustering.
Missing value imputation by K-Means clustering. First cluster data by K-Means with missing values and then impute missing values with the average value of each attribute in the clusters.
the data set.
the number of clusters.
the number of runs of K-Means algorithm.
Missing value imputation by k-nearest neighbors.
Missing value imputation by k-nearest neighbors. The KNN-based method selects instances similar to the instance of interest to impute missing values. If we consider instance A that has one missing value on attribute i, this method would find K other instances, which have a value present on attribute i, with values most similar (in term of some distance, e.g. Euclidean distance) to A on other attributes without missing values. The average of values on attribute i from the K nearest neighbors is then used as an estimate for the missing value in instance A.
the data set with missing values.
the number of neighbors.
Local least squares missing value imputation.
Local least squares missing value imputation. The local least squares imputation method represents a target instance that has missing values as a linear combination of similar instances, which are selected by k-nearest neighbors method.
the data set.
the number of similar rows used for imputation.
Missing value imputation with singular value decomposition.
Missing value imputation with singular value decomposition. Given SVD A = U Σ V^{T}, we use the most significant eigenvectors of V^{T} to linearly estimate missing values. Although it has been shown that several significant eigenvectors are sufficient to describe the data with small errors, the exact fraction of eigenvectors best for estimation needs to be determined empirically. Once k most significant eigenvectors from V^{T} are selected, we estimate a missing value j in row i by first regressing this row against the k eigenvectors and then use the coefficients of the regression to reconstruct j from a linear combination of the k eigenvectors. The j th value of row i and the j th values of the k eigenvectors are not used in determining these regression coefficients. It should be noted that SVD can only be performed on complete matrices; therefore we originally fill all missing values by other methods in matrix A, obtaining A'. We then utilize an expectation maximization method to arrive at the final estimate, as follows. Each missing value in A is estimated using the above algorithm, and then the procedure is repeated on the newly obtained matrix, until the total change in the matrix falls below the empirically determined threshold (say 0.01).
the data set.
the number of eigenvectors used for imputation.
the maximum number of iterations.
High level missing value imputation operators. The NaN values in the input data are treated as missing values and will be replaced with imputed values after the processing.