Abstract: In high-voltage shunt reactor voiceprint signal monitoring systems, the high-dimensional non-stationarity of long-term, large-scale unlabeled voiceprint data makes feature extraction difficult and reduces the adaptability of unsupervised clustering. To address this, a 750 kV reactor voiceprint clustering method based on a deep adaptive K-means++ clustering algorithm (DAKCA) is proposed. First, an improved stacked sparse autoencoder (SSAE), fine-tuned with a two-stage unsupervised strategy, extracts 32-dimensional deep features from normalized frequency-domain data obtained via the fast Fourier transform. Then, an adaptive K-means++ clustering algorithm is developed using the clustering validation index based on nearest neighbors (CVNN), and a reactor voiceprint clustering model that adaptively determines the optimal number of clusters is constructed. Finally, the method is validated on measured voiceprint data from a 750 kV reactor in Northwest China. The results demonstrate that DAKCA stably extracts 32-dimensional deep features from unlabeled voiceprint data under varying sample-balance conditions and achieves optimal clustering, providing a reference for the direct and efficient use of unlabeled reactor voiceprint data.
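The adaptive step (run K-means++ for each candidate number of clusters and keep the one that minimizes CVNN) can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes Euclidean distance, a hypothetical neighbour count `k_nn = 10`, and the usual CVNN form: inter-cluster separation (the worst cluster's mean fraction of foreign points among each point's `k_nn` nearest neighbours) and intra-cluster compactness (sum of per-cluster mean pairwise distances), each normalized by its maximum over the candidate range, with the smallest sum taken as best. The function names `kmeans_pp`, `separation`, `compactness`, and `select_k_cvnn` are hypothetical, and the input `X` stands in for the 32-dimensional SSAE features.

```python
import numpy as np


def kmeans_pp(X, k, rng, n_init=5, n_iter=100):
    """Lloyd K-means with K-means++ seeding; keeps the lowest-inertia of n_init runs."""
    n = X.shape[0]
    best_labels, best_inertia = None, np.inf
    for _run in range(n_init):
        # K-means++ seeding: first centre uniform, the rest sampled with probability ~ D(x)^2
        centers = [X[rng.integers(n)]]
        for _ in range(1, k):
            d2 = ((X[:, None, :] - np.array(centers)[None]) ** 2).sum(-1).min(axis=1)
            p = d2 / d2.sum() if d2.sum() > 0 else np.full(n, 1.0 / n)
            centers.append(X[rng.choice(n, p=p)])
        centers = np.array(centers)
        for _ in range(n_iter):
            labels = ((X[:, None, :] - centers[None]) ** 2).sum(-1).argmin(axis=1)
            # Recompute centres; an empty cluster keeps its previous centre
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                            for j in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        dist2 = ((X[:, None, :] - centers[None]) ** 2).sum(-1)
        labels, inertia = dist2.argmin(axis=1), dist2.min(axis=1).sum()
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels


def compactness(D, labels):
    """Sum over clusters of the mean pairwise distance inside each cluster."""
    com = 0.0
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        n_i = len(idx)
        if n_i > 1:
            # D[idx, idx] counts each unordered pair twice, so this equals 2/(n_i(n_i-1)) * sum over pairs
            com += D[np.ix_(idx, idx)].sum() / (n_i * (n_i - 1))
    return com


def separation(D, labels, k_nn):
    """Worst (max) cluster-average fraction of k_nn nearest neighbours lying in another cluster."""
    D_self = D.copy()
    np.fill_diagonal(D_self, np.inf)          # a point is not its own neighbour
    nn = np.argsort(D_self, axis=1)[:, :k_nn]
    foreign = (labels[nn] != labels[:, None]).sum(axis=1) / k_nn
    return float(max(foreign[labels == c].mean() for c in np.unique(labels)))


def select_k_cvnn(X, k_range, k_nn=10, seed=0):
    """Return (best k, its labels, CVNN score per k); a smaller CVNN is better."""
    rng = np.random.default_rng(seed)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    results = {}
    for k in k_range:
        labels = kmeans_pp(X, k, rng)
        results[k] = (labels, separation(D, labels, k_nn), compactness(D, labels))
    # Normalize each term by its maximum over the candidate range (guard against all-zero)
    max_sep = max(r[1] for r in results.values()) or 1.0
    max_com = max(r[2] for r in results.values()) or 1.0
    scores = {k: r[1] / max_sep + r[2] / max_com for k, r in results.items()}
    best = min(scores, key=scores.get)
    return best, results[best][0], scores
```

For example, on synthetic 32-dimensional data made of three well-separated Gaussian blobs (a stand-in, not reactor data), `select_k_cvnn(X, range(2, 7))` should select `k = 3`. Splitting a true group raises the separation term, while merging two groups inflates the compactness term.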