K-Means Clustering (K-Means Clustering)
Definition
K-Means is a classic centroid-based clustering algorithm that partitions data into clusters, minimizing the sum of squared Euclidean distances from data points to cluster centroids. K-Means is simple and efficient, widely used in data mining, pattern recognition, and image segmentation.
Algorithm Principles
Objective Function
K-Means aims to minimize the sum of squared errors (SSE):
Where is the centroid of the -th cluster and is the set of data points in the -th cluster.
Algorithm Steps
- Randomly initialize centroids
- Repeat until convergence:
- Assignment step: Assign each data point to the nearest centroid
- Update step: Recalculate each cluster's centroid as the mean of data points in the cluster
Convergence
K-Means algorithm is guaranteed to converge after finite iterations but may get stuck in local optima. The commonly used -Means++ initialization method improves initial centroid selection.
Application in Cislunar SSA Architecture Analysis
Klonowski (2025) used K-Means and K-Medoids clustering methods to classify Pareto-optimal architectures in cislunar space SSA architecture design:
Feature Extraction
Extract feature vectors from architecture configurations:
- Number of observation satellites
- Orbital family distribution
- Coverage performance metrics
- Cost metrics
Clustering Analysis
- Apply K-Means clustering to the Pareto optimal solution set
- Identify different types of architecture configurations
- Analyze characteristics and applicable scenarios of each cluster
Core Elements
Mathematical Definition
K-Means partitions dataset into disjoint clusters , minimizing the SSE objective function.
Key Properties
K-Means time complexity is , where is number of iterations, is number of data points, is number of clusters, and is dimensionality. The algorithm assumes spherical clusters of similar size.
Comparison with K-Medoids
| Characteristic | K-Means | K-Medoids |
|---|---|---|
| Centroid type | Cluster mean (may not be data point) | Actual data point in cluster |
| Noise robustness | Low | High |
| Computational complexity | Low | Higher |
Related Concepts
References
- MacQueen J. Some methods for classification and analysis of multivariate observations[C]. Berkeley Symposium on Mathematical Statistics and Probability, 1967.
- Klonowski M, Owens-Fahrner N, Heidrich C, et al. Cislunar space domain awareness architecture design and analysis for cooperative agents[J]. The Journal of the Astronautical Sciences, 2024.
