First off, what is data analytics? Data analytics is the discovery and communication of meaningful patterns in data. It involves using data analysis techniques to provide insights into data sets and using those insights to guide decision making, often in a business context.
Clustering is a data analysis technique. While there are a number of approaches to clustering, the overall goal is to group data points in such a way that points in the same group (cluster) are similar to one another and different from points in the other groups.
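To make the idea of "similar" concrete, here is a minimal sketch of one common way to group points: assign each point to whichever cluster center is nearest by straight-line (Euclidean) distance. This is only an illustration, not necessarily the approach we use. The function name `assign_to_clusters`, the sample points, and the centers are all made up for the example, and it assumes the centers are already known.

```python
import math

def assign_to_clusters(points, centers):
    """Return, for each point, the index of its nearest center (Euclidean distance)."""
    return [
        min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        for p in points
    ]

# Made-up sample data, for illustration only.
points = [(70, 45), (68, 50), (10, 12), (12, 9)]
centers = [(69, 47), (11, 10)]  # assumed to be known in advance

labels = assign_to_clusters(points, centers)
print(labels)  # [0, 0, 1, 1]
```

The first two points land in cluster 0 and the last two in cluster 1, which matches what you would expect from eyeballing the numbers. Full clustering algorithms also have to work out where the centers should be, which this sketch leaves out.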
One important aspect of clustering is visualization. An effective way to evaluate the location and tightness of the clusters is to plot the data points and mark which cluster each one belongs to. The coordinates of each data point correspond to the values of its attributes. For example, a data point (70, 45) would be plotted at 70 on the x-axis and 45 on the y-axis. While it’s rather straightforward to plot data with one, two or even three attributes (in the form of a three-dimensional chart, as seen here), a challenge arises when it comes to data with more than three attributes (dimensions). In other words, how do you draw something four-dimensional?
I’ve recently taken the parallel coordinates approach. This approach can be thought of as “unfolding” all of the dimensions and laying them side-by-side. The chart, which contains data for only one cluster, shows an example of this approach. Each dimension gets its own position along the x-axis of the chart, while its value is read off the y-axis. The blue line links the coordinates of a single sample data point. The red line does the same for the coordinates of the center of the cluster. We can use this chart to see how close a data point is to the center of the cluster in every dimension. We would need another chart for every other cluster in the data set.
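The "unfolding" described above can be sketched in a few lines. The idea is that a point with any number of dimensions becomes a list of (dimension index, value) pairs, which a charting tool can then connect with a line. This is a rough sketch rather than our actual charting code: the helper names `to_polyline` and `gap_per_dimension` and the four-dimensional sample values are invented for illustration.

```python
def to_polyline(point):
    """Turn an n-dimensional point into (dimension index, value) pairs.

    Each pair is one vertex of the line drawn across the chart:
    the index is the x position, the value is the y position.
    """
    return list(enumerate(point))

def gap_per_dimension(point, center):
    """Absolute distance between a point and the cluster center, per dimension."""
    return [abs(p - c) for p, c in zip(point, center)]

# Made-up four-dimensional data, for illustration only.
sample = (70, 45, 30, 85)   # would be the blue line
center = (65, 50, 30, 80)   # would be the red line

print(to_polyline(sample))                # [(0, 70), (1, 45), (2, 30), (3, 85)]
print(gap_per_dimension(sample, center))  # [5, 5, 0, 5]
```

The second list is the vertical gap between the two lines at each dimension, so a value of 0 means the lines touch there. Drawing the actual chart is then a matter of handing the vertex lists to whatever plotting library you prefer.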
Being able to plot high-dimensional data adds value beyond clustering. Internally, I’ve found it interesting to see how data is distributed in these charts, even without separating it into clusters. As part of our recent research and development at Youneeq, high-dimensional data visualization such as this may find a place in our analytics dashboard in the future, along with some very cool applications to help drive business decisions for our users!