Classification of ECG Signals by Unsupervised Learning

This project deals with three methods of unsupervised classification of signals. It analyses and compares the behavior of the classification algorithms when processing ECG signals (heart-beats signals).

Abstract
This project deals with three methods of unsupervised classification of signals. It analyses and compares the behavior of the classification algorithms when processing ECG signals (heart-beats signals).
As can be seen in the scheme below, each ECG signal first goes through a Pre-Processing stage that arranges it in a matrix as discrete beats, moves each beat to the DC level of the signal, and normalizes it to the signal’s intensity. This output goes into the three classification algorithms. The three classifications of the signal now go through a Post-Processing stage, in order to decrease the number of groups that the signal was classified in each method (by uniting similar groups). The output of this stage is compared with a classification made by a Doctor (each method is compared seperatley). And finally we can determine which algorithm is best for ECG signals.

1

Algorithm stages
The analyzed signals are ECG signals.
The ECG is a representation of the electrical activity of the heart. In an ECG signal each period is consisted of three major parts : P wave, QRS complex and T wave. These three parts are connected by horizontal spaces as can be seen in the following ECG diagram (of a healthy man):

2

The three classification algorithms are :
Competitive Learning – Algorithm based on the neurons model. It works similarly to the way the neurons in the brain save data and patterns. This algorithm uses iterative learning of the signals by mutual competition.

Agglomerative Clustering – The basic idea of this algorithm is creating groups very different from one another and using weighted average for them.

Ensemble Partition – Based on binary partition in sorting by averaged intensity. The partition takes place until finding subgroups that are close enough to each other (we determine the distance).

Results
After the classification of each algorithm, a criteria was made in order to compare between the different classifications results. This criteria is called “success” and is made of 2 demands :

  • The number of groups determined by the doctor would be as close as possible to the number of groups determined by the classification algorithm. This demand will be represented by parameter X1 in our calculation
  • The ability of the algorithm to create homogenized groups, hence : to create such groups, so that each group is consisted mainly of one kind of heart-beats, and most of the heart-beats of this kind are in that group. Due to this two-sided demand, the parameter X2 , which represents this demand, will be consisted of two sub-calculations

The Signal to Noise Ratio Effect
The following graph describes the behavior of each algorithm as a function of the signal to noise ratio. Each point in the graph represents a certain signal (as a matter of fact – its success value).
The formula used to calculate the signal to noise ratio is:
3

As can be seen in the graph :

  • The Ensemble Partition algorithm has a positive linear bond between the success and the S/N. Its success values are lower than the other algorithm’s success values in any S/N
  • The Agglomerative Clustering’s and Competitive Learning’s success values show no distinct orientation and are mostly uniform (although the Competitive’s values are less uniform than the Agglomerative’s)

4

The Epsilon Parameter Effect
The Epsilon parameter is the parameter used in the post-processing stage. It states the minimal distance between the classification groups from which they will be united. Hence: the bigger Epsilon we choose to take, the smaller number of groups we get in the end of our process (after the post-processing stage).
The graphs below show the average and standard deviation of success values of few analyzed signals as a function of the Epsilon parameter (Each signal was analyzed 4 times – when every time the epsilon was changed).
As can be seen, the Agglomerative Clustering isn’t affected by the epsilon values, while the Partition and the Comp. Learning are affected by epsilon, but no optimal epsilon value was found due to the big standard deviation when epsilon is 0.16 (which, according to the average graph, is the optimal epsilon).
5

6

The Effect of the Number of Groups in the Input Signal
The number of groups in the input signal is the number of groups determined by the doctor.
As can be seen in the graph below, in the Partition algorithm : The more groups there are in the input signal, the less successful the classification is. The Clustering shows no distinct orientation (It can be seen in the decline of parameter X2 and the increase of parameter X1 , which are not shown here). The Comp. Learning is not affected by the number of groups.
7

Conclusions

  • The Ensemble Partition algorithm has a long processing-time due to the large number of iterations needed to classify the small groups. Also, its success values were lower than the other algorithms’ . This algorithm is also dependent on all the parameters that were tested (Epsilon, S\N and number of groups)
  • The Competitive Learning algorithm had a processing-time highly dependent on the sound to noise ratio. Its classification success was highly dependent on the Epsilon value. In some cases, this algorithm created classification groups that were empty (there wasn’t any beat that belonged to them.)
  • The Agglomerative Clustering had the best processing-time among all three, and unlike the two others it was independent on the sound to noise ratio. The algorithm’s success values weren’t also dependent on the S\N and not on the Epsilon parameter. Generally, Its success values were high in comparison to the other algorithms’ success values

Conclusion: Agglomerative Clustering is the better algorithm for classifying ECG signals.

Tools
The algorithms were implemented using Matlab 5.1.
The ECG signals were taken from the “MIT-BIH Arrhythmia Database” from “MITDB”.
There was also built an interface (GUI) that enables a convenient show of the groups and the beats that belong to them :

8 9

Acknowledgments
We would like to thank our supervisors Elad Yom-Tov, Johanan Erez, Ina Krinsky and the PSPL laboratory for their support.