How AI is Revolutionizing Cardiology: A Look at Multi-pathology Classification Models on ECG Signals
Artificial Intelligence (AI) has the potential to revolutionize the field of cardiology by helping doctors diagnose and treat heart diseases more accurately and efficiently. One of the most promising areas of AI in cardiology is the prediction of key pathologies via multi-label (multi-pathology) classification models based on electrocardiogram (ECG) signals.
ECG signals are electrical signals generated by the heart that can be recorded non-invasively using electrodes placed on the skin. ECG signals are an essential tool for diagnosing and monitoring heart diseases, such as arrhythmias and heart attacks. However, interpreting ECG signals can be challenging, as they are complex and contain a large amount of information.
Our colleagues Georgi Nalbantov, PhD, Head of Data Science and AI, and Svetoslav Ivanov, Lead Data Scientist, co-authored a paper on multi-label classification modelling of ECG signals for the PhysioNet/Computing in Cardiology Challenge 2020, which focused on automated, open-source approaches for classifying cardiac abnormalities.
With their extensive experience in the field, they played a pivotal role in designing and implementing the AI models that were used to analyze the ECG signals. You can read more about their work in the paper “Multi-Class Classification of Pathologies Found on Short ECG Signals”.
In this article, we will take a closer look at how AI is revolutionizing cardiology through the use of the multi-label classification of ECG signals and showcase a part of the work of our colleagues.
What is multi-label classification?
Multi-label classification is a machine learning technique in which a single input can be assigned several labels at once, rather than exactly one. Applied to ECG signals, this means that one recording can be flagged for more than one pathology. For example, a multi-label classification algorithm could analyze an ECG signal and classify it as showing signs of atrial fibrillation, ventricular tachycardia, and left ventricular hypertrophy.
To develop a multi-label classification algorithm, researchers first need to train a machine-learning model using a large dataset of ECG signals that have been manually labelled by experts. The model then uses this labelled data to learn how to classify new ECG signals.
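To make the idea concrete, a multi-label target is often stored as one binary indicator per pathology, so a recording can switch on several indicators at once. The sketch below is only an illustration: the label abbreviations and the `encode`/`decode` helpers are hypothetical and are not taken from the paper.

```python
# Hypothetical label vocabulary; the abbreviations are illustrative only.
LABELS = ["AF", "VT", "LVH", "SR"]


def encode(pathologies):
    """Return a multi-hot vector: 1 where the label applies, 0 elsewhere."""
    present = set(pathologies)
    unknown = present - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if label in present else 0 for label in LABELS]


def decode(vector):
    """Turn a multi-hot vector back into a list of label names."""
    return [label for label, flag in zip(LABELS, vector) if flag]


# One recording annotated with two pathologies at the same time.
example = encode(["AF", "LVH"])
print(example)          # [1, 0, 1, 0]
print(decode(example))  # ['AF', 'LVH']
```

A model trained on such targets outputs one score per position in the vector, which is what distinguishes it from a classifier that must pick a single category.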
One of the key benefits of the multi-label classification of ECG signals is that it can help doctors diagnose heart diseases more accurately and quickly. Traditionally, doctors have had to manually analyze ECG signals, which can be time-consuming and prone to human error. With a multi-label classification algorithm, doctors can receive automated assessments that are derived consistently from the signal data and can support their own reading of the ECG.
Another benefit of multi-label classification of ECG signals is that it can help doctors identify heart diseases at an earlier stage. By analyzing ECG signals more comprehensively, doctors can detect subtle changes in the heart's electrical activity that may indicate the early stages of heart disease. This early detection can allow doctors to intervene earlier, potentially improving patient outcomes.
Revolutionizing diagnosis and treatment
There are already several AI-based ECG analysis tools available on the market, such as the FDA-approved ECG AI-Guardian system. These systems use multi-label classification algorithms to automatically analyze ECG signals and provide doctors with real-time diagnoses.
However, there are still challenges to be addressed in the development of AI-based ECG analysis tools. One of the biggest challenges is ensuring that the algorithms are accurate and reliable. To achieve this, researchers need access to high-quality datasets that contain a wide range of ECG signals from different patient populations.
Georgi Nalbantov, Svetoslav Ivanov, and Jeffrey van Prehn developed a method to classify heart conditions based on a large dataset of ECG strips. They used a strategy that determines the likelihood of different pathologies from the scores assigned to the predictions. This strategy ensured that the expected distribution of predicted labels in a test set was the same as the observed distribution in the training set. A limitation of this approach is that the labels predicted for a test patient must already have occurred in the training dataset.
“To make sure that the model is reliable in real-world conditions, the ECG signal recordings with noisy intervals are excluded before pathology prediction to avoid high false alarm rates.”
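As a rough illustration of what excluding noisy intervals before prediction can look like, the sketch below cuts a signal into fixed-size windows and drops windows whose amplitude range looks implausible. The criteria and the thresholds (`max_range`, `min_range`) are assumptions chosen for this example; the article does not describe how the authors identified noisy intervals in their pipeline.

```python
def split_windows(signal, size):
    """Cut the signal into consecutive, non-overlapping windows of `size` samples.

    Trailing samples that do not fill a whole window are dropped.
    """
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]


def is_noisy(window, max_range=3.0, min_range=0.05):
    """Flag a window whose peak-to-peak amplitude is implausibly large or flat.

    Both thresholds are hypothetical and would need tuning on real data.
    """
    span = max(window) - min(window)
    return span > max_range or span < min_range


def clean_windows(signal, size):
    """Keep only the windows that pass the noise check."""
    return [w for w in split_windows(signal, size) if not is_noisy(w)]


# Toy signal: a usable window, a flat line, a large artifact, a usable window.
signal = [0.0, 0.5, 1.0, 0.2,
          0.0, 0.0, 0.0, 0.0,
          0.0, 9.0, -9.0, 0.0,
          0.1, 0.8, 0.3, 0.0]
kept = clean_windows(signal, 4)
print(len(kept))  # 2
```

Only the windows that survive this step would be passed on to the pathology classifiers, which is how filtering can reduce false alarms caused by artifacts.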
The Computing in Cardiology Challenge included 24 pathologies, but many participants expressed doubts about the validity of the provided labels for parts of the training set. This makes the learning task extremely difficult: if a doctor fails to detect a given pathology, or if the signal is noisy, a model may predict that the pathology is present while the label, which reflects what the doctor was able to spot, says that it is not.
Approaches of modelling algorithms
Georgi and his team implemented a multi-label multi-step strategy for classifying pathologies in ECG strips, using a total of 23 pathologies as well as a ‘sinus rhythm’. They found that including the prediction for a certain group of pathologies as an input feature in one-vs-rest binary classification models was beneficial. They applied AdaBoost as a classifier for all binary classification problems, which they acknowledge to be suboptimal, and did not perform feature selection for each classification problem.
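The multi-step idea, where the prediction for a group of pathologies becomes an extra input feature for the per-pathology one-vs-rest models, can be sketched as follows. This is a data-flow illustration under stated assumptions: the two features, the toy data, and the "any arrhythmia" group are hypothetical, and a single decision stump (the kind of weak learner AdaBoost commonly boosts) stands in for the AdaBoost classifiers the authors used.

```python
def fit_stump(rows, targets):
    """Pick the (feature, threshold) pair with the fewest training errors.

    The stump predicts 1 when row[feature] >= threshold.
    """
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            errors = sum((r[f] >= t) != bool(y) for r, y in zip(rows, targets))
            if best is None or errors < best[0]:
                best = (errors, f, t)
    _, f, t = best
    return f, t


def predict_stump(stump, row):
    f, t = stump
    return int(row[f] >= t)


# Hypothetical features per ECG strip: [heart_rate, rr_irregularity].
X = [[70, 0.02], [150, 0.03], [110, 0.30], [65, 0.01], [160, 0.04], [95, 0.35]]
y_group = [0, 1, 1, 0, 1, 1]  # step 1 target: strip belongs to the group
y_af = [0, 0, 1, 0, 0, 1]     # step 2 target: one pathology, one-vs-rest

# Step 1: train the group-level model.
group_stump = fit_stump(X, y_group)

# Step 2: append the group prediction as an extra feature, then train
# the binary model for the individual pathology on the augmented rows.
X_aug = [row + [predict_stump(group_stump, row)] for row in X]
af_stump = fit_stump(X_aug, y_af)

predictions = [predict_stump(af_stump, row) for row in X_aug]
print(predictions)
```

With a stronger learner in place of the stump, the second-step model is free to combine the group prediction with the morphological features, which is where the reported benefit would come from.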
The Error-Correcting Output Codes (ECOC) approach they used for assigning multiple predicted labels to a test patient has an implicit downside: these labels must have occurred in the training dataset. The authors also note that their method may be limited for real-world applications, because noisy intervals in ECG signal recordings have to be excluded before pathology prediction to avoid high false alarm rates. To move closer to a real-world application of their models, they suggest providing longer strips with proven pathologies, in which doctors can detect a pathology on some parts of the strip but not on others.
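The downside described above can be seen in a minimal nearest-codeword sketch. It assumes that each distinct label vector observed in training acts as a codeword, and that a test patient receives the codeword closest to the per-label scores. This is one reading of the constraint, not the authors' exact decoding; the function name and the toy data are hypothetical.

```python
def nearest_observed_label_set(scores, training_label_sets):
    """Return the label vector seen in training that is closest to the scores.

    `scores` holds one value in [0, 1] per label; distance is squared Euclidean.
    """
    codebook = sorted(set(map(tuple, training_label_sets)))

    def distance(code):
        return sum((s - c) ** 2 for s, c in zip(scores, code))

    return min(codebook, key=distance)


# Label vectors observed in a toy training set (three labels).
train = [(1, 0, 0), (0, 1, 0), (1, 0, 1), (0, 0, 0), (1, 0, 0)]

# Scores from three binary classifiers for one test patient.
scores = [0.8, 0.7, 0.1]

# Thresholding each score at 0.5 independently gives a label vector
# that was never seen in training.
independent = tuple(int(s >= 0.5) for s in scores)
print(independent)  # (1, 1, 0)

# Decoding against the observed codewords cannot return that vector.
decoded = nearest_observed_label_set(scores, train)
print(decoded)      # (1, 0, 0)
```

The decoded result is always a label vector from the training data, which keeps predictions plausible but rules out label sets that were never observed.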
Lastly, they purposely avoided using deep learning/neural network techniques to ensure their results are replicable.
In conclusion
An annotation tool for labelling ECG wave points and intervals/templates was created in MATLAB and used for labelling pathological intervals, as well as noisy intervals and inconsistencies between the ECG data and the pre-assigned labels. Several one-vs-rest binary classifiers were built, using morphological features specific to each pathology that were generated from the signals. The binary classifiers were augmented by a multi-class classifier using an Error-Correcting Output Codes (ECOC) methodology. This approach achieved a challenge validation score of 0.616 and a full test score of 0.194, placing the team 23rd out of 41 in the official ranking.
Read the whole paper, “Multi-Class Classification of Pathologies Found on Short ECG Signals” by Georgi Nalbantov, Svetoslav Ivanov, and Jeffrey van Prehn, at www.cinc.org/archives/2020/pdf/CinC2020-071.pdf.