Australian researchers leverage AI to discover cancer risks

By Indira Laisram
The team of researchers (from left to right): Amanda Lumsden, Elina Hypponen, Iqbal Madakkatel and Anwar Mulugeta // Pic supplied

Researchers from the University of South Australia (UniSA) and the University of Adelaide have used machine learning to analyse data from 459,169 participants in the UK Biobank. The study was led by Dr Iqbal Madakkatel, Dr Amanda Lumsden and Dr Anwar Mulugeta, alongside Professor Elina Hyppönen, in collaboration with the University of Adelaide’s Professor Ian Olver. It uncovered 84 key features that could potentially serve as indicators of heightened cancer risk.

In an email interview with The Indian Sun, Dr Iqbal Madakkatel, the first author of this study, and Professor Elina Hypponen provide further insights.

» Could you please provide an overview of your study and its main findings on predicting cancer risk using metabolic biomarkers?

Iqbal Madakkatel: In this study, we employed an approach that combines machine learning, a prominent subset of artificial intelligence (AI), with traditional statistical methods for quantitative data analysis in epidemiology. Our goal was to screen for risk factors associated with cancer incidence in a general population, without focusing on cancers at any particular site.

We conducted an analysis using data from over 450,000 UK Biobank participants who were cancer-free at the time of enrolment and were followed for a median of 10 years. Approximately 11% of these participants developed cancer during the follow-up period.

Through our pipeline, which integrates machine learning and statistical approaches, we examined more than 2,800 features measured for each individual at the time of their enrolment in the UK Biobank.

Our analysis revealed both well-established and novel factors associated with future cancer incidence. These factors span various categories, including personal characteristics, biomarkers, physical measures, and health and medical history.

» What inspired your team to undertake this particular research on cancer risk prediction?

Madakkatel: As you are likely aware, cancer affects a significant portion of the global population and is a leading cause of morbidity and mortality worldwide. It is estimated that approximately two out of every five people will receive a cancer diagnosis at some point in their lifetime.

There is evidence linking certain factors, such as smoking, obesity, alcohol consumption, and an unhealthy diet, to several types of cancer. For instance, smoking is a primary contributor to lung cancer, as well as cancers of the mouth, throat, bladder, liver, stomach, and kidneys, among others. Addressing these factors can result in an overall reduction in cancer incidence. It has been estimated that 30% to 50% of cancer cases could be prevented by adopting a healthier lifestyle and addressing other modifiable risk factors. Furthermore, some studies suggest that these interventions may be effective regardless of an individual’s genetic predisposition to such cancers. This motivated us to seek out novel risk factors and biomarkers that could enable the early identification of cancer, even before the disease has a chance to develop.

» Can you explain how machine learning was employed in your study to analyse data from the UK Biobank, and why this approach was chosen?

Madakkatel: Machine learning algorithms have proven highly effective in constructing predictive models for several reasons, including their ability to capture nonlinear relationships between features and disease outcomes. For instance, in certain diseases, both very low and very high BMI values can have a significant impact, sometimes resulting in a U-shaped association.

Additionally, practical scenarios often involve complex interactions among features that influence disease outcomes. For example, age was the most important feature in our model, and the strength of the association between age and cancer incidence varied between males and females, with males showing a higher likelihood of developing cancer as they age.

Machine learning models excel at handling nonlinear relationships, intricate feature interactions, missing values, and situations with a large number of input features. Therefore, utilising machine learning was a natural choice for us in identifying risk factors in this project.
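
To make the U-shaped BMI example concrete, here is a minimal sketch in Python. It assumes synthetic data and uses boosted decision stumps, a simplified stand-in for tree-based boosting rather than the researchers' actual pipeline. The BMI range, the risk formula and all function names are hypothetical and chosen only for illustration.

```python
# Illustrative only: synthetic numbers, not UK Biobank data or the study's model.

def fit_stump(xs, residuals):
    """Find the single threshold split that minimises squared error.

    Assumes xs contains at least two distinct values.
    """
    best = None
    candidates = sorted(set(xs))
    for lo, hi in zip(candidates, candidates[1:]):
        threshold = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def fit_boosted_stumps(xs, ys, n_rounds=50, learning_rate=0.3):
    """Boosting for squared loss: each stump is fitted to the current residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        predictions = [
            p + learning_rate * (left_value if x <= threshold else right_value)
            for x, p in zip(xs, predictions)
        ]
    return base, stumps, learning_rate


def predict(model, x):
    """Sum the base value and the shrunken contribution of every stump."""
    base, stumps, learning_rate = model
    return base + sum(
        learning_rate * (left if x <= threshold else right)
        for threshold, left, right in stumps
    )


# Hypothetical U-shaped relationship: risk is lowest near a BMI of 25.
bmi = list(range(15, 36))
risk = [0.05 + 0.002 * (b - 25) ** 2 for b in bmi]

model = fit_boosted_stumps(bmi, risk)
for value in (15, 25, 35):
    print(value, round(predict(model, value), 3))
```

In this toy setting the fitted model assigns higher risk at both ends of the BMI range than in the middle, a shape that a single straight-line term cannot represent.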

Gradient Boosting Decision Trees (GBDT) is a type of machine learning algorithm that has proven highly effective for predicting outcomes when dealing with tabular data, such as datasets containing information like height, weight, eye colour, blood pressure, and more. We employed GBDT to construct our predictive models. Since these models are inherently black box in nature, we utilised a method known as SHAP values, which is based on Shapley values from game theory and rests on robust theoretical foundations, to interpret the model.

This method helped us identify which features were most important in predicting cancer incidence. Our machine learning pipeline is fast and efficient, making it well-suited for scenarios involving hundreds of thousands of participants and thousands of input features. Moreover, it required less data pre-processing compared to standard epidemiological approaches.
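
The Shapley-value idea behind SHAP can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's code: the scoring function `toy_risk_score`, its coefficients, the baseline person and the helper `shapley_values` are all hypothetical, and the brute-force enumeration shown here is practical only for a handful of features. Tools built for tree models use much faster algorithms.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, instance, baseline):
    """Exact Shapley attribution by enumerating every coalition of features.

    Features outside a coalition are set to their baseline values.
    """
    names = list(instance)
    n = len(names)

    def value(coalition):
        mixed = {
            name: instance[name] if name in coalition else baseline[name]
            for name in names
        }
        return model(mixed)

    attributions = {}
    for name in names:
        others = [other for other in names if other != name]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                total += weight * (value(members | {name}) - value(members))
        attributions[name] = total
    return attributions


def toy_risk_score(person):
    """Hypothetical score with an age-by-sex interaction; not a real risk model."""
    score = 0.02 * (person["age"] - 40)
    if person["male"]:
        score += 0.01 * (person["age"] - 40)  # age matters more for males here
    if person["smoker"]:
        score += 0.5
    return score


baseline = {"age": 40, "male": 0, "smoker": 0}
person = {"age": 60, "male": 1, "smoker": 1}

phi = shapley_values(toy_risk_score, person, baseline)
for name, contribution in phi.items():
    print(name, round(contribution, 3))
```

The attributions add up to the gap between this person's score and the baseline score, and the extra risk from the age-by-sex interaction is shared equally between the two features involved. Ranking features by the size of such contributions is what lets an otherwise opaque model be interpreted.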

» You mentioned that more than 40% of the features identified by the model were biomarkers. Could you elaborate on the significance of these biomarkers in relation to cancer risk and their potential implications for early detection?

Elina Hypponen: Many of the biomarkers that were associated with cancer risk in our study were not traditional cancer risk factors, which suggests that there are other changes to the metabolism which can reflect the ongoing disease process. This is an interesting area of future study, and it does suggest that, by learning more about predictive biomarkers and changes to the metabolism before cancer onset, we may be able to develop tests that will help with early cancer diagnosis, hopefully at a stage when it is still possible to stop the disease.

» The study highlighted the connection between certain biomarkers and not only cancer risk but also chronic kidney and liver diseases. Could you discuss the importance of exploring these connections and their underlying mechanisms?

Hypponen: We did not investigate associations with chronic kidney or liver diseases. What we did find was that the same serum biomarkers that would be elevated in the context of kidney or liver diseases can also reflect increased risk of cancer. These associations can have many explanations, including a link with excess alcohol consumption that would increase the risk of liver diseases as well as the risks of many types of cancers.

» One of the findings mentions urinary microalbumin levels as a strong predictor of cancer risk. Can you explain the significance of urinary microalbumin and its role as a potential biomarker?

Madakkatel: Higher microalbumin and cystatin C and lower total protein can be reflective of poor glomerular function, and while the connection of kidney health to cancer risk is not clear, having a lower glomerular filtration rate has previously been shown to associate with greater cancer incidence.

» The study also identified red cell distribution width (RDW) as a factor associated with cancer risk. What is the significance of RDW, and how does it correlate with inflammation and renal function?

Madakkatel: Greater RDW, which reflects defects in the production and/or survival of red blood cells, is a hallmark of iron deficiency anaemia and correlates with higher inflammation and poorer renal function. However, our study did not investigate the mechanisms which associate RDW with cancer risk.

» Could you elaborate on the relevance of C-reactive protein and gamma glutamyl transferase (GGT) as biomarkers for cancer risk, as mentioned in your research findings?

Madakkatel: GGT is an indicator of liver damage and oxidative stress, and one possibility is that excess oxidants may contribute to DNA mutations and oncogenesis. However, our study did not investigate the exact mechanism which may explain this association, and there is a need for future studies.

» In your opinion, what makes the machine learning approach used in this study particularly effective in identifying risk predictors for cancer?

Madakkatel: As mentioned before, machine learning models excel at handling nonlinear relationships, intricate feature interactions, missing values, and scenarios with a large number of input features. Therefore, choosing machine learning was a natural decision for identifying potential risk factors in this project. Once we had a manageable set of potential risk factors, we were able to apply standard epidemiological analyses by creating statistical models and leveraging domain expertise.

Hypponen: One great thing about machine learning is that we don’t need to make specific hypotheses about what might be increasing the risk of cancer, so the approach is not limited by human knowledge or preconceptions. It was exciting to conduct this study, as we were able to let the data guide us to the possible cancer risk factors, which provided great scope for innovation.

» Looking ahead, what are the next steps or areas of research that your team plans to explore based on these findings?

Hypponen: In this study we focussed on risk factors that affect the overall risk of cancer. However, we know that the term cancer covers many different types of disease, and that the risk factors for different types of cancer will differ. My team is currently working to establish risk factors for ovarian cancer, which is a particularly deadly condition that is difficult to predict and to manage.

» How do you envision the potential clinical applications of your research findings in terms of early cancer detection and prevention?

Madakkatel: While our study confirms some known factors, such as smoking, others, including many blood measures, are novel and require further research to understand precisely how they are associated with cancer. Additional studies are still necessary to fully comprehend which factors directly contribute to cancer development, particularly regarding how we can prevent cancer. This may eventually lead to clinical applications of our research findings.

Hypponen: These findings suggest that it may be possible to find ways to detect cancers at an early stage, which provides hope for pre-emptive prevention. However, more work is required, and we need to develop better, more sensitive approaches before clinical application is possible.
