91% accuracy! Is deep learning the key to determining effective cancer treatment?

A growing problem doctors face is that the primary cell type that produced a cancer sometimes cannot be determined. This leaves patients with generalized treatment options and a lower chance of survival. However, with the use of neural networks and DNA sequencing, there seems to be hope for identifying these primary sites and providing specialized treatment. But what aspects of the genome give us the best clues, and why might some cancer types be harder to identify?

There are about 200 different types of cancers in humans, each with distinct characteristics and growth behaviors. Identifying the organ of origin and cell type allows for accurate predictions of manifestation, spread patterns, treatment, and prognosis. Pathologists currently determine the origin sites via immunohistochemistry (antibody staining to observe cellular features) and educated guesses (49% accuracy). Cancers of Unknown Primary Sites (CUPS) are tumors where the cell type of origin is unknown. CUPS account for 5% of cancer cases, but they are the fourth most common cause of cancer deaths due to a lack of specialized treatment options. So, how can we help clinicians determine cancer origin sites more accurately?

A recent study by Jiao et al. (2020) turned to deep learning and whole-genome sequencing for answers. They used 2,436 tumour samples across 24 major types of cancers to train the neural network to detect underlying patterns. It identified patterns of non-heritable somatic mutations that accumulated over the life of an individual. By looking at these mutations’ distribution, types, and affected genes, they identified cancers with an average accuracy of 91%. This ranged from 61% for stomach cancer to 99% for kidney cancer. The authors attributed the misclassification of cancers to the similarities between cell lineages (see figure below). But which features did they use to achieve accuracies nearly double those of a professional clinician?

Image by the author

A simplified and general schematic of cell fates during development. During early development, the embryo produces three layers of cells: ectoderm, mesoderm, and endoderm. As the organism develops, these cells differentiate further into specialized cells such as lymphocytes (immune cells), keratinocytes (skin cells), neurons, etc.

They found sites of mutational build-up that varied across the genomes of different cancers. The distribution of mutations seemed to indicate the cell lineage of the cancer type. All cells develop from three primary cell layers: endoderm, mesoderm, and ectoderm, which further specialize into mature cells (see figure above). They found that closely related cells share similar distribution patterns of somatic mutations. Most of these are neutral passenger mutations that do not initiate cancer formation but are present in the cancer’s genome. However, this similarity also created some confusion in classifying the correct cell type. The neural network sometimes mistook one cancer type for another that shares related precursor cells. For example, stomach cancer was mistaken for esophageal, pancreatic, or colon cancer about 30% of the time. As such, cell lineage seemed to narrow the search but made it difficult to differentiate between related cell types.
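One way to picture the "distribution of mutations" as an input to a model is to count somatic mutations in fixed-size windows along each chromosome. The sketch below is illustrative only: the 1 Mb window size, the function names `bin_mutation_counts` and `to_fractions`, and the toy positions are assumptions made for this example, not details taken from the study.

```python
from collections import Counter

BIN_SIZE = 1_000_000  # assumed window size (1 Mb), chosen for illustration


def bin_mutation_counts(mutations, bin_size=BIN_SIZE):
    """Count mutations per (chromosome, bin index) window.

    `mutations` is an iterable of (chromosome, position) pairs
    with 1-based positions.
    """
    counts = Counter()
    for chrom, pos in mutations:
        counts[(chrom, (pos - 1) // bin_size)] += 1
    return counts


def to_fractions(counts):
    """Normalize counts so tumours with different mutation loads are comparable."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {window: n / total for window, n in counts.items()}


# Toy, made-up positions for a single hypothetical tumour.
toy_mutations = [
    ("chr1", 150_000),
    ("chr1", 900_000),
    ("chr1", 1_200_000),
    ("chr2", 5_500_000),
]
counts = bin_mutation_counts(toy_mutations)
fractions = to_fractions(counts)
```

Two tumours from related cell lineages would be expected to produce similar `fractions` profiles, which is consistent with the confusion between related cancer types described above.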

Mutation types may help distinguish between different cancers that affect closely related cell types. Mutations disrupt the balance of gene expression within cells and can cause serious health issues. Some types of mutations are specific to certain cancers. For example, ultraviolet (UV) light causes single nucleotide changes (C > T and G > A) commonly seen in melanomas, a type of skin cancer. In chronic myelogenous leukaemia, large sections of chromosomes 9 and 22 are often rearranged. In other words, mutations can range from single nucleotides to large portions of genetic information.
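The C > T and G > A changes mentioned above are the same event read from opposite DNA strands, so single-nucleotide substitutions are commonly collapsed into six classes before they are counted. Below is a minimal sketch of that bookkeeping. The function names and the toy input are hypothetical, and this is not the feature extraction used in the study.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def normalize_substitution(ref, alt):
    """Express a single-nucleotide change relative to the pyrimidine (C or T) strand."""
    if ref == alt:
        raise ValueError("reference and alternate bases must differ")
    if ref in ("A", "G"):
        # Purine reference: report the equivalent change on the opposite strand.
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def substitution_spectrum(substitutions):
    """Count substitutions per collapsed class."""
    return Counter(normalize_substitution(ref, alt) for ref, alt in substitutions)


# Toy, made-up calls: G>A falls into the same class as C>T.
toy_calls = [("C", "T"), ("G", "A"), ("C", "T"), ("A", "G"), ("T", "C"), ("C", "A")]
spectrum = substitution_spectrum(toy_calls)
```

In a UV-exposed melanoma one would expect the `C>T` class to dominate such a spectrum, which is the kind of signal that can separate cancers arising from closely related cells.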

Driver mutations are mutations in oncogenes that activate signaling pathways, causing cells to grow uncontrollably and form cancer. The genes affected can also be characteristic of certain cancer types. However, they are more dispersed across cell types. For example, a mutated BRAF gene is often present in skin, thyroid, and colorectal cancers. Mutations in BRCA1/2 are often present in breast and ovarian cancers.

So, which features were the best at identifying the correct cancer type and primary site? They found that the combination of distribution and type of mutations produced the most accurate predictions. Adding driver gene mutations, by contrast, reduced the accuracy by introducing more ambiguity between different cancer types.

Adapted from Jiao et al. (2020)

Matrices showing the percentage of calls made by the deep learning neural network on a different set of primary tumours (left) and metastatic tumours (right). The rows are the types of cancers tested and their sample sizes. The columns are the calls made by the neural network. The red box indicates cancer types with lower than 80% recall. We can see that many calls that were mistaken for other cancer types fall within similar cell types derived from the same layer of cells (ectoderm, mesoderm, or endoderm) and within clusters of differentiated cell types depicted in the previous figure. This demonstrates the applicability of the neural network to real-life clinical settings. Note: the 80% cut-off was not put forward by Wei Jiao et al.; it was introduced for the purposes of this figure.
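For readers who want to see how a per-cancer recall and the 80% cut-off used in this figure can be read off such a matrix, here is a small sketch. The counts are invented for illustration and are not the study's results; the function names are hypothetical.

```python
def recall_per_class(confusion, labels):
    """Recall for each true class: correct calls divided by all samples of that class.

    `confusion[i][j]` is the number of samples of true class i called as class j.
    """
    recalls = {}
    for i, label in enumerate(labels):
        row_total = sum(confusion[i])
        recalls[label] = confusion[i][i] / row_total if row_total else float("nan")
    return recalls


def below_cutoff(recalls, cutoff=0.80):
    """Return the classes whose recall is strictly below the cut-off."""
    return [label for label, r in recalls.items() if r < cutoff]


# Toy counts, invented for illustration only.
labels = ["stomach", "esophagus", "kidney"]
confusion = [
    [12, 6, 2],   # true stomach
    [3, 16, 1],   # true esophagus
    [0, 1, 19],   # true kidney
]
recalls = recall_per_class(confusion, labels)
flagged = below_cutoff(recalls)
```

With these toy numbers only the stomach row falls below the cut-off, and most of its missed calls land in the esophagus column, mirroring the pattern of confusion between related cell types.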

The authors did not stop there! They then tested the neural network on a separate set of tumour samples to mimic a clinical setting. This test consisted of 1,436 samples from 14 primary tumours (left) and 2,120 samples across 16 known metastatic cancers (right). The neural network achieved an accuracy of 88% and 83%, respectively. There were still misclassifications due to similar precursor cells but with some variation between training and testing samples.

To determine whether this neural network software is applicable in a clinical setting, it needs to be accessible to the public via the web. This proved to be the most challenging part of the project and took place after the paper’s publication, said Dr. Lincoln Stein, the principal investigator. “There are numerous technical challenges, including refactoring the code so that the slow step of loading the classifier model occurs only once at start-up rather than every time a classification job is run, as well as ethico-legal issues to overcome involving preserving patient confidentiality.”
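The "load once" refactoring Dr. Stein describes is a common pattern in web services. The sketch below shows one generic way to do it in Python by caching the loaded model. It is not the project's code: `get_classifier`, `classify`, the `load_count` counter, and the placeholder model are all hypothetical stand-ins.

```python
from functools import lru_cache

load_count = 0  # only here to show how often the slow step runs


@lru_cache(maxsize=1)
def get_classifier():
    """Load the model on first use and reuse the same object afterwards."""
    global load_count
    load_count += 1
    # Placeholder for the slow step; a real service would read model weights here.
    return {"name": "toy-classifier"}


def classify(sample_features):
    """Run one classification job against the shared model."""
    model = get_classifier()  # cheap after the first call
    # Placeholder logic; a real classifier would run inference here.
    return f"{model['name']} received {len(sample_features)} features"


# Three jobs, but the slow load happens only once.
for features in ([0.1, 0.2], [0.3, 0.4, 0.5], [0.6]):
    classify(features)
```

Note that this version loads lazily on the first job. Calling `get_classifier()` once when the server starts would move the cost to start-up, as described in the quote.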

Overall, this study demonstrates the possible clinical application of neural networks to assist clinicians in treating CUPS patients by analyzing the distribution of passenger somatic mutations.

Learn more about mutational signatures:
  1. Explaining cancer type specific mutations with transcriptomic and epigenomic features in normal tissues

About the author

This post was written by Elijah Aversano. He is a recent graduate from UTSC’s Life science program. He double majored in human biology and neuroscience and minored in psychology. He is looking to pursue a career in either genetics, genomics, or epigenetics. He has put off further studies for now due to the pandemic. In the meantime, he is volunteering at the McGowan lab at UTSC and seeking more opportunities to gain valuable experience. Outside of his studies, he enjoys exercising, playing guitar, making his own music, cooking/baking, and the odd game or movie.
