Dr. Raphael Gottardo is the scientific director at the Fred Hutchinson Cancer Research Center, an organisation which looks for new ways to prevent, diagnose and treat cancer, HIV/AIDS and other life-threatening diseases.
Dr. Gottardo has a PhD in statistics, focusing on the overlap between biology, statistics and computational skills. This has stood him in good stead for his role at Fred Hutch, which he joined in August 2010.
He explained that the center has a spectrum of people, all the way from basic science - thinking about how a cell works - to translational science and clinical research, where the focus is on the next cancer treatment and how it can be tested.
All areas of the Fred Hutch business are generating huge amounts of data. Knowing this, Dr. Gottardo and other staff thought it would make sense to consolidate these silos and create an umbrella organisation that focuses purely on data science.
This led to the creation of the Translational Data Science Integrated Research Center, which is where Dr. Gottardo is now based. Dr. Gottardo said:
We’re trying to recruit the best data scientists in the world to help us leverage new technologies to generate massive datasets and new insights. We had to think about doing this properly, using our expertise across the board, and then contemplate how we can leverage the cloud because most of these data sets are very large.
The center is known for its work on immunotherapy, which goes back 30 years. Bone marrow transplants work by enhancing the immune response to cancers through the immune cells from a donor – and while Fred Hutch still uses this standard of care, it wanted to come up with novel ways to fight cancer, such as reengineering a patient’s own immune system. This involves taking a blood sample, picking out specific immune cells and using gene editing technology to modify them, so that when the cells are reinfused they find the cancer cells and kill them.
Dr. Gottardo explained:
This works well, we’re seeing a great response. In fact, we can already talk about a cure for certain patients and certain cancers. However, it doesn’t work for everybody. Some people will not respond or they will respond and then relapse very quickly. What we’re trying to understand is what is the key difference. We now have amazing new ways to interrogate the system, what we refer to as single cell technology.
This means that when Fred Hutch takes a blood sample from a patient, it can measure the activity of all the genes in each cell in the sample. With millions of cells profiled and up to 30,000 genes measured in each cell, this amounts to around a terabyte of data per sample.
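To give a rough sense of that scale, the back-of-the-envelope sketch below estimates the size of a dense cells-by-genes matrix. The cell count of 10 million and the 4 bytes per value are illustrative assumptions, not figures from Fred Hutch, and the article does not say how the terabyte figure was derived; in practice it may largely reflect raw sequencing data rather than a processed matrix.

```python
def dense_matrix_bytes(n_cells: int, n_genes: int, bytes_per_value: int = 4) -> int:
    """Size in bytes of a dense cells-by-genes expression matrix.

    bytes_per_value=4 assumes one 32-bit number per gene per cell
    (an illustrative assumption, not a figure from the article).
    """
    return n_cells * n_genes * bytes_per_value


TB = 10**12  # decimal terabyte

# Assumed: 10 million cells; 30,000 genes is the upper bound quoted in the article.
size = dense_matrix_bytes(n_cells=10_000_000, n_genes=30_000)
print(f"{size / TB:.1f} TB")  # prints "1.2 TB"
```

Under these assumptions a single sample already lands at roughly a terabyte, before counting repeat samples or biopsies from the same patient.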
Keep in mind that this is just one sample: patients would be required to have a sample taken before or after therapy, as well as biopsies, so there is a lot of data to process. The datasets also need to be pre-processed, aligned and quantified. To help with these challenges, the center sought a cloud partner.
Dr. Gottardo said AWS was selected as it was a leader in cloud computing and because it supports open source software well – which was important as the center uses Nextflow to orchestrate computation.
We also have a good relationship with AWS and that’s something we’re trying to foster and work with a bit more. At the end of the day we’re scientists and we want what is going to help us the fastest, so we’re not closing any doors [to other cloud providers] but we feel like AWS could be a very strong partner.
Dr. Gottardo added:
We want to analyse all these files and datasets in the best way possible. We have a pretty good infrastructure at the Fred Hutch – it’s state of the art for an organisation like us, but because it’s in-house it doesn’t really scale. AWS enables us to scale – we can basically just scale on demand. We’ve already set up where we have data that is stored locally but we use AWS instances for doing calculations in the cloud.
Data efficiency and machine learning
The second area where Fred Hutch is using AWS is to share data more efficiently. This includes publicly available data sets, such as data from the Human Immunology Project Consortium. By using AWS, the organisation does not have to copy these data sets every time a researcher needs access to them. In addition, the researchers can do their calculations in the cloud.
Dr. Gottardo also wants to use AWS to improve the way the organisation shares data within the center. Fred Hutch is working on a streamlined process which would enable faster insight, and also allow different teams to work on the same dataset at the same time.
The final area where Dr. Gottardo believes the organisation can benefit from AWS is machine learning.
He suggests that with the immunotherapy issue he highlighted earlier, where some patients respond well to treatment and others do not, there is an opportunity for the organisation to use all the data collected before treatment in an attempt to detect a specific ‘signature’ that would make it clear what kind of immune system is more likely to respond. Dr. Gottardo explained:
We’re using machine learning to basically analyse these data sets and come up with new signatures – some of these things are to do with necessity because we need to understand the data, and others are more exploratory where we don’t really know what tools we would need to use, so we would then implement and develop new tools afterwards.
Up until now, the organisation has opted to use open source software rather than AWS for machine learning, partly because this gives it more control and partly because it is free.
We often don’t want to spend money on one thing without necessarily understanding if it’s of value to us, but as we move more into doing research and understanding what we need to do, there’s going to be specific problems that we can maybe move into production mode that we’d need to scale very rapidly.
For that, we’ve been starting to explore tools such as AWS SageMaker and seeing how we can leverage them and scale the computation and enable us to compare the models, which enable us to research faster and help patients in less time.
The organisation is starting to use SageMaker across the board, in some cases just to explore what it can do.