Tõnu Esko (host)
Head of the Estonian Biobank Innovation Center, BC Platforms Scientific Advisory Board Chairman, and Vice Director, Institute of Genomics, University of Tartu
Ellen Sukharevsky 0:30
Hello and welcome to the BC Platforms Podcast. My name is Ellen and I will be your moderator today. BC Platforms is a global leader in providing a powerful data discovery and analytics platform, as well as data science solutions for personalized health care. BC Platforms enables cross-functional collaboration with our global federated network of data partners.
This podcast will focus on shaping a data access framework for population health. It sheds light on industry challenges and expectations, as well as innovative approaches to collaboration. The discussion is hosted by Tõnu Esko, BC Platforms SAB Chairman and Vice Director of the Institute of Genomics at the University of Tartu, where he also holds the position of Professor of Human Genomics. He is the Head of the Estonian Biobank Innovation Center and focuses on public-private partnerships and innovation transfer. Dr. Esko is also a research scientist at the Broad Institute of Harvard and MIT. He is one of the senior leaders of the Estonian Personalized Medicine Program and has served as a scientific advisor for several companies.
Our speaker today is Gerry Reilly. Gerry graduated from Queen Mary University of London in Physics and joined Sony Broadcast to undertake research in digital video and imaging. After Sony he joined Stewart Hughes, leading software development for helicopter health diagnostics, before moving to Digital Equipment Corporation, where he held a variety of technical and management software development roles. He returned to technical leadership when he joined IBM as a strategist in the middleware group. His final role at IBM was as Vice President for event services in the IBM Cloud and Watson group. In 2018, Gerry joined Health Data Research UK as Chief Technology Officer to lead HDR UK's technical strategy. He is now Technologist in Residence, focusing on the development of HDR UK's international work and long-term technology strategy. Gerry is a Fellow of the British Computer Society, a Chartered Engineer and a Chartered IT Professional. He is an academic assessor for the British Computer Society and a member of the industrial advisory board for the School of Electronic Engineering and Computer Science at QMUL. Now, I will hand over to the speakers for a brief intro to begin the discussion.
Tõnu Esko 2:33
So hello, my name is Tõnu Esko. I'm the host today, and I'm really glad to introduce our speaker, Gerry. Today we will have a very interesting discussion on a topic worth pondering: shaping a data access framework for population health. As we all know, big data, especially in the health space, is of great value. But I would say the most burning problem is that there is a lot of data, and we cannot really get it into the hands of society. So Gerry, let's kick off the discussion. What do you think are the obstacles to data sharing? I would also like to highlight that it's very rare to see someone working with health data who has such a deep understanding of technology as you have.
Gerry Reilly 3:26
So thank you, Tõnu. The government in the UK launched its life sciences strategy in 2017. One of the major pain points identified was the whole area of access to health data. The UK was seen to have a fabulous set of health data assets. But what we were hearing from academia, from industry, and even from the NHS, was that access to those assets was very difficult. So HDR UK was set up in 2018. I joined right at the beginning of the journey to try to bring about a more harmonized approach to health data research in the UK, focusing on three areas, which we term uniting, improving and using. Uniting means bringing the assets together, improving access management to those assets, and improving collaboration around them. Improving means working with the data custodians to improve the quality of those assets, the common approach to them, and the quality of the metadata representing them. Using means actually bringing about a multi-institutional approach to health data research in the UK around some particular critical areas. Those have been our focus areas over the last three years. We started, as Tõnu has touched on, by looking at what the major pain points were. As we talked to people around the UK, both researchers and the data custodians themselves, one thing came through very clearly: the lack of harmonization around Access Request management, and just the sheer length of the process. You might wait months. Indeed, we came across cases where people had waited years in the process of applying for access to datasets for research. And if you're in a research project, whether you're a PhD student or an established researcher, you can't wait. The time you spend sitting there waiting may be the length of your grant, or the length of your PhD.
So whilst having great assets on the ground is one thing, this is really the hottest single topic when it comes to bringing those assets together in a way that makes a real difference to patients and the public.
Tõnu Esko 5:55
These are very, very important points. In the end, with datasets in general but especially with health data, the data is in siloed databases. Very often the data custodians themselves don't know what data they have, let alone everyone else. The point you brought out, the time it takes to get access to the data, is really the key. Exactly: when you are a researcher or doing a PhD, you can't wait years to get access. Another point is harmonization. In Estonia, or at least for the Estonian Biobank, we have solved this with a nationwide IT infrastructure that enables the databases to talk to each other almost automatically. The data is retrieved and exchanged quite easily, which makes all the nationwide services very smooth and also enables research. But in the UK you have more than 60 million people and several tens of healthcare systems. How do you make all those systems work together, exchange information, or even understand what kinds of data sources and data types are available? How has Health Data Research UK enabled the ecosystem in the UK?
Gerry Reilly 7:25
Absolutely. So in the UK, we have a couple of challenges, including scale, and a couple of opportunities as well. We have a population of approximately 66 million, but we're actually four different nations with four different governance approaches to health data. Health is a devolved responsibility in the UK, so England, Scotland, Wales and Northern Ireland have their own regulatory responsibilities around health care. Whilst we have a single-payer construct in the National Health Service, the system is still quite siloed and it's resource constrained. There are probably two things early on that have made a difference here. First, we brought together a group called the UK Health Data Research Alliance. It represents the health systems in the four nations, Genomics England, UK Biobank, some of the research-intensive hospitals and some of the charities, and today it comprises about 42 different data custodians. People are in that group because they are data custodians: they hold data that is of potential interest to the research community. By bringing them together and convening them as a group, we've managed to get some discussions going around how we should approach Access Request management. Can we harmonize it? Can we come up with common approaches and collaborate around it? Can we collaborate around a trusted research environment, so we have a better, and still very secure and ethical, approach to research on health data? It has been really important to bring those key data custodians together in a group that isn't just a talking shop, but one that is really looking to learn best practices. That group has developed and grown over the last two years, and it has become much more focused on specific topics such as this Access Request process.
The other thing that's been really important is that you only hit an Access Request issue once you know what data sets are there for you to research in the first place. One of the things we identified was that it was actually quite difficult to find out what data sets were available across the system and what their quality was. It was also hard to find good, consistent representation of the metadata around those data sets and their usage. If you're an industrial user, say an SME developing an AI application, you don't want to waste your time trying to get access to a data set that's never going to be made available for commercial use. You want some context around the available uses for those datasets. We've invested heavily over the last 12 months in building what we call the Health Data Research Innovation Gateway, where we've initially focused on three things. The first is representing metadata through a metadata catalog. That lets us provide a search capability over what was initially basic administrative metadata, and is now metadata going down to the variable level. The second is providing an infrastructure for harmonized Access Request management, which we've been rolling out over the last couple of months, and I'll talk more about that as we get deeper into the conversation. The third is a collaborative platform where researchers can get together, pose questions and talk about their projects, so building community around research. We launched it back in January of last year, just before COVID hit us, and that's very pertinent to this discussion about timeliness. Today, we have over 600 data sets represented in the Innovation Gateway and are closing in on 1,000 registered users. They are typically researchers from right across the NHS, academia and industry.
There's still a way to go in building a more consistent discovery and Access Request management experience. But it's really by building out that discovery and that community around the data custodians that we've started to break down the siloed nature of the system. We've also made sure this is very much a four nations approach. As I said, the UK isn't one nation when it comes to health data; we are four distinct nations with distinct approaches.
Tõnu Esko 12:05
You have been able to build this platform and bring together more than 600 different sources, with representation from many data custodians who are showing thought leadership around data sharing. It would be quite interesting to know what cultural shifts in thinking there have been on the data custodians' side. From my own experience, I know that people tend to be rather restrictive and possessive about data, especially health data. So what have been the key learnings, or the tricks, for bringing together all those data custodians and opening up the datasets for research and discovery?
Gerry Reilly 12:55
So the first thing is as much about what we didn't do as what we did do. We went into this saying: we're a convening group, we aren't the data custodians ourselves, and we want to work with you to make this flow more effectively. Ultimately, the access mechanisms, the review cycle and the approval cycle still sit with the data custodians. That was actually crucial to the cultural question. We weren't going in there saying we were going to circumvent what they were already doing; we wanted to work with them to make their approach continue to work effectively. And we also wanted to do that with full transparency. One of the things we feel is really important is the involvement of patients and the public. Whenever the Alliance meets, it meets with representatives of our Patient and Public Advisory Group in the room, in the discussions, central to everything that's going on. The majority of the data custodians in the UK are actually enthusiastic about making data available for research. Indeed, many of the charities that we've got involved have highly motivated communities around them. They're looking at disease areas such as brain tumors and cystic fibrosis, where they really want the health research to happen. I don't think there's reluctance; rather, there's quite a correct wish to make sure this happens in a good way. There's been bad history in the UK, as I suspect there has been in many other countries, of this being mishandled. There is naturally a slightly protective tendency, because this is sensitive data; it's people's data. So it's important to have patients and the public in there providing that balance, and also to respect that history and the existing Access Request management. So instead of trying to change that, we focused on getting people together and saying, right, can we make this more effective?
Can we do things like have standard questions, standard forms and measurable processes, so that it's a bit like your pizza tracker? If you've ordered your pizza, where is it in the delivery? If you've made your access request, where is it in the process? Where is it in the review, and how's it going? I also mentioned collaboration earlier. One of the things we've enabled through the Gateway is very early conversations, very early messaging, between the researcher and the custodians. Before an Access Request even goes in, they're exchanging questions and refining the research proposal. Hopefully, then, the Access Request doesn't bounce back and forth like a tennis ball. It goes through more smoothly, because it's a better refined, better worked Access Request. So what we're trying to do is get earlier conversations going, a more harmonized approach, and the ability to measure what goes on. This will be the first time that we actually have the metrics. We have anecdotes about it taking two years to get access to data, but this time we will have the metrics around what's happening, and we can see where improvements can be made. One final thing is that COVID has been a game changer. We've had the Alliance running for over two years now. With the Gateway, though, we went into MVP in January of last year and started the main build-out in April, just after COVID started to hit us. It became apparent very quickly that the old approaches to Access Request management needed to change. We all needed to get behind an approach that gave secure, safe, ethical access to data, and gave it very, very quickly, if we were going to respond to this pandemic. I'm sure it's true in your own countries too: this is the first pandemic of the digital age, and data is really, really important.
Data was actually important in previous pandemics too; it's in everyone's interest. Certainly, if you look back at 17th-century England, there was a massive interest in data, even during the last round of the Black Death. We now have the ability, however, to do something more useful with it. Frankly, we can't do that if the cycles take months, let alone years, so we had to get that down to days. That has been a driving force. Culturally, we've really got to think about how, when we move out of this stage of the pandemic, we learn the lessons from what we've done really well over the last 12 months. We can then use them to make sure that we're in a better place moving forward.
Tõnu Esko 17:56
Yeah, I think the COVID pandemic has really demonstrated the need for good systems of consent management and access. I think it has also shown most data custodians the need to make data available. I personally think it has highlighted for patients and citizens why their health data needs to be available for research purposes, because our lives and our families' lives literally depend on it. There are so many things we can learn from the data for making daily decisions and actions. You have managed in the UK to make these data access processes take days rather than months or years. Historically this has been very tricky, so I think that is a really, really big achievement. Another point I wanted to touch on is data harmonization. You mentioned making data available from different data resources. But how do you make all those different sources and dimensions work across different countries? And how do you make the harmonization process more straightforward, so that researchers and data custodians can even have those discussions to plan future research?
Gerry Reilly 19:28
To some extent, we've not focused quite so heavily on data harmonization, and I'll come back to that in a moment. The area we focused on initially has been standardization of metadata; of course, we are only a couple of years into the HDR UK journey. If you're going to provide a great discovery experience, you're going to do that over different data modalities in different locations: some routinely collected data, some data from clinical trials, some data from research studies. You've got to have a common way of representing that through metadata. That applies at the administrative level (data descriptions, abstracts, etc.) and right down to variable-level information. We've worked with the data custodians to come up with a standardized approach to metadata based on Dublin Core, extending what's already in Dublin Core. It's being used by all of the data custodians behind the 603 data sets we have represented in the Gateway today. So it's been widely adopted, and we'll continue to work through that. We've done that in a very informal way. We've not tried to do formal standardization, although at some point we will take this through a standardization body; we've tried to do it in a much more agile way. The other area we've really focused on is understanding data quality, things like completeness and quality of coding. As people will clearly understand, particularly where this is secondary use of data, you're using data that's primarily collected within a healthcare setting. It's not being collected to be research ready; it's being collected because there's a patient sitting on a trolley that you need some data about. Therefore, it sometimes lacks things that you would really like to be in there, and that's just inevitable.
So we've been doing a lot of work to try to understand data quality. Data quality is obviously nuanced, because what you want from a data set might be different from what I want from it if I'm doing research on it. Quite frankly, if I'm doing research on a data set, I'm really only interested in the variables and records within that data set that match the research I want to do; the rest of it could be rubbish as far as I'm concerned. So it is somewhat nuanced, but that has been an important focus. For quite a lot of the data sets, we won't go down the harmonized representation route. This data is collected and sits with the data custodians, and it may be data that's derived from clinical practice. For some of it, we will do some harmonization. For example, we are now working with BC Platforms to do cohort discovery across the datasets in the UK. That does mean we need some level of harmonization, or at the very least some level of mapping to a common data model. The data sets may actually be represented differently at the physical level, so we've got to map them to a common logical model before we can run cohort discovery queries over them. We've only just started on that piece of work, but it's going to be important to us moving forward. So, as I said, the bigger focus for us has been harmonization around metadata and getting a better understanding of the data quality within those data sets.
Tõnu Esko 23:20
Definitely, the quality or informativeness of the data is of crucial importance; as the saying goes, rubbish in, rubbish out. And you mentioned this common data model. In my mind, that's where the industry needs to move. We need some kind of widely agreed data models to map our data sets to, and then quality and completeness are much easier to assess. If we take a step further and try to give some recommendations, how should the field of data sharing move forward? What big steps do you foresee for Health Data Research UK? How could UK data be made available globally, not just for UK researchers, but also for Estonians or people from Asia? What are your ideas and proposals on that?
Gerry Reilly 24:29
Actually, a very timely question. We've started some work on the international side of this over the last few months, in two different ways. Even the structure we have today around the Gateway is not restricted to UK access only. The assets represented in the Gateway are primarily, though not entirely, UK assets; there are already a few international assets there. Use of the Gateway for discovery, registered access to the Gateway, and even access through to those assets are available to bona fide researchers from the UK and beyond. COVID-19 has also given us another interesting angle on this. A few months ago, with support from the Gates Foundation, the COVID-19 Therapeutics Accelerator and a few other funders, we were involved in convening the International COVID-19 Data Alliance, ICODA. ICODA is probably our first step into the international space. It is a broad international collaboration, with HDR UK effectively acting as the convener and program director. It brings together data assets from a large number of nations and researchers from a much larger number still, so we are starting to move into this space. This is important for the UK, as we're a very multicultural society. It's important for others to have the benefit of what we're doing. But for our own population it's also important that we have an international element. Say we've got a Bangladeshi community in East London. We often want to do comparative research with a similar ethnic group in Bangladesh, because you do see differences, which are very insightful. As a multicultural society, we will benefit from the international links. We are at the early stage of building out an international strategy.
But for us, that's got to be collaborative. It's got to enable low- and middle-income countries to benefit fully from research in the global north, and it's also got to be led from an international perspective. Otherwise, quite frankly, it's too easy for the UK, the US and others to come across in a sort of patronizing way, which is entirely unhelpful. We have an enormous amount to contribute, but it should be equals contributing to international research.
Tõnu Esko 27:24
And international research is the key to future innovation. International collaboration, especially between continents or outside the EU, has become challenging in the light of GDPR and its latest interpretation. We as a community have to come together and find solutions to actually make this international research possible and to be able to use the data resources. We also need computational models that are acceptable in the eyes of the law, for example in cloud environments. So federated data models and other advanced computing approaches would probably really benefit society. Thank you, Gerry, for this very interesting and insightful discussion; I think we covered many important topics. I must admit, I'm very envious of what you have achieved in the UK in connecting all the different resources. In many ways the UK is the place to look up to when building your own solution. So with that, I hand over to you to conclude this podcast.
Gerry Reilly 28:44
I'd like to say I'm also envious of Estonia's digital maturity. There are things we're doing well in the UK, but there's still a huge distance to go. Estonia is, I think, one of the countries that really are role models for digital maturity, which we've all got to look to in the future as we build out this international collaboration. As you say, as we look at a federated model for secure and ethical research, this is about community; it's about international community. I'd like to finish by saying that I'd like to see the international community engaged here. Come and talk to us in the UK: the UK Health Data Research Alliance is there, and it will expand eventually. The Health Data Gateway, as I said, is open to researchers from across the world, at www.healthdatagateway.org. Come along and see what we've got and what we're developing; we're only a year into a much longer journey. See how we can use those assets to improve health research that benefits our patients and publics globally. These are important times. We're hopefully coming out of the COVID-19 pandemic, and it's an opportunity to take the lessons and improvements from the last 12 months and bring them to bear on a whole range of other health and community issues. So with that, I'll hand back to Ellen.
Ellen Sukharevsky 30:11
Thank you, Tõnu and Gerry and to everyone for listening. Speakers, do you have any final comments?
Tõnu Esko 30:16
I think it's important to highlight again and again that data access must be fair, and that data custodians need to get their data harmonized. This opens up international collaborations and international insights into health.
Gerry Reilly 30:35
Fair, but also transparent. If there's one thing we have learned over the last three years in the UK, it's that the data we're working on is data from people. Therefore you have to have patients and the public in this, not as something on the side but centrally, all the way through. We've borrowed their data to do the research, and we mustn't forget that it's their data, and that we're doing health data research to benefit our patients and public. If we do that, and treat both people's data and people with respect, we will continue to have a rich set of data available for ongoing research, which will be transformative.
Ellen Sukharevsky 31:16
Great, well thank you to everyone for speaking and for tuning in to our podcast. To connect with BC Platforms and learn more, please email firstname.lastname@example.org or visit our website www.bcplatforms.com. Thank you and we hope to stay connected with you.
Unknown Speaker 31:31
Thank you for tuning in to our podcast. To connect with our company and learn more, visit our website bcplatforms.com, and follow us on LinkedIn for more engaging content. Thank you, and we hope to stay connected with you.