Since I started a new position with the TISLab in early 2020, my primary role has been Training Coordinator for the National COVID Cohort Collaborative, or N3C. This truly amazing group of scientists, clinicians, and engineers has accomplished something unique in US healthcare history: sourcing electronic health records from hospital systems and medical centers nationwide (56 and counting) into a single unified database of clinical records related to the COVID-19 pandemic.
These 7 billion rows of data, representing 6.5 million people (1/3 COVID-positive, 2/3 COVID-negative), are accessible only via a FedRAMP-certified analysis ‘enclave’ with rigorous application criteria. As of this writing, 219 research institutions have signed data use agreements covering all their employees and students, over 1,500 researchers from around the US and beyond are collaborating on over 200 research projects, and the publications committee is now tracking multiple N3C-supported journal submissions per week.
MIT Technology Review wrote a nice overview, and more cool stats can be found at the dashboard.
As training coordinator, I author a variety of training resources on the OMOP data model and the Enclave (Palantir Foundry, built on Apache Spark, with many features for complex reproducible analyses and operations management). I also organize, and make discoverable, training resources developed by the N3C community and beyond via a custom-developed Training Portal.
I additionally organize and run the N3C Enclave Users' Group (EUG), a weekly forum for N3C participants to share and learn about N3C data, methods, and best practices. Lastly, we’ve published an open-access edited volume for all things N3C, with contributions from dozens of authors from around the US: The Researcher’s Guide to N3C: A National Resource for Analyzing Real-World Health Data.
N3C is remarkably collaborative, even as collaboratives go, with many research projects organized under one of 31 umbrella Domain Teams, whose themes range from COVID-19 in immunosuppressed individuals to pediatrics, long-COVID, and machine learning.