PhD candidate AI-driven FAIR data extraction and harmonization
Project AI-driven FAIR data extraction and harmonization
By converting clinical notes and cohort variables into standard coding systems, you will help create sufficiently large datasets for automated analysis and advanced diagnostics. Imagine helping rare disease patients by mapping textual symptom descriptions to precise phenotypic codes, which can then be combined with genomic data to identify potential causative variants. Or envision scaling your methods to unify data from multiple large cohort studies to research healthy child development, by seamlessly integrating local data models with emerging APIs such as DataSHIELD, Beacon or FAIR Data Point to enable data discovery and analysis, and to build new global collaborations.
Your research will focus on leveraging state-of-the-art Large Language Models to drive this conversion process, guided by many open questions. Which model types and sizes are most effective? How should they be prompted, orchestrated, and validated for optimal accuracy? Could we deploy them locally on our own cluster, or should we tap into cloud resources? Can we enable our partner universities and hospitals to run them locally in a federation? You will experiment with existing LLM tools and frameworks such as Ollama, LangChain, and OntoGPT to discover and refine best practices.
You will develop novel methods with direct real-world impact: from improving patient diagnoses and enabling large-scale anonymized data reuse for research, to laying the groundwork for deeper integration with electronic health records so that these methods can enter mainstream healthcare. The UMCG is a world leader in integrating AI in healthcare processes, and we will leverage this position in this project to achieve global impact. Join our team of forward-thinking researchers and clinicians to shape the future of AI-driven data extraction and harmonization for healthcare.
The position is part of the Genomics Coordination Centre (GCC), the ‘big data science’ research & service hub of the University Medical Centre Groningen (UMCG) and University of Groningen (rank 66 worldwide, 3rd best place to work in the EU), hosted by the Department of Genetics. Our mission is to accelerate scientific discovery in health data with innovative methods and tools that expedite medical research and improve people's lives, using open source software and large computer ‘clouds’, in particular the MOLGENIS software that we lead, but also DataSHIELD, Singularity, REDCap, XNAT, OpenStack etc.
This is a full-time PhD contract for 4 years in an excellent environment for further development. First, a temporary one-year position will be offered with the option of renewal for another 3 years. Your salary will be a minimum of € 2.901,- gross per month in the first year and a maximum of € 3.677,- (scale PhD) in the final (4th) year, based on a full-time appointment. In addition, the UMCG will offer you 8% holiday pay and an 8.3% end-of-year bonus. The conditions of employment comply with the Collective Labour Agreement for Medical Centres (CAO-UMC).
Apply now and join us in revolutionizing how medical data is utilized in the future of healthcare. We look forward to hearing from you!
Please use the digital application form at the bottom of this page; only applications submitted this way will be processed. You can apply until 23 March 2025. Within half an hour of sending the digital application form, you will receive an email confirmation with further information.
The UMCG has a preventive Hepatitis B policy. The UMCG can provide you with the vaccination, should it be required for your position. For specific professions, a ‘Certificate of Good Conduct’ is required.