The Department of Epidemiology and Biostatistics presents its next Epi Bios Forum with Assistant Professor Sehyun Oh and student speaker Rebecca Zimba.
The forum will be held online via Microsoft Teams.
Sehyun Oh will present "Improving FAIRness of Omics Data through Metadata Harmonization."
Abstract: National and institutional efforts have established comprehensive biological data repositories that hold diverse datasets. However, cross-study analysis within these repositories remains limited because their metadata structures are heterogeneous. This lack of harmonization compromises the FAIRness (findability, accessibility, interoperability, and reusability) of the data and hinders the application of AI/ML to high-throughput biological data, a crucial approach for handling complex, high-dimensional multi-omics datasets.
To facilitate cross-study analysis, Oh's team initiated the OmicsMLRepo project to harmonize metadata from diverse omics data repositories. The approach involved manually reviewing metadata schemas, consolidating similar or identical information across schemas, and incorporating ontologies. The team harmonized metadata from hundreds of metagenomics and cancer genomics studies, which are accessible through the R/Bioconductor packages curatedMetagenomicData and cBioPortalData, respectively.
Oh and her team also developed a software package that uses the ontology-incorporated, curated metadata schemas to improve findability. To make metadata harmonization scalable, they are building an automated pipeline that applies natural language processing (NLP) techniques, including large language models (LLMs) and Sentence Transformers. These informatics infrastructures will provide large, AI/ML-ready omics datasets and significantly improve accessibility for diverse research communities.
Rebecca Zimba will present "Implementation Science in HIV Care and Treatment Research and Evaluation."
Presenter Bios:
Sehyun Oh is an Assistant Professor at CUNY SPH with expertise in both experimental biology and bioinformatics. A molecular biologist by training, Oh studied DNA repair and telomere maintenance mechanisms during her doctoral and postdoctoral research.
As a bench scientist, Oh began to notice how hard it was to show that her findings in cell lines also occurred in living organisms and were relevant to public health. This drew her to the potential of large public datasets. Oh made a career transition from bench scientist to bioinformatics scientist in 2017. Since then, she has worked on large public omics data analysis, statistical method development for high-dimensional data, cloud-based computing, and user-friendly software development.
Currently, Oh is developing an omics data repository designed for easy application of artificial intelligence and machine learning tools, and she is incorporating histopathology images into multi-modal analysis. Her overarching career goal is to facilitate interdisciplinary research by developing intuitive bioinformatics infrastructure and user-friendly tools that lower barriers across disciplines and resources.
Rebecca Zimba is a Research Scientist at the CUNY Institute for Implementation Science in Population Health (ISPH). She supports multiple research initiatives at the ISPH with partners at Weill Cornell and the NYC Health Department.
Zimba earned her MHS in Infectious Disease Epidemiology from the Johns Hopkins Bloomberg School of Public Health. She is currently pursuing her PhD in epidemiology at the CUNY Graduate School of Public Health & Health Policy.
Please note: This is a hybrid event. If you wish to attend in person, please RSVP by emailing Giuseppina.Dimaggio@sph.cuny.edu.
Those unable to attend in person can watch a livestream of the lecture via Zoom. Virtual attendees will receive the livestream link after they RSVP.

